Enable split reduce by default #3709

pfultz2 · 2024-12-12T18:41:28Z

This will also enable pointwise fusions around broadcasts by default. An environment variable is added so the split size can be adjusted when we want to experiment with different values.
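
For readers unfamiliar with the term, here is a minimal sketch of the general idea, assuming "split reduce" means breaking one large reduction into partial reductions followed by a final reduction over the partial results. The helper `split_sum`, the use of `split_size` as both the threshold and the chunk length, and the `<` comparison are illustrative assumptions, not MIGraphX code.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not MIGraphX code): sums `data`, splitting the work
// into chunks once the element count reaches `split_size`.
inline long long split_sum(const std::vector<int>& data, std::size_t split_size)
{
    const std::size_t n = data.size();
    // Small reductions (or a zero threshold) are done in a single pass.
    if(split_size == 0 || n < split_size)
        return std::accumulate(data.begin(), data.end(), 0LL);

    // Stage 1: one partial sum per chunk of at most `split_size` elements.
    std::vector<long long> partials;
    for(std::size_t start = 0; start < n; start += split_size)
    {
        const std::size_t stop = std::min(n, start + split_size);
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last  = data.begin() + static_cast<std::ptrdiff_t>(stop);
        partials.push_back(std::accumulate(first, last, 0LL));
    }
    // Stage 2: reduce the partial results to the final value.
    return std::accumulate(partials.begin(), partials.end(), 0LL);
}
```

On a GPU the stage 1 chunks would run in parallel, which is where the benefit would come from; the serial loop here only shows the data flow.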

codecov · 2024-12-12T18:53:27Z

Codecov Report

Attention: Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.

Project coverage is 92.21%. Comparing base (0860461) to head (dd9c374).
Report is 3 commits behind head on develop.

Files with missing lines	Patch %	Lines
src/fuse_pointwise_reduce.cpp	0.00%	5 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3709      +/-   ##
===========================================
- Coverage    92.23%   92.21%   -0.03%     
===========================================
  Files          514      514              
  Lines        21746    21750       +4     
===========================================
- Hits         20057    20056       -1     
- Misses        1689     1694       +5


causten · 2024-12-12T20:01:44Z

docs/dev/env_vars.rst

-Enable split_reduce.
+.. envvar:: MIGRAPHX_SPLIT_REDUCE_SIZE
+Set to the size of the reduction to do a split reduce.
+Set -1 to disable split reduce.


Would it make sense to have an example of a size? Are you talking bytes? Suggest adding an example and stating what the default is now.

Would it make sense to have an example of a size?

What do you mean example?

Are you talking bytes?

The number of elements; we don't ever measure the size of a reduction in terms of bytes.

Suggest adding an example

Like MIGRAPHX_SPLIT_REDUCE_SIZE=8192?

what the default is now

The default can change depending on the backend.
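
To make the semantics discussed above concrete, here is a rough sketch of how a value such as `MIGRAPHX_SPLIT_REDUCE_SIZE=8192` could be interpreted: a count of elements, with `-1` meaning disabled. The helper names, the fallback parameter, and the choices to treat any negative value as "disabled" and any unparsable value as "use the fallback" are assumptions for illustration; this is not how MIGraphX itself reads the variable.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>

// Hypothetical helper (not the MIGraphX implementation).
// Returns the split size in elements, or std::nullopt when split reduce
// is disabled. `fallback` stands in for the backend-dependent default.
inline std::optional<std::size_t> parse_split_reduce_size(const char* raw, std::size_t fallback)
{
    if(raw == nullptr || *raw == '\0')
        return fallback; // variable unset or empty: keep the default
    char* end = nullptr;
    const long long value = std::strtoll(raw, &end, 10);
    if(end == raw || *end != '\0')
        return fallback; // not a plain integer: keep the default (assumed policy)
    if(value < 0)
        return std::nullopt; // -1 disables; other negatives treated the same (assumption)
    return static_cast<std::size_t>(value);
}

// Reads the variable named in the docs diff from the process environment.
inline std::optional<std::size_t> split_reduce_size_from_env(std::size_t fallback)
{
    return parse_split_reduce_size(std::getenv("MIGRAPHX_SPLIT_REDUCE_SIZE"), fallback);
}
```

Returning an empty optional for the disabled case keeps "off" distinct from any valid element count, so callers cannot mistake a sentinel for a size.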

causten · 2024-12-12T20:04:26Z

src/include/migraphx/fuse_pointwise_reduce.hpp

@@ -35,6 +35,7 @@ struct module_pass_manager;

 struct MIGRAPHX_EXPORT fuse_pointwise_reduce
 {
+    std::size_t split_size = 8192;


Is this the default split of 8192?

That's the value when using the default constructor.
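
A small sketch of what that answer implies, using a stand-in struct rather than the real pass: the default member initializer only applies when nothing else sets the field, so a backend can construct the pass with its own value. `fuse_pointwise_reduce_sketch` and `make_pass_for_backend` are hypothetical names, and 8192 is simply the value visible in the diff at review time (the later commit "Increase the size" may have changed it).

```cpp
#include <cstddef>

// Stand-in for the pass in the diff; only the member under review is modeled.
struct fuse_pointwise_reduce_sketch
{
    std::size_t split_size = 8192; // value shown in the diff at review time
};

// Hypothetical backend hook: a backend that prefers a different threshold
// sets its own value instead of relying on the default member initializer.
inline fuse_pointwise_reduce_sketch make_pass_for_backend(std::size_t backend_split_size)
{
    fuse_pointwise_reduce_sketch pass{};
    pass.split_size = backend_split_size;
    return pass;
}
```

This is why a single "default" is hard to document: `fuse_pointwise_reduce_sketch{}` yields 8192 here, while `make_pass_for_backend(32768)` yields whatever the backend asked for.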

migraphx-bot · 2024-12-12T23:22:21Z

Test	Batch	Rate new dd9c37	Rate old 6f6705	Diff	Compare
torchvision-resnet50	64	3,256.75	3,257.99	-0.04%	✅
torchvision-resnet50_fp16	64	6,993.85	6,969.89	0.34%	✅
torchvision-densenet121	32	2,435.13	2,434.72	0.02%	✅
torchvision-densenet121_fp16	32	4,062.19	4,079.58	-0.43%	✅
torchvision-inceptionv3	32	1,627.46	1,625.71	0.11%	✅
torchvision-inceptionv3_fp16	32	2,743.72	2,744.57	-0.03%	✅
cadene-inceptionv4	16	765.04	764.53	0.07%	✅
cadene-resnext64x4	16	812.57	812.65	-0.01%	✅
slim-mobilenet	64	7,466.05	7,463.64	0.03%	✅
slim-nasnetalarge	64	209.03	209.02	0.01%	✅
slim-resnet50v2	64	3,438.88	3,438.27	0.02%	✅
bert-mrpc-onnx	8	1,147.07	1,150.14	-0.27%	✅
bert-mrpc-tf	1	475.14	473.42	0.36%	✅
pytorch-examples-wlang-gru	1	418.34	412.17	1.50%	✅
pytorch-examples-wlang-lstm	1	381.69	393.10	-2.90%	✅
torchvision-resnet50_1	1	805.11	790.72	1.82%	✅
cadene-dpn92_1	1	403.22	399.49	0.93%	✅
cadene-resnext101_1	1	382.00	383.53	-0.40%	✅
onnx-taau-downsample	1	345.75	345.20	0.16%	✅
dlrm-criteoterabyte	1	33.33	33.32	0.02%	✅
dlrm-criteoterabyte_fp16	1	52.47	52.75	-0.53%	✅
agentmodel	1	8,154.88	8,373.46	-2.61%	✅
unet_fp16	2	58.82	58.79	0.06%	✅
resnet50v1_fp16	1	946.48	969.89	-2.41%	✅
resnet50v1_int8	1	1,010.14	1,031.16	-2.04%	✅
bert_base_cased_fp16	64	1,169.96	1,170.20	-0.02%	✅
bert_large_uncased_fp16	32	362.88	362.80	0.02%	✅
bert_large_fp16	1	199.25	199.81	-0.28%	✅
distilgpt2_fp16	16	2,202.65	2,195.15	0.34%	✅
yolov5s	1	534.07	534.95	-0.16%	✅
tinyllama	1	43.38	43.65	-0.63%	✅
vicuna-fastchat	1	174.42	174.76	-0.19%	✅
whisper-tiny-encoder	1	418.43	417.83	0.15%	✅
whisper-tiny-decoder	1	435.82	425.41	2.45%	✅

This build is OK for merge ✅

migraphx-bot · 2024-12-12T23:22:22Z

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

✅ resnet50v1: PASSED: MIGraphX meets tolerance

✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

✅ bert_large: PASSED: MIGraphX meets tolerance

✅ yolov5s: PASSED: MIGraphX meets tolerance

✅ tinyllama: PASSED: MIGraphX meets tolerance

✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

pfultz2 added 6 commits November 30, 2024 17:34

Enable split reduce by default

c711ece

Format

863f8b9

Fix layernorm test

dd8f804

Merge branch 'develop' into enable-split-reduce

896493b

Add env variable to adjust split reduce size

09359aa

Format

302d744

pfultz2 requested review from a team and causten as code owners December 12, 2024 18:41

pfultz2 requested review from shivadbhavsar, TedThemistokleous and kahmed10 December 12, 2024 18:41

TedThemistokleous assigned pfultz2 Dec 12, 2024

TedThemistokleous added the labels enhancement (New feature or request), roadmap (Tasks to finish for a release), and simple (small or simple changes) Dec 12, 2024

TedThemistokleous approved these changes Dec 12, 2024

spolifroni-amd approved these changes Dec 12, 2024

causten reviewed Dec 12, 2024

pfultz2 added 3 commits December 12, 2024 14:19

Increase the size

8b39e8a

Update text to make it clearer

e6dc32c

s/the/a/

dd9c374

causten merged commit f56b1b4 into develop Dec 13, 2024
41 of 45 checks passed

causten deleted the enable-split-reduce branch December 13, 2024 20:07
