Enable non-packed inputs for mlir #3541

Merged: 4 commits merged into develop from mlir-non-packed on Dec 6, 2024

Conversation

pfultz2 (Collaborator) commented Oct 18, 2024:

No description provided.

pfultz2 requested a review from causten as a code owner on October 18, 2024 at 23:22
pfultz2 requested reviews from shivadbhavsar and krzysz00 and removed the request for causten on October 18, 2024 at 23:23
codecov bot commented Oct 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.17%. Comparing base (04ac9fc) to head (639d003).
Report is 66 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3541   +/-   ##
========================================
  Coverage    92.17%   92.17%           
========================================
  Files          512      512           
  Lines        21393    21393           
========================================
  Hits         19720    19720           
  Misses        1673     1673           


migraphx-bot (Collaborator):

Test   Batch   Rate new (639d00)   Rate old (275f85)   Diff   Compare
torchvision-resnet50 64 3,259.88 3,258.32 0.05%
torchvision-resnet50_fp16 64 6,991.70 6,998.28 -0.09%
torchvision-densenet121 32 2,435.83 2,439.53 -0.15%
torchvision-densenet121_fp16 32 4,078.01 4,083.72 -0.14%
torchvision-inceptionv3 32 1,638.80 1,639.50 -0.04%
torchvision-inceptionv3_fp16 32 2,762.86 2,761.57 0.05%
cadene-inceptionv4 16 776.36 776.25 0.01%
cadene-resnext64x4 16 810.43 811.85 -0.18%
slim-mobilenet 64 7,534.50 7,537.50 -0.04%
slim-nasnetalarge 64 212.16 211.56 0.28%
slim-resnet50v2 64 3,503.11 3,504.39 -0.04%
bert-mrpc-onnx 8 1,151.61 1,148.92 0.23%
bert-mrpc-tf 1 494.07 467.60 5.66% 🔆
pytorch-examples-wlang-gru 1 430.00 422.65 1.74%
pytorch-examples-wlang-lstm 1 395.98 380.77 3.99% 🔆
torchvision-resnet50_1 1 786.19 815.97 -3.65% 🔴
cadene-dpn92_1 1 408.66 400.35 2.07%
cadene-resnext101_1 1 384.86 384.70 0.04%
onnx-taau-downsample 1 342.82 343.42 -0.17%
dlrm-criteoterabyte 1 33.33 33.32 0.03%
dlrm-criteoterabyte_fp16 1 52.71 52.72 -0.03%
agentmodel 1 8,590.92 8,139.01 5.55% 🔆
unet_fp16 2 58.90 58.87 0.05%
resnet50v1_fp16 1 923.74 930.44 -0.72%
resnet50v1_int8 1 1,009.62 994.18 1.55%
bert_base_cased_fp16 64 1,171.37 1,172.52 -0.10%
bert_large_uncased_fp16 32 363.50 363.39 0.03%
bert_large_fp16 1 198.51 198.49 0.01%
distilgpt2_fp16 16 2,202.16 2,199.99 0.10%
yolov5s 1 541.06 543.07 -0.37%
tinyllama 1 43.50 43.73 -0.52%
vicuna-fastchat 1 177.91 172.56 3.10% 🔆
whisper-tiny-encoder 1 418.74 417.97 0.19%
whisper-tiny-decoder 1 429.69 428.62 0.25%

This build is not recommended to merge 🔴

migraphx-bot (Collaborator):

     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

     ✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

     🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

    stride_ratios.end(),
    ns.lens().begin() + 1,
    [](auto ratio, auto len) { return ratio >= len; });
}
pfultz2 (Collaborator, Author) commented:

@krzysz00 This is checking whether the shapes can be used by MLIR (i.e., whether they can be constructed from slice/reshape/transpose/broadcast). I know there are similar checks in MLIR. Are you able to verify that this checks the shapes correctly?

krzysz00 (Contributor) replied:

https://github.com/ROCm/rocMLIR/blob/develop/mlir/lib/Dialect/MIGraphX/IR/MIGraphX.cpp#L182 is our logic.

This code feels similar, though I'd need to run through some examples with broadcast dimensions.
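
For reference, a standalone sketch of the kind of layout check being discussed (hypothetical helper name and plain std::vector arguments in place of the migraphx::shape API; not the PR's or rocMLIR's actual implementation). It assumes the dimensions have already been permuted into stride-descending order, skips zero-stride (broadcast) dimensions, requires the innermost remaining stride to be 1, and lets outer strides over-cover their inner dimensions so that sliced inputs still pass:

// Hypothetical sketch, not the PR's implementation.
#include <cstddef>
#include <vector>

// Returns true if (lens, strides) looks like a layout that can be built from a
// packed tensor via broadcast (zero strides) and slice, assuming dimensions are
// already ordered by decreasing stride.
bool looks_mlir_compatible(const std::vector<std::size_t>& lens,
                           const std::vector<std::size_t>& strides)
{
    std::vector<std::size_t> nz_lens;
    std::vector<std::size_t> nz_strides;
    for(std::size_t i = 0; i < strides.size(); ++i)
    {
        if(strides[i] != 0) // stride 0 marks a broadcast dimension; skip it
        {
            nz_lens.push_back(lens[i]);
            nz_strides.push_back(strides[i]);
        }
    }
    if(nz_strides.empty())
        return true; // fully broadcast
    if(nz_strides.back() != 1)
        return false; // innermost non-broadcast dimension must be contiguous
    for(std::size_t i = 1; i < nz_strides.size(); ++i)
    {
        // ">=" rather than "==" so that sliced (non-packed) inputs pass
        if(nz_strides[i - 1] < nz_strides[i] * nz_lens[i])
            return false;
    }
    return true;
}

For example, lens {2, 3, 4} with strides {0, 8, 1} passes (a broadcast outer dimension plus a slice, since 8 >= 1 * 4), while strides {24, 8, 2} fail because the innermost stride is not 1.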

krzysz00 (Contributor) left a comment:

Assuming this doesn't lead to non-packed outputs, seems fine to me.

auto last = std::find(ns.strides().begin(), ns.strides().end(), 0);
if(*std::prev(last) != 1)
    return false;
std::adjacent_difference(ns.strides().begin(),
TedThemistokleous (Collaborator) commented Dec 6, 2024:

Do we want to possibly add a parallel execution policy here, or are we not concerned about this difference being too large? It looks like adjacent_difference can be parallelized. Or is the intent that, since shape is const, we want this expression to be const, so the execution policy doesn't matter?

Disregard: stride_ratio shouldn't be const, and order matters here because a back inserter isn't valid for parallel execution.
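
On the execution-policy point, a minimal sketch of the constraint mentioned above (illustrative values only; strides stands in for ns.strides()). The parallel overloads of std::adjacent_difference require forward iterators for the destination, so a std::back_insert_iterator, which is only an output iterator, cannot be used with an execution policy; a pre-sized destination would be needed instead:

#include <execution>
#include <iterator>
#include <numeric>
#include <vector>

int main()
{
    std::vector<long> strides{24, 12, 4, 1}; // stand-in for ns.strides()

    // Serial form: writing through a back_inserter is fine.
    std::vector<long> diffs_serial;
    std::adjacent_difference(strides.begin(), strides.end(),
                             std::back_inserter(diffs_serial));

    // Parallel form: the destination must already have the right size,
    // because the ExecutionPolicy overload requires forward iterators.
    std::vector<long> diffs_parallel(strides.size());
    std::adjacent_difference(std::execution::par,
                             strides.begin(), strides.end(),
                             diffs_parallel.begin());
}

Since the range here is only one element per tensor dimension, the serial call is presumably cheap anyway.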

TedThemistokleous (Collaborator) left a comment:

Looks good unless @krzysz00 has any concerns.

CI is all green; not sure why GitHub isn't reporting that correctly from Jenkins here.

TedThemistokleous added the enhancement (New feature or request) label on Dec 6, 2024.
causten merged commit 4b15b6c into develop on Dec 6, 2024.
39 of 43 checks passed.
causten deleted the mlir-non-packed branch on Dec 6, 2024 at 05:06.
Labels: enhancement (New feature or request), rocMLIR
Projects: None yet
6 participants