What's Changed
- [BUG] Add comp server requirements (#661) by Vadim Gimpelson 300fd33
- [BUG] A number of fixes for vllm's TP (#651) by Vadim Gimpelson 9c29f66
- matmul_f16 with wgmma (#627) by kjiang170 9f0ea7d
- [BUG] VLLM (and DMWL) compile with hidet backend (#647) by zhumakhan 6c6be7a
- [IR] Add support for `swizzle`, `interleave` and `l2Promotion` in tensor map creation (#643) by Bolin Sun 21ff63f
- [BUG] fix attach hash to signature (#638) by xiaocenxiaocen dbd6613
- Hexcute base branch (all related PRs will be merged into this base PR) (#294) by xiaocenxiaocen b1fdf17
- [PERF] Default value for parallel_k is 'disabled' (#634) by Vadim Gimpelson 135212b
- Adapt to bfloat16 where necessary (#624) by ZichuWu 9045865
- [Bug] Parallel compilation sync (#616) by ZichuWu 4c16c57
- [COMPTIME] Hot start speedup (#625) by Vadim Gimpelson 22c657b
- [BUG] Fix torch2.5 OoM and docs build fix (#637) by zhumakhan bf32f8b
- Revert "[BUG] Fix torch2.5 OoM issue" (#635) by zhumakhan 9131a5c
- [BUG] Fix torch2.5 OoM issue (#609) by zhumakhan fe59c63
- [CI] Fix small typos for building and publishing to internal Hidet PYPI Index (#598) by xinli-centml f8400fe
- [PERF] Support bf16 in one more place (#623) by Vadim Gimpelson 7f77349
- [Tests] Adapt tests/operators for bfloat16 (#615) by ZichuWu ba9c0ad
- [DISTRIBUTED] Support `all_reduce` in `torch.compile` mode (#612) by Vadim Gimpelson 0bca591
- [torchAPI] Inherit cuda stream from torch (#618) by Vadim Gimpelson ad4e00a
- [BUG] Fix bugs in shared map implementation (#608) by Vadim Gimpelson ffdbde4
- [CI] Turn off search space 2 for tests/lang (#617) by ZichuWu 5f7fae8
- [Tests] Adapt tests/lang for bfloat16 test cases (#594) by ZichuWu 5b829cb
- [Tests] Adapt tests/frontends to bfloat16 (#592) by ZichuWu a5b72e6
- [Tests] Adapt tests/ir for bfloat16 test cases (#593) by ZichuWu 545aeea
- [Tests] Adjust test cases for tests/models for bfloat16. (#595) by ZichuWu bedff21
- Use one global cuda workspace for all the CompiledGraph (#603) by Max Hu 6652307
- [Fix] Fixing a minor mistake encountered while adapting test cases for `bfloat16` data type (#607) by Bolin Sun 275070d
- Kaihang/wgmma tf32 u8 i8 support (#549) by kjiang170 a0e6658
- [CI] Exclude tests/unit_tests/test_dynamic_shape.py::test_attention[cuda] (#606) by Vadim Gimpelson 5579392
- [Tests] Adjust test cases for tests/unit-tests for bfloat16. (#596) by ZichuWu 0e5ec55
- [BUG] Fix incorrect conversion of fxgraph to hidet's flow graph + expand looking for nccl lib with user site packages (#604) by Vadim Gimpelson 1995d43
- [Tests] Added bfloat16 test cases for tests/cuda (#590) by ZichuWu febfbd7
- [Tests] Adjust test cases for tests/utils for bfloat16. (#597) by ZichuWu 36aab6f
- [Tests] Change float16 to bfloat16 for tests/apps (#589) by ZichuWu 83cddbb
- [CI] add new github actions workflow to manually build and push to internal pypi index (#554) by xinli-centml 6beffab
- [OPTIONS] Remove unnecessary parallel_k (#572) by ZichuWu 9051f26
- fix test_wgmma.py error for illegal warp address (#588) by kjiang170 8f7e139
- [Operators] Allow NT `matmul` layout for `bfloat16` data type (#562) by Bolin Sun d5d0e51
- python3.8 -> python3.9 (#558) by Vadim Gimpelson a09713c
- [CI] Move import torch inside run_torch() (#570) by ZichuWu 4bc4d29
- [CI] Shorten build-docs run time (#565) by ZichuWu edadb07
- [CI] Tests Workflow. Add manual trigger of tests on different gpu types (#555) by c-fteixeira 66d9568
- [OPTIONS] Clean Huggingface tokens option (#561) by ZichuWu cdf2c8a
- [Bug] Fix out of memory error that occurred while running `llama-2-7b` (#547) by Bolin Sun b8826d0
- [OPTIONS] Set mma as default in PassContext() (#530) by ZichuWu 35f02b9
- wgmma bf16 support (#531) by kjiang170 f8c057b
- [Bug] ‘uint32_t’ was not declared in this scope in CI build-wheel for runtime (#545) by ZichuWu 4ced47e
- Add more shapes to reduce op in regression (#534) by zhumakhan 8ef1bc2
- [COMPTIME] Added support for run_torch for the rest of transform operation (#525) by ZichuWu 04e4d5e
- Remaining f16 options supported and tested (#527) by kjiang170 e5e2404
- [Operators] `bfloat16` data type support for attention operators (#524) by Bolin Sun 07e597a
- [Enhancement] Save running time by using symbolic_run to replace async_run in optimize (#490) by ZichuWu 92c81e8
- [BUG] Fix distilbert by changing variables names in ops.where (#512) by zhumakhan 2d615b6
- [OP] Support of `logsoftmax` (#517) by Vadim Gimpelson ce43f1e
- refactor wgmma (#521) by kjiang170 4a80b9a
- [Bug] Fix the incorrect result after merging changes related to `matmul_nt` (#518) by Bolin Sun 2b7c348
- [PERF] Rewrite softmax (#516) by Vadim Gimpelson b50cca4
- wgmma instruction support and test for f16 input … (#499) by kjiang170 c758e54
- [BUG] Fix NT matmul corner case where `n` or `k` dimension is odd (#513) by Bolin Sun 1e54f77
- [Operators] Support `bfloat16` data type in `matmul` operator (#511) by Bolin Sun a467c76
- [Operators] Support matmul with NT layout (#496) by Bolin Sun 8fc6de3
- [CI] Make test and publish workflows use built wheel on tests (#492) by c-fteixeira bc5b54e
- [Hidet Script] Import externally defined function automatically (#503) by Yaoyao Ding 43750c2
- [PERF] Fix for indexes optimization (#488) by Vadim Gimpelson f8c679a
- [CI] Update the set of Regression tests (#493) by Vadim Gimpelson 7e3ae1f
- [Enhancement] Causal attention with fp32 accumulator (#481) by zhumakhan 8b569bd
- [IR] Bound check for task mapping worker (#483) by Vadim Gimpelson 1544cdf
- [Bug] Rule based simplifier. Fix incorrect rule e/c1/c2 -> e/(c1*c2) (#487) by Vadim Gimpelson fd6b439
- [TOOLS] Task benchmark utilities (#479) by Vadim Gimpelson dc175f2
- [Dynamic][Enhancement] Convert div and mod including symbolvars to fast int div/mod (#464) by Max Hu c8d9158
- Revert accidental commit (#484) by Vadim Gimpelson 6c8ad3e
- bug fix by Vadim Gimpelson 3405b55
- [PERF] Continue indexes optimizations (#473) by Vadim Gimpelson da24ee3
- [Bug] Resolved multi-threading conflict with save_lower_ir() (#480) by ZichuWu 6a116ad
- Fixed the format change on the new transformers version (#482) by ZichuWu 0a81840
- Fix masked attention by using fp32 accumulate on first matmul (q and k) part (#468) by zhumakhan 40c12c9
- remove mpt-7b due to accuracy failure (#477) by zhumakhan 53a0cc4
- [BUG] Support concat empty tensors (#475) by ZichuWu 85bb6dd
- [TOOLS] Attached hash values to function signature in source.cu (#459) by ZichuWu a6f1033
- [BUG] Fix `ValueError` caused by different operand data types in `if_then_else` while initializing `Conv2dTransposeGemmImageTask` (#470) by Bolin Sun 2826490
- [BUG] Fix `ZeroDivisionError` triggered within the function `parallel_part_heuristic` in `graph/ops/conv2d/conv2d_gemm.py` (#472) by Bolin Sun a11d69c
- [BUG] Fixing memory issue encountered while compiling the model `sam` (#466) by Bolin Sun c695974
- [PERF] Indexes optimization (#458) by Vadim Gimpelson f1ee08f
- Added more llms to Regression test (#432) by zhumakhan 03d6250
- Revert "[Dynamic][Enhancement] Convert div and mod including symbolvars to fast int div/mod" (#463) by Max Hu 2989389
- [Dynamic][Enhancement] Convert div and mod including symbolvars to fast int div/mod (#405) by Max Hu 0cffe7e
- [CI] Print stderr in `run_tests.py` (#443) by Vadim Gimpelson 015ffcd
- [BUG] Fix `NotImplementedError` encountered while compiling the model `doctr_det_predictor` (#462) by Bolin Sun 868dc9d
- [Operators] Adding support for `torch.nn.GLU` module (#461) by Bolin Sun f756051
- [BUG] Fixing another error encountered while compiling `detectron2_fcos_r_50_fpn` (#457) by Bolin Sun 798ce6e
- [Ir][Primitives] fix #436 via adding missing instructions (#440) by xiaocenxiaocen 131ec20
- [BUG] Fixing errors encountered while compiling `detectron2_fcos_r_50_fpn` (#455) by Bolin Sun c74732d
- [PERF] Introduce the new IR optimization pass that rewrites spatial(1,47) -> spatial(47) (#452) by Vadim Gimpelson 0f2990b
- [Bug] Fixing the `ValueError` triggered while compiling the model `dlrm` during operator fusion pass (#437) by Bolin Sun de94946
- [Scripts] Add scripts of our wheel server (#439) by Yaoyao Ding 628eb60
- [Graph][Ops] disable cublas matmul for parallel k (#431) by xiaocenxiaocen 2696c34
- [BUG] Fixing an error triggered from the `conv_channel_last_pass` while compiling the model `sam` (#444) by Bolin Sun ba45522
- [BUG] Fixing a bug triggered while compiling in-place operator `torch.Tensor.scatter_add_` (#429) by Bolin Sun 4f142c4
- [PERF] Specialize pow(x,2) as x*x. llama-7B (#434) by Vadim Gimpelson f421a43
- [Version] Update 0.4.0 -> 0.5.0.dev in `setup.py` (#433) by Vadim Gimpelson d9da46f
- [PERF] Allow prologue fusion for `reduce` op (#426) by Vadim Gimpelson 6606477
- [Bug] fixing regression (#422) by zhumakhan 646f7e7
- [Utility] Add ncu and nsys test utilities (#413) by Yaoyao Ding 2fc304f
- [Operators] Adding support for the method `torch.Tensor.scatter_add_` (#421) by Bolin Sun 8568afb
- [Fix] fixed torch.pow (#420) by zhumakhan cac4a0e
- [Primitives] Add CUDA primitives: prmt, lop3, f16x2 sub and fma, and barrier (#414) by Yaoyao Ding 5186d87
- [Ir][Primitives] add exp2 (#410) by xiaocenxiaocen bbbfb7b
- [Update] Updating torch docker image from 24.04 to 24.07 (#418) by zhumakhan 9899060
- [Fix] Support writing subbyte data to global memory (#415) by Yaoyao Ding 9cacfe7
- [Bug] Fixing longformer compilation (#403) by zhumakhan 09d1bc0
- [Bug][Enhancement] Correct the behavior of non-parallel build when option `parallel_tune` is set to 1 (#406) by Max Hu c2e8ec9
- [CuTe] fix longformer (#411) by xiaocenxiaocen 4953f73
- [Tests] Adding tests for math primitives (#412) by Bolin Sun f859fac
- Adding accuracy check for huggingface LLMs in Regression (#368) by zhumakhan 4a1f72d
- [Bug] Fix hidet.ops.gather, add torch.sign torch.ceil. Disable torch.autograd.function.FunctionCtx (#394) by zhumakhan 3b6cb58
- workaround for gpt-j (#395) by zhumakhan 9d0e0c0
- [Bug] Cast dtypes in hidet.where when mismatch (#386) by zhumakhan 2172e16
- make llama2 work with all transformers versions (#385) by zhumakhan 426d14b
- [DEBUG] Save `Task` pickle in translations cache (#380) by Vadim Gimpelson cb72bc7
- [BUILD] Several changes in wheel building (#392) by Vadim Gimpelson f416ee5
- [Operators] Adding support for the `torch.nn.EmbeddingBag` (#378) by Bolin Sun 9d309c1
- [CI] Adding successfully compiled vision models to the tests/benchmark/run_config.json (#205) by Bolin Sun 8eb61c9
- Fix float return when limited by memory (#389) by Max Hu 03e5966
- [BUG] Fix bug in `normalize_launch_dims()` (#381) by Vadim Gimpelson b61d6b1
- [Operators] Extend the functionality of `einsum` to support `Ellipsis` (#374) by Bolin Sun d412db1
- [Dependency] Remove the version restriction of transformers and diffusers (#475) by Yaoyao Ding 3e76c2f
- [README] Fix broken links (#474) by Yaoyao Ding 7b2f680