What's Changed
- [BUG] Add comp server requirements (#661) by Vadim Gimpelson 300fd33
- [BUG] A number of fixes for vllm's TP (#651) by Vadim Gimpelson 9c29f66
- matmul_f16 with wgmma (#627) by kjiang170 9f0ea7d
- [BUG] VLLM (and DMWL) compile with hidet backend (#647) by zhumakhan 6c6be7a
- [IR] Add support for `swizzle`, `interleave` and `l2Promotion` in tensor map creation (#643) by Bolin Sun 21ff63f
- [BUG] fix attach hash to signature (#638) by xiaocenxiaocen dbd6613
- Hexcute base branch (all related PRs will be merged into this base PR) (#294) by xiaocenxiaocen b1fdf17
- [PERF] Default value for parallel_k is 'disabled' (#634) by Vadim Gimpelson 135212b
- Adapt to bfloat16 where necessary (#624) by ZichuWu 9045865
- [Bug] Parallel compilation sync (#616) by ZichuWu 4c16c57
- [COMPTIME] Hot start speedup (#625) by Vadim Gimpelson 22c657b
- [BUG] Fix torch2.5 OoM and docs build fix (#637) by zhumakhan bf32f8b
- Revert "[BUG] Fix torch2.5 OoM issue" (#635) by zhumakhan 9131a5c
- [BUG] Fix torch2.5 OoM issue (#609) by zhumakhan fe59c63
- [CI] Fix small typos for building and publishing to internal Hidet PYPI Index (#598) by xinli-centml f8400fe
- [PERF] Support bf16 in one more place (#623) by Vadim Gimpelson 7f77349
- [Tests] Adapt tests/operators for bfloat16 (#615) by ZichuWu ba9c0ad
- [DISTRIBUTED] Support `all_reduce` in `torch.compile` mode (#612) by Vadim Gimpelson 0bca591
- [torchAPI] Inherit cuda stream from torch (#618) by Vadim Gimpelson ad4e00a
- [BUG] Fix bugs in shared map implementation (#608) by Vadim Gimpelson ffdbde4
- [CI] Turn off search space 2 for tests/lang (#617) by ZichuWu 5f7fae8
- [Tests] Adapt tests/lang for bfloat16 test cases (#594) by ZichuWu 5b829cb
- [Tests] Adapt tests/frontends to bfloat16 (#592) by ZichuWu a5b72e6
- [Tests] Adapt tests/ir for bfloat16 test cases (#593) by ZichuWu 545aeea
- [Tests] Adjust test cases for tests/models for bfloat16. (#595) by ZichuWu bedff21
- Use one global cuda workspace for all the CompiledGraph (#603) by Max Hu 6652307
- [Fix] Fixing a minor mistake encountered while adapting test cases for `bfloat16` data type (#607) by Bolin Sun 275070d
- Kaihang/wgmma tf32 u8 i8 support (#549) by kjiang170 a0e6658
- [CI] Exclude tests/unit_tests/test_dynamic_shape.py::test_attention[cuda] (#606) by Vadim Gimpelson 5579392
- [Tests] Adjust test cases for tests/unit-tests for bfloat16. (#596) by ZichuWu 0e5ec55
- [BUG] Fix incorrect conversion of fxgraph to hidet's flow graph + expand looking for nccl lib with user site packages (#604) by Vadim Gimpelson 1995d43
- [Tests] Added bfloat16 test cases for tests/cuda (#590) by ZichuWu febfbd7
- [Tests] Adjust test cases for tests/utils for bfloat16. (#597) by ZichuWu 36aab6f
- [Tests] Change float16 to bfloat16 for tests/apps (#589) by ZichuWu 83cddbb
- [CI] add new github actions workflow to manually build and push to internal pypi index (#554) by xinli-centml 6beffab
- [OPTIONS] Remove unnecessary parallel_k (#572) by ZichuWu 9051f26
- fix test_wgmma.py error for illegal warp address (#588) by kjiang170 8f7e139
- [Operators] Allow NT `matmul` layout for `bfloat16` data type (#562) by Bolin Sun d5d0e51
- python3.8 -> python3.9 (#558) by Vadim Gimpelson a09713c
- [CI] Move import torch inside run_torch() (#570) by ZichuWu 4bc4d29
- [CI] Shorten build-docs run time (#565) by ZichuWu edadb07
- [CI] Tests Workflow. Add manual trigger of tests on different gpu types (#555) by c-fteixeira 66d9568
- [OPTIONS] Clean Huggingface tokens option (#561) by ZichuWu cdf2c8a
- [Bug] Fix out of memory error that occurred while running `llama-2-7b` (#547) by Bolin Sun b8826d0
- [OPTIONS] Set mma as default in PassContext() (#530) by ZichuWu 35f02b9
- wgmma bf16 support (#531) by kjiang170 f8c057b
- [Bug] ‘uint32_t’ was not declared in this scope in CI build-wheel for runtime (#545) by ZichuWu 4ced47e
- Add more shapes to reduce op in regression (#534) by zhumakhan 8ef1bc2
- [COMPTIME] Added support for run_torch for the rest of transform operation (#525) by ZichuWu 04e4d5e
- Remaining f16 options supported and tested (#527) by kjiang170 e5e2404
- [Operators] `bfloat16` data type support for attention operators (#524) by Bolin Sun 07e597a
- [Enhancement] Save running time by using symbolic_run to replace async_run in optimize (#490) by ZichuWu 92c81e8
- [BUG] Fix distilbert by changing variables names in ops.where (#512) by zhumakhan 2d615b6
- [OP] Support of `logsoftmax` (#517) by Vadim Gimpelson ce43f1e
- refactor wgmma (#521) by kjiang170 4a80b9a
- [Bug] Fix the incorrect result after merging changes related to `matmul_nt` (#518) by Bolin Sun 2b7c348
- [PERF] Rewrite softmax (#516) by Vadim Gimpelson b50cca4
- wgmma instruction support and test for f16 input … (#499) by kjiang170 c758e54
- [BUG] Fix NT matmul corner case where `n` or `k` dimension is odd (#513) by Bolin Sun 1e54f77
- [Operators] Support `bfloat16` data type in `matmul` operator (#511) by Bolin Sun a467c76
- [Operators] Support matmul with NT layout (#496) by Bolin Sun 8fc6de3
- [CI] Make test and publish workflows use built wheel on tests (#492) by c-fteixeira bc5b54e
- [Hidet Script] Import externally defined function automatically (#503) by Yaoyao Ding 43750c2
- [PERF] Fix for indexes optimization (#488) by Vadim Gimpelson f8c679a
- [CI] Update the set of Regression tests (#493) by Vadim Gimpelson 7e3ae1f
- [Enhancement] Causal attention with fp32 accumulator (#481) by zhumakhan 8b569bd
- [IR] Bound check for task mapping worker (#483) by Vadim Gimpelson 1544cdf
- [Bug] Rule based simplifier. Fix incorrect rule e/c1/c2 -> e/(c1*c2) (#487) by Vadim Gimpelson fd6b439
- [TOOLS] Task benchmark utilities (#479) by Vadim Gimpelson dc175f2
- [Dynamic][Enhancement] Convert div and mod including symbolvars to fast int div/mod (#464) by Max Hu c8d9158
- Revert accidental commit (#484) by Vadim Gimpelson 6c8ad3e
- bug fix by Vadim Gimpelson 3405b55
- [PERF] Continue indexes optimizations (#473) by Vadim Gimpelson da24ee3
- [Bug] Resolved multi-threading conflict with save_lower_ir() (#480) by ZichuWu 6a116ad
- Fixed the format change on the new transformers version (#482) by ZichuWu 0a81840
- Fix masked attention by using fp32 accumulate on first matmul (q and k) part (#468) by zhumakhan 40c12c9
- remove mpt-7b due to accuracy failure (#477) by zhumakhan 53a0cc4
- [BUG] Support concat empty tensors (#475) by ZichuWu 85bb6dd
- [TOOLS] Attached hash values to function signature in source.cu (#459) by ZichuWu a6f1033
- [BUG] Fix `ValueError` caused by different operand data types in `if_then_else` while initializing `Conv2dTransposeGemmImageTask` (#470) by Bolin Sun 2826490
- [BUG] Fix `ZeroDivisionError` triggered within the function `parallel_part_heuristic` in `graph/ops/conv2d/conv2d_gemm.py` (#472) by Bolin Sun a11d69c
- [BUG] Fixing memory issue encountered while compiling the model `sam` (#466) by Bolin Sun c695974
- [PERF] Indexes optimization (#458) by Vadim Gimpelson f1ee08f
- Added more llms to Regression test (#432) by zhumakhan 03d6250
- Revert "[Dynamic][Enhancement] Convert div and mod including symbolvars to fast int div/mod" (#463) by Max Hu 2989389
- [Dynamic][Enhancement] Convert div and mod including symbolvars to fast int div/mod (#405) by Max Hu 0cffe7e
- [CI] Print stderr in `run_tests.py` (#443) by Vadim Gimpelson 015ffcd
- [BUG] Fix `NotImplementedError` encountered while compiling the model `doctr_det_predictor` (#462) by Bolin Sun 868dc9d
- [Operators] Adding support for `torch.nn.GLU` module (#461) by Bolin Sun f756051
- [BUG] Fixing another error encountered while compiling `detectron2_fcos_r_50_fpn` (#457) by Bolin Sun 798ce6e
- [Ir][Primitives] fix #436 via adding missing instructions (#440) by xiaocenxiaocen 131ec20
- [BUG] Fixing errors encountered while compiling `detectron2_fcos_r_50_fpn` (#455) by Bolin Sun c74732d
- [PERF] Introduce the new IR optimization pass that rewrites spatial(1,47) -> spatial(47) (#452) by Vadim Gimpelson 0f2990b
- [Bug] Fixing the `ValueError` triggered while compiling the model `dlrm` during operator fusion pass (#437) by Bolin Sun de94946
- [Scripts] Add scripts of our wheel server (#439) by Yaoyao Ding 628eb60
- [Graph][Ops] disable cublas matmul for parallel k (#431) by xiaocenxiaocen 2696c34
- [BUG] Fixing an error triggered from the `conv_channel_last_pass` while compiling the model `sam` (#444) by Bolin Sun ba45522
- [BUG] Fixing a bug triggered while compiling in-place operator `torch.Tensor.scatter_add_` (#429) by Bolin Sun 4f142c4
- [PERF] Specialize pow(x,2) as x*x. llama-7B (#434) by Vadim Gimpelson f421a43
- [Version] Update 0.4.0 -> 0.5.0.dev in `setup.py` (#433) by Vadim Gimpelson d9da46f
- [PERF] Allow prologue fusion for `reduce` op (#426) by Vadim Gimpelson 6606477
- [Bug] fixing regression (#422) by zhumakhan 646f7e7
- [Utility] Add ncu and nsys test utilities (#413) by Yaoyao Ding 2fc304f
- [Operators] Adding support for the method `torch.Tensor.scatter_add_` (#421) by Bolin Sun 8568afb
- [Fix] fixed torch.pow (#420) by zhumakhan cac4a0e
- [Primitives] Add CUDA primitives: prmt, lop3, f16x2 sub and fma, and barrier (#414) by Yaoyao Ding 5186d87
- [Ir][Primitives] add exp2 (#410) by xiaocenxiaocen bbbfb7b
- [Update] Updating torch docker image from 24.04 to 24.07 (#418) by zhumakhan 9899060
- [Fix] Support writing subbyte data to global memory (#415) by Yaoyao Ding 9cacfe7
- [Bug] Fixing longformer compilation (#403) by zhumakhan 09d1bc0
- [Bug][Enhancement] Correct the behavior of non-parallel build when option `parallel_tune` is set to 1 (#406) by Max Hu c2e8ec9
- [CuTe] fix longformer (#411) by xiaocenxiaocen 4953f73
- [Tests] Adding tests for math primitives (#412) by Bolin Sun f859fac
- Adding accuracy check for huggingface LLMs in Regression (#368) by zhumakhan 4a1f72d
- [Bug] Fix hidet.ops.gather, add torch.sign torch.ceil. Disable torch.autograd.function.FunctionCtx (#394) by zhumakhan 3b6cb58
- workaround for gpt-j (#395) by zhumakhan 9d0e0c0
- [Bug] Cast dtypes in hidet.where when mismatch (#386) by zhumakhan 2172e16
- make llama2 work with all transformers versions (#385) by zhumakhan 426d14b
- [DEBUG] Save `Task` pickle in translations cache (#380) by Vadim Gimpelson cb72bc7
- [BUILD] Several changes in wheel building (#392) by Vadim Gimpelson f416ee5
- [Operators] Adding support for the `torch.nn.EmbeddingBag` (#378) by Bolin Sun 9d309c1
- [CI] Adding successfully compiled vision models to the tests/benchmark/run_config.json (#205) by Bolin Sun 8eb61c9
- Fix float return when limited by memory (#389) by Max Hu 03e5966
- [BUG] Fix bug in `normalize_launch_dims()` (#381) by Vadim Gimpelson b61d6b1
- [Operators] Extend the functionality of `einsum` to support `Ellipsis` (#374) by Bolin Sun d412db1
- [Dependency] Remove the version restriction of transformers and diffusers (#475) by Yaoyao Ding 3e76c2f
- [README] Fix broken links (#474) by Yaoyao Ding 7b2f680