Hidet v0.5.0

@vadiklyutiy released this 21 Dec 21:47 · 3 commits to main since this release · 8db9f39

What's Changed

  • [BUG] Add comp server requirements (#661) by Vadim Gimpelson 300fd33
  • [BUG] A number of fixes for vllm's TP (#651) by Vadim Gimpelson 9c29f66
  • matmul_f16 with wgmma (#627) by kjiang170 9f0ea7d
  • [BUG] VLLM (and DMWL) compile with hidet backend (#647) by zhumakhan 6c6be7a
  • [IR] Add support for swizzle, interleave and l2Promotion in tensor map creation (#643) by Bolin Sun 21ff63f
  • [BUG] fix attach hash to signature (#638) by xiaocenxiaocen dbd6613
  • Hexcute base branch (all related PRs will be merged into this base PR) (#294) by xiaocenxiaocen b1fdf17
  • [PERF] Default value for parallel_k is 'disabled' (#634) by Vadim Gimpelson 135212b
  • Adapt to bfloat16 where necessary (#624) by ZichuWu 9045865
  • [Bug] Parallel compilation sync (#616) by ZichuWu 4c16c57
  • [COMPTIME] Hot start speedup (#625) by Vadim Gimpelson 22c657b
  • [BUG] Fix torch2.5 OoM and docs build fix (#637) by zhumakhan bf32f8b
  • Revert "[BUG] Fix torch2.5 OoM issue" (#635) by zhumakhan 9131a5c
  • [BUG] Fix torch2.5 OoM issue (#609) by zhumakhan fe59c63
  • [CI] Fix small typos for building and publishing to internal Hidet PyPI index (#598) by xinli-centml f8400fe
  • [PERF] Support bf16 in one more place (#623) by Vadim Gimpelson 7f77349
  • [Tests] Adapt tests/operators for bfloat16 (#615) by ZichuWu ba9c0ad
  • [DISTRIBUTED] Support all_reduce in torch.compile mode (#612) by Vadim Gimpelson 0bca591
  • [torchAPI] Inherit cuda stream from torch (#618) by Vadim Gimpelson ad4e00a
  • [BUG] Fix bugs in shared map implementation (#608) by Vadim Gimpelson ffdbde4
  • [CI] Turn off search space 2 for tests/lang (#617) by ZichuWu 5f7fae8
  • [Tests] Adapt tests/lang for bfloat16 test cases (#594) by ZichuWu 5b829cb
  • [Tests] Adapt tests/frontends to bfloat16 (#592) by ZichuWu a5b72e6
  • [Tests] Adapt tests/ir for bfloat16 test cases (#593) by ZichuWu 545aeea
  • [Tests] Adjust test cases for tests/models for bfloat16. (#595) by ZichuWu bedff21
  • Use one global cuda workspace for all the CompiledGraph (#603) by Max Hu 6652307
  • [Fix] Fixing a minor mistake encountered while adapting test cases for bfloat16 data type (#607) by Bolin Sun 275070d
  • wgmma tf32/u8/i8 support (#549) by kjiang170 a0e6658
  • [CI] Exclude tests/unit_tests/test_dynamic_shape.py::test_attention[cuda] (#606) by Vadim Gimpelson 5579392
  • [Tests] Adjust test cases for tests/unit-tests for bfloat16. (#596) by ZichuWu 0e5ec55
  • [BUG] Fix incorrect conversion of fxgraph to hidet's flow graph + expand nccl lib lookup to include user site packages (#604) by Vadim Gimpelson 1995d43
  • [Tests] Added bfloat16 test cases for tests/cuda (#590) by ZichuWu febfbd7
  • [Tests] Adjust test cases for tests/utils for bfloat16. (#597) by ZichuWu 36aab6f
  • [Tests] Change float16 to bfloat16 for tests/apps (#589) by ZichuWu 83cddbb
  • [CI] add new github actions workflow to manually build and push to internal pypi index (#554) by xinli-centml 6beffab
  • [OPTIONS] Remove unnecessary parallel_k (#572) by ZichuWu 9051f26
  • fix test_wgmma.py error for illegal warp address (#588) by kjiang170 8f7e139
  • [Operators] Allow NT matmul layout for bfloat16 data type (#562) by Bolin Sun d5d0e51
  • python3.8 -> python3.9 (#558) by Vadim Gimpelson a09713c
  • [CI] Move import torch inside run_torch() (#570) by ZichuWu 4bc4d29
  • [CI] Shorten build-docs run time (#565) by ZichuWu edadb07
  • [CI] Tests Workflow. Add manual trigger of tests on different gpu types (#555) by c-fteixeira 66d9568
  • [OPTIONS] Clean Huggingface tokens option (#561) by ZichuWu cdf2c8a
  • [Bug] Fix out of memory error occurred while running llama-2-7b (#547) by Bolin Sun b8826d0
  • [OPTIONS] Set mma as default in PassContext() (#530) by ZichuWu 35f02b9
  • wgmma bf16 support (#531) by kjiang170 f8c057b
  • [Bug] ‘uint32_t’ was not declared in this scope in CI build-wheel for runtime (#545) by ZichuWu 4ced47e
  • Add more shapes to reduce op in regression (#534) by zhumakhan 8ef1bc2
  • [COMPTIME] Added support for run_torch for the rest of the transform operations (#525) by ZichuWu 04e4d5e
  • Remaining f16 options supported and tested (#527) by kjiang170 e5e2404
  • [Operators] bfloat16 data type support for attention operators (#524) by Bolin Sun 07e597a
  • [Enhancement] Save running time by using symbolic_run to replace async_run in optimize (#490) by ZichuWu 92c81e8
  • [BUG] Fix distilbert by changing variables names in ops.where (#512) by zhumakhan 2d615b6
  • [OP] Support of logsoftmax (#517) by Vadim Gimpelson ce43f1e
  • refactor wgmma (#521) by kjiang170 4a80b9a
  • [Bug] Fix the incorrect result after merging changes related to matmul_nt (#518) by Bolin Sun 2b7c348
  • [PERF] Rewrite softmax (#516) by Vadim Gimpelson b50cca4
  • wgmma instruction support and test for f16 input … (#499) by kjiang170 c758e54
  • [BUG] Fix NT matmul corner case where n or k dimension is odd (#513) by Bolin Sun 1e54f77
  • [Operators] Support bfloat16 data type in matmul operator (#511) by Bolin Sun a467c76
  • [Operators] Support matmul with NT layout (#496) by Bolin Sun 8fc6de3
  • [CI] Make test and publish workflows use built wheel on tests (#492) by c-fteixeira bc5b54e
  • [Hidet Script] Import externally defined function automatically (#503) by Yaoyao Ding 43750c2
  • [PERF] Fix for indexes optimization (#488) by Vadim Gimpelson f8c679a
  • [CI] Update the set of Regression tests (#493) by Vadim Gimpelson 7e3ae1f
  • [Enhancement] Causal attention with fp32 accumulator (#481) by zhumakhan 8b569bd
  • [IR] Bound check for task mapping worker (#483) by Vadim Gimpelson 1544cdf
  • [Bug] Rule based simplifier. Fix incorrect rule e/c1/c2 -> e/(c1*c2) (#487) by Vadim Gimpelson fd6b439
  • [TOOLS] Task benchmark utilities (#479) by Vadim Gimpelson dc175f2
  • [Dynamic][Enhancement] Convert div and mod including symbolvars to fast int div/mod (#464) by Max Hu c8d9158
  • Revert accidental commit (#484) by Vadim Gimpelson 6c8ad3e
  • Bug fix by Vadim Gimpelson 3405b55
  • [PERF] Continue indexes optimizations (#473) by Vadim Gimpelson da24ee3
  • [Bug] Resolved multi-threading conflict with save_lower_ir() (#480) by ZichuWu 6a116ad
  • Fixed the format change on the new transformers version (#482) by ZichuWu 0a81840
  • Fix masked attention by using fp32 accumulate on first matmul (q and k) part (#468) by zhumakhan 40c12c9
  • remove mpt-7b due to accuracy failure (#477) by zhumakhan 53a0cc4
  • [BUG] Support concat empty tensors (#475) by ZichuWu 85bb6dd
  • [TOOLS] Attached hash values to function signature in source.cu (#459) by ZichuWu a6f1033
  • [BUG] Fix ValueError caused by different operand data types in if_then_else while initializing Conv2dTransposeGemmImageTask (#470) by Bolin Sun 2826490
  • [BUG] Fix ZeroDivisionError triggered within the function parallel_part_heuristic in graph/ops/conv2d/conv2d_gemm.py (#472) by Bolin Sun a11d69c
  • [BUG] Fixing memory issue encountered while compiling the model sam (#466) by Bolin Sun c695974
  • [PERF] Indexes optimization (#458) by Vadim Gimpelson f1ee08f
  • Added more llms to Regression test (#432) by zhumakhan 03d6250
  • Revert "[Dynamic][Enhancement] Convert div and mod including symbolvars to fast int div/mod" (#463) by Max Hu 2989389
  • [Dynamic][Enhancement] Convert div and mod including symbolvars to fast int div/mod (#405) by Max Hu 0cffe7e
  • [CI] Print stderr in run_tests.py (#443) by Vadim Gimpelson 015ffcd
  • [BUG] Fix NotImplementedError encountered while compiling the model doctr_det_predictor (#462) by Bolin Sun 868dc9d
  • [Operators] Adding support for torch.nn.GLU module (#461) by Bolin Sun f756051
  • [BUG] Fixing another error encountered while compiling detectron2_fcos_r_50_fpn (#457) by Bolin Sun 798ce6e
  • [Ir][Primitives] fix #436 via adding missing instructions (#440) by xiaocenxiaocen 131ec20
  • [BUG] Fixing errors encountered while compiling detectron2_fcos_r_50_fpn (#455) by Bolin Sun c74732d
  • [PERF] Introduce a new IR optimization pass that rewrites spatial(1,47) -> spatial(47) (#452) by Vadim Gimpelson 0f2990b
  • [Bug] Fixing the ValueError triggered while compiling the model dlrm during operator fusion pass (#437) by Bolin Sun de94946
  • [Scripts] Add scripts of our wheel server (#439) by Yaoyao Ding 628eb60
  • [Graph][Ops] disable cublas matmul for parallel k (#431) by xiaocenxiaocen 2696c34
  • [BUG] Fixing an error triggered from the conv_channel_last_pass while compiling the model sam (#444) by Bolin Sun ba45522
  • [BUG] Fixing a bug triggered while compiling in-place operator torch.Tensor.scatter_add_ (#429) by Bolin Sun 4f142c4
  • [PERF] Specialize pow(x,2) as x*x (llama-7B) (#434) by Vadim Gimpelson f421a43
  • [Version] Update 0.4.0 -> 0.5.0.dev in setup.py (#433) by Vadim Gimpelson d9da46f
  • [PERF] Allow prologue fusion for reduce op (#426) by Vadim Gimpelson 6606477
  • [Bug] fixing regression (#422) by zhumakhan 646f7e7
  • [Utility] Add ncu and nsys test utilities (#413) by Yaoyao Ding 2fc304f
  • [Operators] Adding support for the method torch.Tensor.scatter_add_ (#421) by Bolin Sun 8568afb
  • [Fix] fixed torch.pow (#420) by zhumakhan cac4a0e
  • [Primitives] Add CUDA primitives: prmt, lop3, f16x2 sub and fma, and barrier (#414) by Yaoyao Ding 5186d87
  • [Ir][Primitives] add exp2 (#410) by xiaocenxiaocen bbbfb7b
  • [Update] Updating torch docker image from 24.04 to 24.07 (#418) by zhumakhan 9899060
  • [Fix] Support writing subbyte data to global memory (#415) by Yaoyao Ding 9cacfe7
  • [Bug] Fixing longformer compilation (#403) by zhumakhan 09d1bc0
  • [Bug][Enhancement] Correct the behavior of non-parallel build when option parallel_tune is set to 1 (#406) by Max Hu c2e8ec9
  • [CuTe] fix longformer (#411) by xiaocenxiaocen 4953f73
  • [Tests] Adding tests for math primitives (#412) by Bolin Sun f859fac
  • Adding accuracy check for huggingface LLMs in Regression (#368) by zhumakhan 4a1f72d
  • [Bug] Fix hidet.ops.gather, add torch.sign torch.ceil. Disable torch.autograd.function.FunctionCtx (#394) by zhumakhan 3b6cb58
  • workaround for gpt-j (#395) by zhumakhan 9d0e0c0
  • [Bug] Cast dtypes in hidet.where when mismatch (#386) by zhumakhan 2172e16
  • Make llama2 work with all transformers versions (#385) by zhumakhan 426d14b
  • [DEBUG] Save Task pickle in translations cache (#380) by Vadim Gimpelson cb72bc7
  • [BUILD] Several changes in wheel building (#392) by Vadim Gimpelson f416ee5
  • [Operators] Adding support for the torch.nn.EmbeddingBag (#378) by Bolin Sun 9d309c1
  • [CI] Adding successfully compiled vision models to the tests/benchmark/run_config.json (#205) by Bolin Sun 8eb61c9
  • Fix float return when limited by memory (#389) by Max Hu 03e5966
  • [BUG] Fix bug in normalize_launch_dims() (#381) by Vadim Gimpelson b61d6b1
  • [Operators] Extend the functionality of einsum to support Ellipsis (#374) by Bolin Sun d412db1
  • [Dependency] Remove the version restriction of transformers and diffusers (#475) by Yaoyao Ding 3e76c2f
  • [README] Fix broken links (#474) by Yaoyao Ding 7b2f680
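
Many of the entries above land in the PyTorch frontend: the hidet backend for torch.compile (#647, #612, #618) and broad bfloat16 coverage across operators and tests. As a quick orientation, here is a minimal sketch of exercising the release through that path; it assumes a CUDA-enabled hidet install, and the module, shapes, and the commented option call are illustrative rather than taken from these notes.

```python
# Minimal sketch: compile a bfloat16 module with the hidet backend via
# torch.compile. Module and tensor shapes are illustrative.
import torch
import hidet  # importing hidet makes the 'hidet' backend available to torch.compile

model = torch.nn.Linear(1024, 1024).cuda().to(torch.bfloat16)
x = torch.randn(8, 1024, device='cuda', dtype=torch.bfloat16)

model_opt = torch.compile(model, backend='hidet')
y = model_opt(x)  # runs on the CUDA stream inherited from torch (#618)

# Per the notes above, parallel_k now defaults to 'disabled' (#634); it can
# still be set explicitly through hidet's options, e.g.:
# hidet.option.parallel_k('disabled')
```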

Contributors