v0.11.2: Patch release
What's Changed
- Update version.txt after 0.11.1 release by @mrwyattii in #4484
- Update DS_BUILD_* references. by @loadams in #4485
- Introduce pydantic_v1 compatibility module for pydantic>=2.0.0 support by @ringohoffman in #4407
- Enable control over timeout with environment variable by @BramVanroy in #4405
- Update ROCm version by @loadams in #4486
- adding 8bit dequantization kernel for asym fine-grained block quantization in zero-inference by @stephen-youn in #4450
- Fix scale factor on flops profiler by @loadams in #4500
- add DeepSpeed4Science white paper by @conglongli in #4502
- [CCLBackend] update API by @Liangliang-Ma in #4378
- Ulysses: add col-ai evaluation by @samadejacobs in #4517
- Ulysses: Update README.md by @samadejacobs in #4518
- add available memory check to accelerators by @jeffra in #4508
- clear redundant parameters in zero3 bwd hook by @inkcherry in #4520
- Add NPU FusedAdam support by @CurryRice233 in #4343
- fix error type issue in deepspeed/comm/ccl.py by @Liangliang-Ma in #4521
- Fixed deepspeed.comm.monitored_barrier call by @Quentin-Anthony in #4496
- [Bug fix] Add rope_theta for llama config by @cupertank in #4480
- [ROCm] Add rocblas header by @rraminen in #4538
- [docs] ZeRO infinity slides and blog by @jeffra in #4542
- Switch from HIP_PLATFORM_HCC to HIP_PLATFORM_AMD by @loadams in #4539
- Turn off I_MPI_PIN for impi launcher by @delock in #4531
- [docs] paper updates by @jeffra in #4543
- ROCm 6.0 prep changes by @loadams in #4537
- Fix RTD builds by @mrwyattii in #4558
- pipe engine _aggregate_total_loss: more efficient loss concatenation by @nelyahu in #4327
- Add missing rocblas include by @loadams in #4557
- Enable universal checkpoint for zero stage 1 by @tjruwase in #4516
- [AutoTP] Make AutoTP work when num_heads not divisible by number of workers by @delock in #4011
- Fix the sequence-parallelism for the dense model architecture by @RezaYazdaniAminabadi in #4530
- engine.py - save_checkpoint: only rank-0 should create the save dir by @nelyahu in #4536
- Remove PP Grad Tail Check by @Quentin-Anthony in #2538
- Added HIP_PLATFORM_AMD=1 by @rraminen in #4570
- fix multiple definition while building evoformer by @fecet in #4556
- Don't check overflow for bf16 data type by @hablb in #4512
- Public update by @yaozhewei in #4583
- [docs] paper updates by @jeffra in #4584
- Disable CPU inference on PRs by @loadams in #4590
New Contributors
- @ringohoffman made their first contribution in #4407
- @BramVanroy made their first contribution in #4405
- @cupertank made their first contribution in #4480
Full Changelog: v0.11.1...v0.11.2