v0.5.0 release
Closed issues:
- Ensuring CRD requires cluster-level authority #144
- Label naming style inconsistent #140
- PyTorch operator v1beta2 API #134
- Support gang-scheduling by kube-batch #129
- PyTorch workers keep crashing if the master is not up yet #125
- Support cross-compile for image builds #42
- Deprecate v1alpha2 API #135
- Distribution across multi-GPU nodes #128
- Upgrade examples to PyTorch 1.0 #123
- Double gradient reduction in examples? #122
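
Issue #122 above (addressed by the example rework in #124 in the merged pull requests below) concerns gradient reduction being applied twice in the distributed examples. As a hedged illustration of the underlying point, and not the repository's actual example code, the sketch below shows that `DistributedDataParallel` already averages gradients across workers during `backward()`, so an additional manual `all_reduce` over the parameter gradients would double the reduction.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def make_ddp_model(local_model: nn.Module) -> DDP:
    # Assumes dist.init_process_group(...) has already been called by the launcher.
    return DDP(local_model)

def train_step(model: DDP, data, target, optimizer, loss_fn):
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)
    loss.backward()  # DDP's backward hooks all-reduce and average gradients here
    # A second, manual reduction such as
    #     for p in model.parameters():
    #         dist.all_reduce(p.grad)
    # would apply the reduction twice on top of what DDP already did.
    optimizer.step()
    return loss.item()
```

Pull request #124 below reworks the example along these lines; the exact code it lands on lives in the repository rather than in this changelog.
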
Merged pull requests:
- Implement ActiveDeadlineSeconds and BackoffLimit #151 (johnugeorge)
- Use podGroup instead of PDB in v1beta2 #150 (johnugeorge)
- Use kube-batch as scheduler by default when gang-scheduling is enabled #149 (johnugeorge)
- Remove usage of crd client #148 (johnugeorge)
- Update tests to have single operator deployment for v1beta1 and v1beta2 API #147 (johnugeorge)
- Renaming labels to a consistent format #146
- Workers are created only when the master is in the Running phase #145
- Adding tests for v1beta2 #143 (johnugeorge)
- Change cluster version to 1.11 #142 (andreyvelich)
- Update OWNERS #141 (andreyvelich)
- Adding status subresource #139 (johnugeorge)
- Adding v1beta2 API implementation #138 (johnugeorge)
- Upgrading k8s to 1.11 #137 (johnugeorge)
- Removing v1alpha2 API #136 (johnugeorge)
- Adding detailed events/messages to PyTorch Jobs #133 (johnugeorge)
- Skip status reinit when the job has succeeded #132
- Travis build fix #131 (johnugeorge)
- Rework example and e2e test script #126 (TimZaman)
- Change Distributed Data Parallel example #124 (andreyvelich)
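
Taken together, #138 (v1beta2 API), #151 (ActiveDeadlineSeconds and BackoffLimit), and #149/#150 (kube-batch as the default gang scheduler, PodGroup instead of PDB) shape what a submitted PyTorchJob looks like in this release. The snippet below is a minimal sketch, not an excerpt from the repository: it assumes the `kubeflow.org/v1beta2` group and `pytorchjobs` plural, assumes `activeDeadlineSeconds` and `backoffLimit` sit directly under `spec` in v1beta2, and uses a hypothetical container image.

```python
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running inside the cluster

replica_template = {
    "spec": {
        "containers": [{
            "name": "pytorch",
            "image": "example.com/pytorch-dist-mnist:1.0",  # hypothetical image name
        }]
    }
}

pytorch_job = {
    "apiVersion": "kubeflow.org/v1beta2",
    "kind": "PyTorchJob",
    "metadata": {"name": "pytorch-dist-example", "namespace": "default"},
    "spec": {
        "activeDeadlineSeconds": 3600,  # stop the job after one hour (#151)
        "backoffLimit": 3,              # allow up to three retries before failing (#151)
        "pytorchReplicaSpecs": {
            "Master": {"replicas": 1, "restartPolicy": "OnFailure", "template": replica_template},
            "Worker": {"replicas": 2, "restartPolicy": "OnFailure", "template": replica_template},
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org",
    version="v1beta2",
    namespace="default",
    plural="pytorchjobs",
    body=pytorch_job,
)
```

Gang-scheduling itself (#129, #149) is expected to be switched on at the operator level rather than per job; when it is enabled, #150 has the operator create a PodGroup for the job's pods instead of a PodDisruptionBudget.
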