This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

accumulation_scheduler does not exist #15

Open
zjost opened this issue Apr 6, 2023 · 2 comments
Labels: bug (Something isn't working) · help wanted (Extra attention is needed)

Comments


zjost commented Apr 6, 2023

🐛 Bug

Trainer.accumulation_scheduler does not exist, which makes the strategy code fail.

To Reproduce

Steps to reproduce the behavior:

trainer = Trainer(..., accelerator='gpu', strategy=HorovodStrategy())
trainer.fit(...)
/tmp/ipykernel_1183/1481180612.py in <cell line: 1>()
----> 1 trainer.fit(model=autoencoder, train_dataloaders=train_loader)

/home/default_user/.conda/envs/user/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py in fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    518         model = _maybe_unwrap_optimized(model)
    519         self.strategy._lightning_module = model
--> 520         call._call_and_handle_interrupt(
    521             self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    522         )

/home/default_user/.conda/envs/user/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     42             return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
     43         else:
---> 44             return trainer_fn(*args, **kwargs)
     45 
     46     except _TunerExitException:

/home/default_user/.conda/envs/user/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py in _fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    557             model_connected=self.lightning_module is not None,
    558         )
--> 559         self._run(model, ckpt_path=ckpt_path)
    560 
    561         assert self.state.stopped

/home/default_user/.conda/envs/user/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py in _run(self, model, ckpt_path)
    909 
    910         # strategy will configure model and move it to the device
--> 911         self.strategy.setup(self)
    912 
    913         # hook

/home/default_user/.conda/envs/user/lib/python3.10/site-packages/lightning_horovod/strategy.py in setup(self, trainer)
    148             hvd.broadcast_optimizer_state(optimizer, root_rank=0)
    149 
--> 150         accumulation_scheduler = trainer.accumulation_scheduler
    151         if accumulation_scheduler.epochs != [0]:
    152             raise MisconfigurationException(

AttributeError: 'Trainer' object has no attribute 'accumulation_scheduler'
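For context: in Lightning 2.x the Trainer no longer exposes an `accumulation_scheduler` attribute (gradient accumulation is now configured via `accumulate_grad_batches` or the `GradientAccumulationScheduler` callback), which is presumably why the check in `lightning_horovod/strategy.py` raises. A minimal sketch of a version-tolerant check is below; the helper name and error wording are hypothetical, not the actual fix in this repo:

```python
def check_no_accumulation(trainer):
    """Raise if a gradient accumulation schedule is configured, mirroring the
    intent of the failing check in lightning_horovod/strategy.py (sketch only)."""
    # Old API (Lightning < 2.0): trainer.accumulation_scheduler.epochs
    scheduler = getattr(trainer, "accumulation_scheduler", None)
    if scheduler is not None:
        if scheduler.epochs != [0]:
            raise ValueError(
                "Horovod strategy does not support a gradient accumulation schedule"
            )
        return
    # New API (Lightning >= 2.0): a plain value on the Trainer
    if getattr(trainer, "accumulate_grad_batches", 1) != 1:
        raise ValueError(
            "Horovod strategy does not support accumulate_grad_batches > 1"
        )
```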

Environment

  • PyTorch Version (e.g., 1.0): 1.12.0+cu113
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.10.9
  • CUDA/cuDNN version: 11.3
  • Lightning: 2.0.1

Other info

On a side note, the various documentation sources do not really explain how to use Horovod + Lightning in a way that works.
The Lightning documentation refers to this repo (which is not easy to find). This repo refers to the Horovod docs. The Horovod docs don't mention this repo, but suggest pl.Trainer(accelerator='horovod') or pl.Trainer(distributed_backend='horovod'), neither of which works. The README says trainer = Trainer(strategy="horovod", accelerator="gpu", devices=1), but that doesn't work either. I ended up using the CPU example of strategy=HorovodStrategy(), but also specifying accelerator='gpu'.

@zjost zjost added bug Something isn't working help wanted Extra attention is needed labels Apr 6, 2023

github-actions bot commented Apr 6, 2023

Hi! Thanks for your contribution, great first issue!

@uday-rao-aera

Got the same error.
What is the fix for this?
