This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

accumulation_scheduler does not exist #15

Open
zjost opened this issue Apr 6, 2023 · 2 comments
Labels: bug (Something isn't working) · help wanted (Extra attention is needed)

Comments


zjost commented Apr 6, 2023

🐛 Bug

Trainer.accumulation_scheduler does not exist, which makes the strategy code fail.

To Reproduce

Steps to reproduce the behavior:

trainer = Trainer(..., accelerator='gpu', strategy=HorovodStrategy())
trainer.fit(...)
/tmp/ipykernel_1183/1481180612.py in <cell line: 1>()
----> 1 trainer.fit(model=autoencoder, train_dataloaders=train_loader)

/home/default_user/.conda/envs/user/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py in fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    518         model = _maybe_unwrap_optimized(model)
    519         self.strategy._lightning_module = model
--> 520         call._call_and_handle_interrupt(
    521             self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    522         )

/home/default_user/.conda/envs/user/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     42             return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
     43         else:
---> 44             return trainer_fn(*args, **kwargs)
     45 
     46     except _TunerExitException:

/home/default_user/.conda/envs/user/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py in _fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    557             model_connected=self.lightning_module is not None,
    558         )
--> 559         self._run(model, ckpt_path=ckpt_path)
    560 
    561         assert self.state.stopped

/home/default_user/.conda/envs/user/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py in _run(self, model, ckpt_path)
    909 
    910         # strategy will configure model and move it to the device
--> 911         self.strategy.setup(self)
    912 
    913         # hook

/home/default_user/.conda/envs/user/lib/python3.10/site-packages/lightning_horovod/strategy.py in setup(self, trainer)
    148             hvd.broadcast_optimizer_state(optimizer, root_rank=0)
    149 
--> 150         accumulation_scheduler = trainer.accumulation_scheduler
    151         if accumulation_scheduler.epochs != [0]:
    152             raise MisconfigurationException(

AttributeError: 'Trainer' object has no attribute 'accumulation_scheduler'
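For context: in Lightning 2.x the Trainer no longer exposes an `accumulation_scheduler` attribute (gradient accumulation is now configured via `accumulate_grad_batches` or the `GradientAccumulationScheduler` callback), which is presumably why the check in `lightning_horovod/strategy.py` raises. A minimal sketch of a version-tolerant check is below; the helper name and error wording are hypothetical, not the actual fix in this repo:

```python
def check_no_accumulation(trainer):
    """Raise if a gradient accumulation schedule is configured, mirroring the
    intent of the failing check in lightning_horovod/strategy.py (sketch only)."""
    # Old API (Lightning < 2.0): trainer.accumulation_scheduler.epochs
    scheduler = getattr(trainer, "accumulation_scheduler", None)
    if scheduler is not None:
        if scheduler.epochs != [0]:
            raise ValueError(
                "Horovod strategy does not support a gradient accumulation schedule"
            )
        return
    # New API (Lightning >= 2.0): a plain value on the Trainer
    if getattr(trainer, "accumulate_grad_batches", 1) != 1:
        raise ValueError(
            "Horovod strategy does not support accumulate_grad_batches > 1"
        )
```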

Environment

  • PyTorch Version (e.g., 1.0): 1.12.0+cu113
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.10.9
  • CUDA/cuDNN version: 11.3
  • Lightning: 2.0.1

Other info

On a side note, the various documentation sources do not really explain how to use Horovod + Lightning in a way that works.
The Lightning documentation refers to this repo (which is not easy to find). This repo refers to the Horovod docs. The Horovod docs don't mention this repo, but suggest pl.Trainer(accelerator='horovod') or pl.Trainer(distributed_backend='horovod'), neither of which works. The README says trainer = Trainer(strategy="horovod", accelerator="gpu", devices=1), but that doesn't work either. I ended up using the CPU example of strategy=HorovodStrategy(), but also specifying accelerator='gpu'.

@zjost zjost added bug Something isn't working help wanted Extra attention is needed labels Apr 6, 2023

github-actions bot commented Apr 6, 2023

Hi! Thanks for your contribution, great first issue!

@uday-rao-aera

Got the same error.
What is the fix for this?
