
GPU parallel inference #46

Closed
mchantry opened this issue Nov 15, 2024 · 4 comments

@mchantry
Member

Is your feature request related to a problem? Please describe.

Models larger than a single GPU's memory capacity currently cannot be run in inference, although parallel implementations exist for training.

Describe the solution you'd like

Implement parallel inference, allowing anemoi models to be distributed across several GPUs.
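
For illustration only, a minimal sketch of the kind of model sharding meant here, written in plain PyTorch rather than the anemoi-inference implementation; the module sizes and names are hypothetical and it assumes a machine with at least two GPUs:

```python
# Sketch: place halves of a model on two GPUs so a checkpoint larger than one
# GPU's memory can still run inference. Illustrative only, not anemoi code.
import torch
import torch.nn as nn

class ShardedModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First shard of the parameters lives on GPU 0, second shard on GPU 1.
        self.encoder = nn.Sequential(nn.Linear(1024, 4096), nn.GELU()).to("cuda:0")
        self.decoder = nn.Linear(4096, 1024).to("cuda:1")

    @torch.no_grad()
    def forward(self, x):
        x = self.encoder(x.to("cuda:0"))
        # Only the activations cross devices; each GPU holds just its shard.
        return self.decoder(x.to("cuda:1"))

model = ShardedModel().eval()
out = model(torch.randn(8, 1024))
print(out.shape, out.device)
```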

Describe alternatives you've considered

No response

Additional context

No response

Organisation

ECMWF

@mchantry mchantry added the enhancement New feature or request label Nov 15, 2024
@mchantry mchantry assigned mchantry, cathalobrien and japols and unassigned mchantry Nov 15, 2024
@jswijnands

Additional context: this also has a use case for running inference on AWS (KNMI), where it would be preferable to request multiple smaller GPU instances and combine their GPU memory through model sharding. When requesting an AWS instance with enough memory to run without model sharding, you typically get an instance with multiple GPUs, of which only one is used.

@dietervdb-meteo
Contributor

dietervdb-meteo commented Jan 8, 2025

Any progress on this? We now also have a checkpoint that seems to require multiple GPUs for inference.

I also see that PR #55 was closed without being merged?

@cathalobrien
Contributor

@dietervdb-meteo we now have a PR for this: #108

@anaprietonem anaprietonem moved this to Now In Progress in Anemoi-dev Jan 22, 2025
@cathalobrien
Contributor

PR #108 has been merged, so I am closing this issue.
