Additional context: This also has a use case for running inference on AWS (KNMI), where it would be preferable to request multiple smaller GPU instances and combine their GPU memory through model sharding. When requesting an AWS instance with enough memory to run without model sharding, you typically get an instance with multiple GPUs, of which only one is used.
Is your feature request related to a problem? Please describe.
Models that exceed the memory capacity of a single GPU cannot currently be run in inference, although parallel implementations already exist for training.
Describe the solution you'd like
Implement parallel inference, allowing anemoi models to be distributed across several GPUs.
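A minimal sketch of what this could look like, using PyTorch FSDP to shard model parameters across ranks at inference time. The `nn.Sequential` stand-in model, input shape, and `torchrun` launch are illustrative assumptions only, not the anemoi-inference API:

```python
# Illustrative sketch: shard a stand-in model across GPUs with PyTorch FSDP and
# run a forward pass. The model, input shape and launch command are assumptions;
# anemoi-inference would need its own integration with its checkpoint format.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    dist.init_process_group("nccl")   # one process per GPU, e.g. launched via torchrun
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # Stand-in for a model too large to fit on a single GPU.
    model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

    # FSDP shards the parameters across all ranks, so each GPU holds only a slice.
    model = FSDP(model, device_id=rank)
    model.eval()

    with torch.no_grad():
        x = torch.randn(8, 1024, device=rank)  # placeholder input
        y = model(x)

    if rank == 0:
        print("output shape:", tuple(y.shape))
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with e.g. `torchrun --nproc_per_node=4 shard_inference.py`, each process would hold roughly a quarter of the parameters, which is the behaviour requested here for combining the memory of several GPUs.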
Describe alternatives you've considered
No response
Additional context
No response
Organisation
ECMWF