-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement coupling of two or more models #76
base: develop
Are you sure you want to change the base?
Conversation
Would be good to develop this multi-GPU with the expected multi-GPU inference for a single model in mind. Perhaps @cathalobrien and/or @japols can have a look at this? |
@@ -47,7 +48,8 @@ def _init(self, state): | |||
if os.path.exists(self.path): | |||
os.remove(self.path) | |||
|
|||
self.ncfile = Dataset(self.path, "w", format="NETCDF4") | |||
with LOCK: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When testing anemoi-inference run
with netcdf output, it hangs forever on with LOCK:
(Not even proceeding to the next line). Replacing this with if True:
the code proceeds to hang on the next instance of with LOCK
. Possibly this is related to something missing in my installation. But no error messages or warnings appear.
I suggest the use of with LOCK
is checked, in case it works in certain instances then at least an assert
statement, making sure it will work or when not throw a message, needs to be added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll have a look. Maybe we need an RLock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, that solves the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerning the reformulation of various types of forcings the code looks correct and an explicit test using a LAM checkpoint leading to inference with boundary forcings is succesful: results identical to those of earlier anemoi-inference versions are produced.
Concerning other parts of the code I suggest another reviewer has a closer look.
|
||
|
||
class CoupleCmd(Command): | ||
"""Inspect the contents of a checkpoint file.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring should be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do there seem to be two commands to do the same thing?
- couple
- coupled
from .couple import CoupleConfiguration as CoupleConfiguration | ||
from .run import RunConfiguration as RunConfiguration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these importing as the same name?
All beyond the as could be removed
transport: str | ||
couplings: list[dict[str, list[str]]] | ||
tasks: dict[str, dict[str, dict[str, Any]]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these have docs like all the other properties?
else: | ||
result = np.stack([fields[v] for v in variables], axis=0) | ||
|
||
assert len(result.shape) == 3 and result.shape[0] == len(variables) and result.shape[1] == len(dates), ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how specific this assertion is, an explanation might be useful
and r["time"] in ("0600", "1800") | ||
): | ||
|
||
r["stream"] = "scda" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the use of this stream
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct that this input
does nothing at the moment?
Both functions raise a NotImplementedError
, why is it being added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be addressed by #84
The merge-base changed after approval.
The new command
anemoi-inference couple config.yaml
will run two or more models that requires forcing variables from each others. The models can run in different threads, different processes or different MPI ranks. They can use the same GPU (if the fit) or different GPUs.Below is a sample configuration file, that runs an atmosphere model and an ocean model: