-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor variable scaling, pressure level scalings only applied in specific circumstances #52
base: develop
Are you sure you want to change the base?
refactor variable scaling, pressure level scalings only applied in specific circumstances #52
Conversation
Please consider using the knowledge about variables that come from the dataset metadata. See https://github.com/ecmwf/anemoi-transform/blob/7cbf5f3d4baa37453022a5a97e17cc71a5b8ceeb/src/anemoi/transform/variables/__init__.py#L47 |
We have given this some thought, and after wanting to use the information from the dataset in the beginning, I have opted for allowing the definition of our own groups here to use different scaling for self-defined groups. |
data_indices, | ||
).get_variable_scaling() | ||
|
||
# Instantiate the pressure level scaling class with the training configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this is possible but I wonder if here we could instantiate from a list of scalars rather than specific ones? I think this is how it is done for the validation metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this would be useful, as we want to allow more scalar methods for model levels/tendency/etc.
I will have a look.
|
||
# Instantiate the pressure level scaling class with the training configuration | ||
pressurelevelscaler = instantiate( | ||
config.training.pressure_level_scaler, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this config location is wrong? It should be training.variable_loss_scaling.pressure_level_scalar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, in the config file, the pressure_level_scalar should be defined directly in training as before.
…umstances' of https://github.com/ecmwf/anemoi-core into 7-pressure-level-scalings-only-applied-in-specific-circumstances
|
@abstractmethod | ||
def get_variable_scaling(self) -> np.ndarray: ... | ||
|
||
def get_variable_group(self, variable_name: str) -> tuple[str, str, int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe we should put this function in an utils. I only say this because I want to use it in another application outside of the loss scalings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is very similar to the function from the plotting function. Will make sure to have only one function that can be used in both cases.
Hi, I would like to know what you think about making all scalers explicit in the config file. Something similar to the |
Seems like a good idea! Would you like to add this in this PR? |
I’m not sure what the best approach is. On the one hand, adding more work to this PR would increase its complexity, which might make it more logical to address this refactor in a future PR. On the other hand, this PR already introduces some changes to the configs, and the future PRs would also involve changes to the configs. From this, it might be better to have 1 PR and communicate all the changes to users at once. What do you think? |
I'm happy for it to be included in this PR! Not sure if @sahahner or @mc4117 have other views? |
for more information, see https://pre-commit.ci
# useful if including slow vs fast evolving variables in the training (e.g. Land/Ocean vs Atmosphere) | ||
# if using this option 'variable_loss_scalings' should all be set close to 1.0 for prognostic variables | ||
# stdev tendency scaler | ||
# - _target_: anemoi.training.data.scaling.StdevTendencyScaler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be train.scaling I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I have just pushed an update now 🙏
Solve the problem explained in issue #7 by refactoring the variable scalings into a general variable scaling and a pressure level scaling.
@mc4117 , @pinnstorm and me came up with a new structure. This PR implements this.
This is first draft. Feedback very welcome!