fix!: Rework Loss Scalings to provide better modularity #52

Open
wants to merge 57 commits into base: main

Commits (57)
511ed18
first version of refactor of variable scaling
sahahner Dec 27, 2024
7ddf6d6
config training changes
sahahner Dec 27, 2024
3ddeccc
avoid multiple scaling
sahahner Dec 27, 2024
be4602c
docstring and explain variable reference
sahahner Dec 31, 2024
195af07
fix to config for pressure level scaler
mc4117 Dec 31, 2024
2644c18
instantiating scalars as a list
mc4117 Dec 31, 2024
718fc57
preparing for tendency losses
mc4117 Dec 31, 2024
a34ac02
Merge branch '7-pressure-level-scalings-only-applied-in-specific-circ…
mc4117 Dec 31, 2024
b91af11
log the variable level scaling information as before
sahahner Jan 2, 2025
c22c50b
adding tendency scaler to additional scalers
pinnstorm Jan 8, 2025
1f4a532
reformatting
pinnstorm Jan 8, 2025
2843d98
updating description in configs
pinnstorm Jan 8, 2025
c978871
updating var-tendency-scaler spec
pinnstorm Jan 12, 2025
f56f9b2
updating training/default config
pinnstorm Jan 12, 2025
be90000
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 12, 2025
e474ae9
updating training/default.yaml
pinnstorm Jan 13, 2025
f005f84
updating training/default.yaml
pinnstorm Jan 13, 2025
7cdccc5
first try at tests
mc4117 Jan 17, 2025
61e7933
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 17, 2025
462bb34
variable name and level from mars metadata
sahahner Jan 17, 2025
960a602
Merge branch '7-pressure-level-scalings-only-applied-in-specific-circ…
sahahner Jan 17, 2025
af10173
get variable group and level in utils file
sahahner Jan 17, 2025
395cd6f
empty line
sahahner Jan 17, 2025
1f53a82
convert test for new structure. pressure level and general variable s…
sahahner Jan 17, 2025
3747959
more plausible check for availability of mars metadata
sahahner Jan 17, 2025
68cd6e3
update to tendency tests (still not working)
mc4117 Jan 17, 2025
d3a7c29
Merge branch '7-pressure-level-scalings-only-applied-in-specific-circ…
mc4117 Jan 17, 2025
d6e127a
tendency scaler tests now working
mc4117 Jan 20, 2025
fd29cbc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 20, 2025
8bff68b
change function into class, extracting variable group and name
sahahner Jan 22, 2025
4c7cbc1
Merge branch '7-pressure-level-scalings-only-applied-in-specific-circ…
sahahner Jan 22, 2025
7d8c76d
correct function call
sahahner Jan 22, 2025
d928b30
correct typo in test
sahahner Jan 22, 2025
bb054ce
incorporate comments
sahahner Jan 22, 2025
d0046fa
introduce base class for all loss scalings
sahahner Jan 22, 2025
a03d6ba
type checking check after all imports
sahahner Jan 22, 2025
aa7f558
comment: explanation about variable groups in config file
sahahner Jan 22, 2025
9a8a4b9
rm if statement for tendency scaler
mc4117 Jan 22, 2025
66d66ed
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 22, 2025
db05ce5
use utils function to retrieve variable group and reference for valid…
sahahner Jan 22, 2025
61766cd
Merge branch '7-pressure-level-scalings-only-applied-in-specific-circ…
sahahner Jan 22, 2025
3adf924
comment in config file that scaler name needs to be added to loss as w…
sahahner Jan 22, 2025
f19d69d
fix pre-commit hooks
mc4117 Jan 22, 2025
c26d744
Merge branch '7-pressure-level-scalings-only-applied-in-specific-circ…
mc4117 Jan 22, 2025
00439cb
Update description in training/default
mc4117 Jan 24, 2025
6c857a6
refactor into training/scaling both the code and the config files, re…
sahahner Jan 27, 2025
a2f2728
more scalar renaming to scaler
sahahner Jan 27, 2025
b5f6b5f
fix tendency loss
mc4117 Jan 27, 2025
b5fa55b
fix merge conflict
mc4117 Jan 27, 2025
cdb9e19
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 27, 2025
963c543
Add '*' to scaler selection.
HCookie Jan 27, 2025
4f1566b
Add exclusion of scalers
HCookie Jan 27, 2025
e4ceb8e
Fix scalar reference in tests
HCookie Jan 27, 2025
7178074
Add all and exclude tests
HCookie Jan 27, 2025
08b4cb3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 27, 2025
0dbf0b8
fix: update all tests, move scaling module into losses
sahahner Jan 28, 2025
2dccbd2
print final variable scaling in debug mode
sahahner Jan 28, 2025
32 changes: 17 additions & 15 deletions training/docs/modules/losses.rst
@@ -6,7 +6,7 @@ This module is used to define the loss function used to train the model.

Anemoi-training exposes a couple of loss functions by default to be
used, all of which are subclassed from ``BaseWeightedLoss``. This class
enables scalar multiplication, and graph node weighting.
enables scaler multiplication, and graph node weighting.

.. automodule:: anemoi.training.losses.weightedloss
:members:
@@ -47,26 +47,28 @@ reference it in the config as follows:
# loss function kwargs here

*********
Scalars
Scalers
*********

In addition to node scaling, the loss function can also be scaled by a
scalar. These are provided by the ``Forecaster`` class, and a user can
scaler. These are provided by the ``Forecaster`` class, and a user can
define whether to include them in the loss function by setting
``scalars`` in the loss config dictionary.
``scalers`` in the loss config dictionary.

.. code:: yaml

# loss function for the model
training_loss:
# loss class to initialise
_target_: anemoi.training.losses.mse.WeightedMSELoss
scalars: ['scalar1', 'scalar2']
scalers: ['scaler1', 'scaler2']

Currently, the following scalars are available for use:
Scalers can be added as options for the loss functions using the
`scaler` builders in `config.training.scaler`.

- ``variable``: Scale by the feature/variable weights as defined in the
config ``config.training.variable_loss_scaling``.
``*`` is a valid entry to use all `scalers` given. To exclude a scaler,
add ``!scaler_name``; e.g. ``['*', '!scaler_1']`` applies every scaler
except ``scaler_1``.
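The ``'*'``/``'!name'`` selection semantics described above can be sketched as
follows. This is an illustrative helper only, not the actual anemoi-training
implementation; the function name ``resolve_scalers`` is assumed:

```python
def resolve_scalers(selection: list[str], available: list[str]) -> list[str]:
    """Resolve a scaler selection list supporting '*' and '!name' exclusion."""
    # '*' expands to every available scaler; otherwise take the named entries
    if "*" in selection:
        included = set(available)
    else:
        included = {name for name in selection if not name.startswith("!")}
    # '!name' entries remove scalers from the expanded set
    excluded = {name[1:] for name in selection if name.startswith("!")}
    return sorted(included - excluded)
```

For example, ``resolve_scalers(['*', '!scaler_1'], ['scaler_1', 'scaler_2'])``
yields ``['scaler_2']``.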

********************
Validation Metrics
@@ -81,24 +83,24 @@ name
Scaling Validation Losses
=========================

Validation metrics can **not** by default be scaled by scalars across
the variable dimension, but can be by all other scalars. If you want to
Validation metrics can **not** by default be scaled by scalers across
the variable dimension, but can be by all other scalers. If you want to
scale a validation metric by the variable weights, it must be added to
`config.training.scale_validation_metrics`.

These metrics are then kept in the normalised, preprocessed space, and
thus the indexing of scalars aligns with the indexing of the tensors.
thus the indexing of scalers aligns with the indexing of the tensors.

By default, only `all` is kept in the normalised space and scaled.

.. code:: yaml

# List of validation metrics to keep in normalised space, and scalars to be applied
# List of validation metrics to keep in normalised space, and scalers to be applied
# Use '*' to reference all metrics, or a list of metric names.
# Unlike above, variable scaling is possible due to these metrics being
# calculated in the same way as the training loss, within the internal model space.
scale_validation_metrics:
scalars_to_apply: ['variable']
scalers_to_apply: ['variable']
metrics:
- 'all'
# - "*"
@@ -144,7 +146,7 @@ losses above.
losses:
- __target__: anemoi.training.losses.mse.WeightedMSELoss
- __target__: anemoi.training.losses.mae.WeightedMAELoss
scalars: ['variable']
scalers: ['variable']
loss_weights: [1.0,0.5]

All kwargs passed to ``CombinedLoss`` are passed to each of the loss
@@ -170,7 +172,7 @@ option ``config.training.loss_gradient_scaling=True``.

``ScaleTensor`` is a class that can record and apply arbitrary scaling
factors to tensors. It supports relative indexing, combining multiple
scalars over the same dimensions, and is only constructed at
scalers over the same dimensions, and is only constructed at
broadcasting time, so the shape can be resolved to match the tensor
exactly.
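The idea of recording scale factors per dimension and resolving them only at
broadcast time can be sketched with a small numpy stand-in. This is not the
actual ``ScaleTensor`` API — the class and method names here are assumptions
for illustration:

```python
import numpy as np

class LazyScaler:
    """Illustrative sketch: record scale factors per dimension and combine
    them only when the target tensor's shape is known (broadcast time)."""

    def __init__(self):
        self.factors = []  # list of (dim, 1-D array of scale factors)

    def add_scaler(self, dim, factor):
        # dim may be negative (relative indexing); it is resolved at scale time
        self.factors.append((dim, np.asarray(factor, dtype=float)))

    def scale(self, tensor):
        out = np.asarray(tensor, dtype=float)
        for dim, factor in self.factors:
            # build a broadcast shape of ones with the factor along `dim`
            shape = [1] * out.ndim
            shape[dim] = factor.size
            out = out * factor.reshape(shape)
        return out
```

Two scalers registered on the same dimension multiply together, mirroring how
multiple scalers can be combined over the variable dimension.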

54 changes: 14 additions & 40 deletions training/src/anemoi/training/config/training/default.yaml
@@ -1,3 +1,7 @@
---
defaults:
- scalers: scalers

# resume or fork a training from a checkpoint last.ckpt or specified in hardware.files.warm_start
run_id: null
fork_run_id: null
@@ -46,12 +50,11 @@ zero_optimizer: False
training_loss:
# loss class to initialise
_target_: anemoi.training.losses.mse.WeightedMSELoss
# Scalars to include in loss calculation
# Available scalars include:
# - 'variable': See `variable_loss_scaling` for more information
# - 'loss_weights_mask': Giving imputed NaNs a zero weight in the loss function
scalars: ['variable', 'loss_weights_mask']

# Scalers to include in loss calculation
# A selection of available scalers are listed in training/scalers/scalers.yaml
# '*' is a valid entry to use all `scalers` given. To exclude a scaler,
# add `!scaler_name`, e.g. ['*', '!scaler_1'] applies every scaler except `scaler_1`.
scalers: ['pressure_level', 'general_variable', 'nan_mask_weights']
ignore_nans: False

loss_gradient_scaling: False
@@ -64,21 +67,21 @@ loss_gradient_scaling: False
validation_metrics:
# loss class to initialise
- _target_: anemoi.training.losses.mse.WeightedMSELoss
# Scalars to include in loss calculation
# Scalers to include in loss calculation
# Cannot scale over the variable dimension due to possible remappings.
# Available scalars include:
# Available scalers include:
# - 'loss_weights_mask': Giving imputed NaNs a zero weight in the loss function
# Use the `scale_validation_metrics` section to variable scale.
scalars: []
scalers: []
# other kwargs
ignore_nans: True

# List of validation metrics to keep in normalised space, and scalars to be applied
# List of validation metrics to keep in normalised space, and scalers to be applied
# Use '*' to reference all metrics, or a list of metric names.
# Unlike above, variable scaling is possible due to these metrics being
# calculated in the same way as the training loss, within the internal model space.
scale_validation_metrics:
scalars_to_apply: ['variable']
scalers_to_apply: ['general_variable', 'pressure_level']
metrics:
- 'all'
# - "*"
@@ -106,37 +109,8 @@ lr:
# in order to keep a constant global_lr
# global_lr = local_lr * num_gpus_per_node * num_nodes / gpus_per_model

# Variable loss scaling
# 'variable' must be included in `scalars` in the losses for this to be applied.
variable_loss_scaling:
default: 1
pl:
q: 0.6 #1
t: 6 #1
u: 0.8 #0.5
v: 0.5 #0.33
w: 0.001
z: 12 #1
sfc:
sp: 10
10u: 0.1
10v: 0.1
2d: 0.5
tp: 0.025
cp: 0.0025

metrics:
- z_500
- t_850
- u_850
- v_850

pressure_level_scaler:
_target_: anemoi.training.data.scaling.ReluPressureLevelScaler
minimum: 0.2
slope: 0.001

node_loss_weights:
_target_: anemoi.training.losses.nodeweights.GraphNodeAttribute
target_nodes: ${graph.data}
node_attribute: area_weight
58 changes: 58 additions & 0 deletions training/src/anemoi/training/config/training/scalers/scalers.yaml
@@ -0,0 +1,58 @@
variable_groups:
default: sfc
pl: [q, t, u, v, w, z]

# Several scalers can be added here. In order to be applied their names must be included in the loss.
# scaler name must be included in `scalers` in the losses for this to be applied.
builders:
general_variable:
# Variable groups definition for scaling by variable level.
# The variable level scaling methods are defined under additional_scalers
# A default group is required and is used as the prefix in the metric name of all variables not assigned to a group.
_target_: anemoi.training.losses.scaling.variable.GeneralVariableLossScaler
scale_dim: -1 # dimension on which scaling applied
weights:
default: 1
q: 0.6 #1
t: 6 #1
u: 0.8 #0.5
v: 0.5 #0.33
w: 0.001
z: 12 #1
sp: 10
10u: 0.1
10v: 0.1
2d: 0.5
tp: 0.025
cp: 0.0025

pressure_level:
_target_: anemoi.training.losses.scaling.variable_level.ReluVariableLevelScaler
group: pl
y_intercept: 0.2
slope: 0.001
scale_dim: -1 # dimension on which scaling applied

# mask NaNs with zeros in the loss function
nan_mask_weights:
_target_: anemoi.training.losses.scaling.loss_weights_mask.NaNMaskScaler
scale_dim: (-2, -1) # dimension on which scaling applied

# tendency scalers
# scale the prognostic losses by the stdev of the variable tendencies (e.g. the 6-hourly differences of the data)
# useful if including slow vs fast evolving variables in the training (e.g. Land/Ocean vs Atmosphere)
# if using this option 'variable_loss_scalings' should all be set close to 1.0 for prognostic variables
stdev_tendency:
_target_: anemoi.training.losses.scaling.variable_tendency.StdevTendencyScaler
scale_dim: -1 # dimension on which scaling applied
var_tendency:
_target_: anemoi.training.losses.scaling.variable_tendency.VarTendencyScaler
scale_dim: -1 # dimension on which scaling applied

node_weights:
_target_: anemoi.training.losses.nodeweights.GraphNodeAttribute
target_nodes: ${graph.data}
node_attribute: area_weight
scale_dim: 2 # dimension on which scaling applied
Comment on lines +52 to +56

Member:
This class doesn't have a scale_dim attribute.
It may also be useful to add a general scale by node attribute scaler.

Member Author:
Not yet. Refactor is still ongoing.

# limited_area_mask
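Assuming ``ReluVariableLevelScaler`` keeps the formula of the old
``ReluPressureLevelScaler`` — a linear weight in the variable level, floored at
``y_intercept`` — the per-level weight implied by the config above
(``y_intercept: 0.2``, ``slope: 0.001``) can be sketched as:

```python
def relu_level_weight(level: float, y_intercept: float = 0.2, slope: float = 0.001) -> float:
    """ReLU-style variable-level weight: grows linearly with the level,
    floored at y_intercept so low levels are not scaled toward zero."""
    return max(y_intercept, slope * level)
```

With these defaults, an 850 hPa field gets weight 0.85 while a 100 hPa field is
floored at 0.2.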
5 changes: 5 additions & 0 deletions training/src/anemoi/training/data/datamodule.py
@@ -73,6 +73,10 @@ def __init__(self, config: DictConfig, graph_data: HeteroData) -> None:
def statistics(self) -> dict:
return self.ds_train.statistics

@cached_property
def statistics_tendencies(self) -> dict:
return self.ds_train.statistics_tendencies

@cached_property
def metadata(self) -> dict:
return self.ds_train.metadata
@@ -183,6 +187,7 @@ def _get_dataset(
rollout=r,
multistep=self.config.training.multistep_input,
timeincrement=self.timeincrement,
timestep=self.config.data.timestep,
shuffle=shuffle,
grid_indices=self.grid_indices,
label=label,
12 changes: 12 additions & 0 deletions training/src/anemoi/training/data/dataset.py
@@ -41,6 +41,7 @@ def __init__(
rollout: int = 1,
multistep: int = 1,
timeincrement: int = 1,
timestep: str = "6h",
shuffle: bool = True,
label: str = "generic",
effective_bs: int = 1,
@@ -57,6 +58,8 @@
length of rollout window, by default 12
timeincrement : int, optional
time increment between samples, by default 1
timestep : str, optional
the time frequency of the samples, by default '6h'
multistep : int, optional
collate (t-1, ... t - multistep) into the input state vector, by default 1
shuffle : bool, optional
@@ -73,6 +76,7 @@

self.rollout = rollout
self.timeincrement = timeincrement
self.timestep = timestep
self.grid_indices = grid_indices

# lazy init
Expand Down Expand Up @@ -104,6 +108,14 @@ def statistics(self) -> dict:
"""Return dataset statistics."""
return self.data.statistics

@cached_property
def statistics_tendencies(self) -> dict:
"""Return dataset tendency statistics."""
try:
return self.data.statistics_tendencies(self.timestep)
except (KeyError, AttributeError):
return None
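The try/except fallback above — returning ``None`` when the underlying dataset
does not expose tendency statistics — is a common graceful-degradation pattern
with ``cached_property``. A self-contained illustration (class names here are
assumptions, not the anemoi-training classes):

```python
from functools import cached_property

class OptionalStatsDataset:
    """Illustrative: expose optional statistics with a graceful None fallback."""

    def __init__(self, data):
        self.data = data

    @cached_property
    def statistics_tendencies(self):
        try:
            # the wrapped dataset may not implement tendency statistics
            return self.data.statistics_tendencies("6h")
        except (KeyError, AttributeError):
            return None  # older datasets: callers must handle None
```

The result (including the ``None`` fallback) is computed once and cached on the
instance, so repeated access does not re-trigger the exception path.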

@cached_property
def metadata(self) -> dict:
"""Return dataset metadata."""
79 changes: 0 additions & 79 deletions training/src/anemoi/training/data/scaling.py

This file was deleted.

2 changes: 1 addition & 1 deletion training/src/anemoi/training/losses/combined.py
@@ -70,7 +70,7 @@ def __init__(
losses:
- __target__: anemoi.training.losses.mse.WeightedMSELoss
- __target__: anemoi.training.losses.mae.WeightedMAELoss
scalars: ['variable']
scalers: ['variable']
loss_weights: [1.0,0.5]
```
"""