-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Address diffs v2.12.1 to v3 #907
Conversation
This reverts commit a84b860.
Regression testing performed in #903, related to comment. Copying here: Here is the regression notebook that compares latest
For the 85/590 are not equal to ['/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/COREv2_Flux/COREv2_Flux-PminusE-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/CRU_IPCC/CRU-TREFHT-ANN-land_60S90N_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/CRU_IPCC/CRU-TREFHT-ANN-land_60S90N_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/Cloud MISR/MISRCOSP-CLDTOT_TAU1.3_9.4_MISR-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/Cloud MISR/MISRCOSP-CLDTOT_TAU1.3_MISR-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/ERA5/ERA5-NET_FLUX_SRF-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/ERA5/ERA5-OMEGA-200-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/ERA5/ERA5-OMEGA-500-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/ERA5/ERA5-OMEGA-850-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/ERA5/ERA5-TREFHT-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/ERA5/ERA5-TREFHT-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/ERA5/ERA5-TREFHT-ANN-land_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/ERA5/ERA5-U-850-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/GPCP_OAFLux/GPCP_OAFLux-PminusE-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/MERRA2/MERRA2-NET_FLUX_SRF-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/MERRA2/MERRA2-OMEGA-200-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/MERRA2/MERRA2-OMEGA-500-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/MERRA2/MERRA2-OMEGA-850-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/MERRA2/MERRA2-TREFHT-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/MERRA2/MERRA2-TREFHT-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/MERRA2/MERRA2-TREFHT-ANN-land_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/MERRA2/MERRA2-TREFMNAV-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/MERRA2/MERRA2-TREFMNAV-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/MERRA2/MERRA2-TREFMXAV-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/MERRA2/MERRA2-TREFMXAV-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/SST_CL_HadISST/HadISST_CL-SST-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/SST_HadISST/HadISST-SST-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/SST_PD_HadISST/HadISST_PD-SST-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/lat_lon/SST_PI_HadISST/HadISST_PI-SST-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/meridional_mean_2d/MERRA2/MERRA2-OMEGA-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/CRU_IPCC/CRU-TREFHT-ANN-polar_N_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/CRU_IPCC/CRU-TREFHT-ANN-polar_N_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/ERA5/ERA5-TREFHT-ANN-polar_N_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/ERA5/ERA5-TREFHT-ANN-polar_N_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/ERA5/ERA5-TREFHT-ANN-polar_S_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/ERA5/ERA5-TREFHT-ANN-polar_S_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/ERA5/ERA5-U-850-ANN-polar_N_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/ERA5/ERA5-U-850-ANN-polar_S_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/MERRA2/MERRA2-TREFHT-ANN-polar_N_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/MERRA2/MERRA2-TREFHT-ANN-polar_N_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/MERRA2/MERRA2-TREFHT-ANN-polar_S_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/MERRA2/MERRA2-TREFHT-ANN-polar_S_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/MERRA2/MERRA2-TREFMNAV-ANN-polar_N_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/MERRA2/MERRA2-TREFMNAV-ANN-polar_N_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/MERRA2/MERRA2-TREFMNAV-ANN-polar_S_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/MERRA2/MERRA2-TREFMNAV-ANN-polar_S_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/MERRA2/MERRA2-TREFMXAV-ANN-polar_N_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/MERRA2/MERRA2-TREFMXAV-ANN-polar_N_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/MERRA2/MERRA2-TREFMXAV-ANN-polar_S_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/MERRA2/MERRA2-TREFMXAV-ANN-polar_S_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/SST_CL_HadISST/HadISST_CL-SST-ANN-polar_N_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/SST_CL_HadISST/HadISST_CL-SST-ANN-polar_S_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/SST_PD_HadISST/HadISST_PD-SST-ANN-polar_N_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/SST_PD_HadISST/HadISST_PD-SST-ANN-polar_S_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/SST_PI_HadISST/HadISST_PI-SST-ANN-polar_N_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/polar/SST_PI_HadISST/HadISST_PI-SST-ANN-polar_S_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_2d/ERA5/ERA5-OMEGA-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_2d/MERRA2/MERRA2-OMEGA-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_2d_stratosphere/ERA5/ERA5-H2OLNZ-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_2d_stratosphere/ERA5/ERA5-H2OLNZ-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_2d_stratosphere/MERRA2/MERRA2-H2OLNZ-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_2d_stratosphere/MERRA2/MERRA2-H2OLNZ-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_xy/ERA5/ERA5-TREFHT-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_xy/ERA5/ERA5-TREFHT-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_xy/MERRA2/MERRA2-TREFHT-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_xy/MERRA2/MERRA2-TREFHT-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_xy/MERRA2/MERRA2-TREFMNAV-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_xy/MERRA2/MERRA2-TREFMNAV-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_xy/MERRA2/MERRA2-TREFMXAV-ANN-global_ref.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_xy/MERRA2/MERRA2-TREFMXAV-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_xy/SST_CL_HadISST/HadISST_CL-SST-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_xy/SST_PD_HadISST/HadISST_PD-SST-ANN-global_test.nc',
'/global/cfs/cdirs/e3sm/www/e3sm_diags/complete_run/24-12-09-main/zonal_mean_xy/SST_PI_HadISST/HadISST_PI-SST-ANN-global_test.nc'] |
thank you @tomvothecoder for summarizing the cases to investigate. It look like I'm running into environment issue while debugging with vscode #906 (comment). Let me know if the env works for you when debugging. |
I just replied here. The dev env works fine for me. |
I think I found the root cause of the large diffs here and in your comment here, specifically with
|
temp = udunits(1.0, var.units) | |
coeff, offset = temp.how(target_units) | |
var = coeff * var + offset | |
var.units = target_units | |
Latest main
uses cf_units.Unit
with new implementation logic
I replaced genutil.udnits
with cf_units.Unit
. It looks like the conversion is incorrect for TREFHT
between K
to C
.
e3sm_diags/e3sm_diags/derivations/utils.py
Lines 85 to 93 in df10271
temp = cf_units.Unit(var.attrs["units"]) | |
target = cf_units.Unit(target_units) | |
coeff, offset = temp.convert(1, target), temp.convert(0, target) | |
# Keep all of the attributes except the units. | |
with xr.set_options(keep_attrs=True): | |
var = coeff * var + offset | |
var.attrs["units"] = target_units |
The implementation might not be able to handle all variables and/or the logic is incorrect, which leads to the massive diff found in your comment here.
Next steps
Investigate the cf_units.Unit
code block in convert_units()
. Fix as needed.
I just pushed Next steps
|
- Addresses performance bottleneck associated with attempting to load large datasets into memory. Time slicing reduces the size before loading into memory
46e77ef
to
b9da444
Compare
- Constraint `dask <2024.12.0`
- Subset and load the climo dataset with the source variables before deriving the target variable to use the appropriate Dask scheduler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang FYI if we are still experiencing slowness or hanging with a complete run in #880, I will carry this fix over to that PR as well and re-run.
For cases where deriving a climo variable occurs, this fix will first subset the dataset with the source variables and load it into memory before performing the actual derivation.
This fix is required because the convert_units()
function will try to load the variables into memory using the wrong scheduler by calling .values
instead of .load(scheduler="sync"
), which causes hanging/crashing (code below).
e3sm_diags/e3sm_diags/derivations/utils.py
Lines 85 to 89 in 3ea5422
original_udunit = cf_units.Unit(var.attrs["units"]) | |
target_udunit = cf_units.Unit(target_units) | |
var.values = original_udunit.convert(var.values, target_udunit) | |
var.attrs["units"] = target_units |
@tomvothecoder hey, Tom. Just a heads-up. I think #880 is in good shape and ready to merge. |
- Avoids issue where time bounds are generated for singleton coords that have issues (e.g., OMI files) - Update `squeeze_time_dim()` to catch cases where time_dim does not exist in the dataset - Add logger info messages to print out climatology and time series filepaths and derived variable filepaths
Great, I'll perform a final review before merging. Thanks!
In this PR, I’m debugging the final variables that fail to generate for Here’s a script from # %%
"""
Issue 1 - Sub-optimal `CLOUD` and `time_bnds chunking
* Related dataset: "/global/cfs/cdirs/e3sm/chengzhu/tutorial2024/v3.LR.historical_0101/post/atm/site/CLOUD_twpc3_200001_201412.nc"
* Dataset shape is (time: 131400, bound: 2, lev: 80)
* `CLOUD` variable has chunks of (1, 80), resulting in 131400 chunks in 2 graph layers. (very bad, slow loading)
* `time_bnds` has chunks of (1, 2), resulting in 131400 chunks in 3 graph layers. (very bad, slow loading)
"""
import xcdat as xc
ds = xc.open_mfdataset(
[
"/global/cfs/cdirs/e3sm/chengzhu/tutorial2024/v3.LR.historical_0101/post/atm/site/CLOUD_twpc3_200001_201412.nc"
]
)
print(ds.CLOUD.data)
# Array Chunk
# Bytes 40.10 MiB 320 B
# Shape (131400, 80) (1, 80)
# Dask graph 131400 chunks in 2 graph layers
# Data type float32 numpy.ndarray
print(ds.time_bnds.data)
# Array Chunk
# Bytes 2.01 MiB 16 B
# Shape (131400, 2) (1, 2)
# Dask graph 131400 chunks in 3 graph layers
# Data type object numpy.ndarray
# %%
"""
Issue 2 - Sub-optimal `time_bnds` chunking
* Related dataset: "/global/cfs/cdirs/e3sm/chengzhu/tutorial2024/v3.LR.historical_0101/post/atm/site/FLDS_sgpc1_200001_201412.nc"
* Dataset shape is (time: 131400, bound: 2, lev: 80)
* `FLDS` variable has chunks of (1019,), resulting in 129 in 2 graph layers (okay)
* `time_bnds` has chunks of (1, 2), resulting in 131400 chunks in 3 graph layers (very bad, slow loading)
"""
ds2 = xc.open_mfdataset(
[
"/global/cfs/cdirs/e3sm/chengzhu/tutorial2024/v3.LR.historical_0101/post/atm/site/FLDS_sgpc1_200001_201412.nc"
]
)
print(ds2.FLDS.data)
# Array Chunk
# Bytes 513.28 kiB 3.98 kiB
# Shape (131400,) (1019,)
# Dask graph 129 chunks in 2 graph layers
# Data type float32 numpy.ndarray
print(ds2.time_bnds.data)
# Array Chunk
# Bytes 2.01 MiB 16 B
# Shape (131400, 2) (1, 2)
# Dask graph 131400 chunks in 3 graph layers
# Data type object numpy.ndarray
# %%
"""
Issue 3 - Sub-optimal `time_bnds` chunking
* Related dataset: "/global/cfs/cdirs/e3sm/chengzhu/tutorial2024/v3.LR.historical_0101/post/atm/site/PRECT_sgpc1_200001_201412.nc"
* Dataset shape is (time: 131400, bound: 2, lev: 80)
* `PRECT` variable has chunks of (1019,), resulting in 129 in 2 graph layers (okay)
* `time_bnds` has chunks of (1, 2), resulting in 131400 chunks in 3 graph layers (very bad, slow loading)
"""
ds3 = xc.open_mfdataset(
[
"/global/cfs/cdirs/e3sm/chengzhu/tutorial2024/v3.LR.historical_0101/post/atm/site/PRECT_sgpc1_200001_201412.nc"
]
)
print(ds3.PRECT.data)
# Array Chunk
# Bytes 513.28 kiB 3.98 kiB
# Shape (131400,) (1019,)
# Dask graph 129 chunks in 2 graph layers
# Data type float32 numpy.ndarray
print(ds3.time_bnds.data)
# Array Chunk
# Bytes 2.01 MiB 16 B
# Shape (131400, 2) (1, 2)
# Dask graph 131400 chunks in 3 graph layers
# Data type object numpy.ndarray
# %%
"""
Issue 4 - Sub-optimal `num_a1` and `time_bnds` chunking
* Related dataset: "/global/cfs/cdirs/e3sm/chengzhu/tutorial2024/v3.LR.historical_0101/post/atm/site/num_a1_enac1_200001_201412.nc"
* Dataset shape is (time: 131400, bound: 2, lev: 80)
* `num_a1` variable has chunks of (1, 80), resulting in 131400 chunks in 2 graph layers. (very bad, slow loading)
* `time_bnds` has chunks of (1, 2), resulting in 131400 chunks in 3 graph layers (very bad, slow loading)
"""
ds4 = xc.open_mfdataset(
[
"/global/cfs/cdirs/e3sm/chengzhu/tutorial2024/v3.LR.historical_0101/post/atm/site/num_a1_enac1_200001_201412.nc"
]
)
print(ds4.num_a1.data)
# Array Chunk
# Bytes 40.10 MiB 320 B
# Shape (131400, 80) (1, 80)
# Dask graph 131400 chunks in 2 graph layers
# Data type float32 numpy.ndarray
print(ds4.time_bnds.data)
# Array Chunk
# Bytes 2.01 MiB 16 B
# Shape (131400, 2) (1, 2)
# Dask graph 131400 chunks in 3 graph layers
# Data type object numpy.ndarray
# %%
"""
Issue 5 - Sub-optimal `time_bnds` chunking
* Related dataset: "/global/cfs/cdirs/e3sm/chengzhu/tutorial2024/v3.LR.historical_0101/post/atm/site/PRECT_twpc3_200001_201412.nc"
* Dataset shape is (time: 131400, bound: 2, lev: 80)
* `PRECT` variable has chunks of (1019,), resulting in 129 in 2 graph layers (okay)
* `time_bnds` has chunks of (1, 2), resulting in 131400 chunks in 3 graph layers (very bad, slow loading)
"""
ds5 = xc.open_mfdataset(
[
"/global/cfs/cdirs/e3sm/chengzhu/tutorial2024/v3.LR.historical_0101/post/atm/site/PRECT_twpc3_200001_201412.nc"
]
)
print(ds5.time_bnds.data)
# Array Chunk
# Bytes 2.01 MiB 16 B
# Shape (131400, 2) (1, 2)
# Dask graph 131400 chunks in 3 graph layers
# Data type object numpy.ndarray |
@tomvothecoder I'm trying to understand the new emerging issues. I feel we haven't seen these when we tested the refactor branch. Another difference is that we are now testing a new dataset as well v3 as compared to v2 that is used for testing during refactor. It looks like firstly, I'm now testing with the v2 dataset here: /global/cfs/cdirs/e3sm/e3sm_diags/postprocessed_e3sm_v2_data_for_e3sm_diags/20221103.v2.LR.amip.NGD_v3atm.chrysalis/arm-diags-data/. The time record is longer in this v2 dataset which is from 1985-2014. |
I updated the pre-processing script for single site output, and created correctly chunked datasets: /global/cfs/cdirs/e3sm/chengzhu/tutorial2024/v3.LR.historical_0101/post/atm/site/ The datasets load fast but I ran into an issue with the
|
313a641 should addressed the data problem reported #907 (comment). The lat, and lon values has time as their dimension which offended the diurnal cycle routine. I'm generating a new set of datasets and will replace the files under The old version of site data is now located at |
Thanks Jill. I will do an This PR looks like it's almost done. I'm wrapping up regression testing (v3 is good to go, v2 almost done just |
I can confirm that the performance issue is resolved by the new dataset in place. The full arm set completed in 2 mins in my test. |
To recap on this issue, the root cause is that the original set of site-based output was pre-processed with a script that is based on |
@chengzhuzhang Great, thank you for summarizing the I will complete regression testing with v2 data, fix the integration tests, and do a final review before merging this PR. |
yeah, I'm glad we don't need to update the code base for workaround this data issue. I haven't re-produced v2 site data, so the complete v2 run still has to exclude |
e3sm_diags/driver/qbo_driver.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang We need to replace cwt
from scipy with PyWavelets (#912) I'm trying to figure out where to pass the deg = 6
arg here. If you can try taking a look too that'd be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related SciPy docs:
- https://docs.scipy.org/doc/scipy-1.12.0/reference/generated/scipy.signal.cwt.html
- docs.scipy.org/doc/scipy-1.12.0/reference/generated/scipy.signal.morlet2.html
Related PyWavelets docs:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whannah1 hey Walter, as we are preparing for the e3sm_diags v3 release, @tomvothecoder found that scipy.signal.cwt
and scipy.signal.morlet2
are removed from latest scipy
, and the suggestion was to use pywavelets
instead. It seems that transition to use pywavelets API is not straightforward and not much documentation was found to help transition. I'm wondering if you are familiar with the this tool and could help re-write this _get_psd_from_wavelet
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang I suggest that we:
- Not rush to finish this issue for v3.0.0rc1 due to high risk of side-effects and bugs with new, unfamiliar API
- Move this work to a separate PR and try to get it done during E3SM Unified RC testing phase
- Constrain
scipy < 1.15
for now -- FYI @xylar for E3SM Unified (will update E3SM Confluence page)
Let me know thoughts.
UPDATE: Just saw Xylar's comment here related to 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All are reasonable! I was thinking along the same line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constrain
scipy < 1.15
for now -- FYI @xylar for E3SM Unified (will update E3SM Confluence page)
I want to make sure we're on the same page. This constraint would not be in E3SM-Unified, it would be in e3sm_diags. But it would be important that E3SM-Unified or its other dependencies don't contradict this constraint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constrain
scipy < 1.15
for now -- FYI @xylar for E3SM Unified (will update E3SM Confluence page)I want to make sure we're on the same page. This constraint would not be in E3SM-Unified, it would be in e3sm_diags. But it would be important that E3SM-Unified or its other dependencies don't contradict this constraint.
Yup we're on the same page. Thanks for confirming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomvothecoder, see comment from @whannah1 here. We can either implement this as-is, and testing and fine tune (if needed) during testing period. Or pinning scipy
, and update to use pywavelets
during testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang I'm going to pin scipy
here and address #912 in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang Here is my final self-review. All of the code changes look good to me. I will merge if you agree with the testing results below.
v2 data complete run results (notebook):
Results:
We are good to go here. The differences are expected and explained in the comments below.
stat_name value pct
0 matching_files_count 1093 0.875801
1 missing_vars_count 0 0.000000
2 mismatch_errors_count 10 0.008013
3 not_equal_errors_count 143 0.114583
4 key_errors_count 0 0.000000
5 missing_files_count 2 0.001603
- 1093/1246 matching files
- 10/1246 mismatch errors due to ccb regional subsetting differences
- 143/1246 not equal files -- number of different elements are really small and stats (min, max, mean, sum) are similar, most likely related to the
convert_units()
change to replacegenutil.udunits
withcf_units.Unit
.- Except
Q
variable, differences are explained by not updating theQ
results for main branch with unit conversion changes. Non-issue.
- Except
- 2/1246 are missing because
AOD_550
retired.
v3 data complete run results (notebook)
We are good to go here. The differences are expected and explained in the comments below.
stat_name value pct
0 matching_files_count 553 0.937288
1 missing_vars_count 0 0.000000
2 mismatch_errors_count 4 0.006780
3 not_equal_errors_count 33 0.055932
4 key_errors_count 0 0.000000
5 missing_files_count 0 0.000000
- 553 match within the rtol of 1e-5, which is awesome.
- 4 mismatch errors are known issues due to regional subsetting differences with "ccb" flag
- 33 not equal errors are not a concern because they affect very small number of elements in the dataset. The stats (min, max, mean, and sum) of the datasets between branches are close.
...ary_tools/cdat_regression_testing/843-migration-phase3/run-script-model-vs-obs/run_script.py
Outdated
Show resolved
Hide resolved
...ary_tools/cdat_regression_testing/843-migration-phase3/run-script-model-vs-obs/run_script.py
Outdated
Show resolved
Hide resolved
auxiliary_tools/cdat_regression_testing/906-v2_complete_run/run_script.py
Outdated
Show resolved
Hide resolved
@tomvothecoder thank you for the regression tests! the metrics data look good for releasing the first |
@tomvothecoder somehow, I can't find |
Sounds good!
I forgot to make some of the directories |
The directory is now available here: https://portal.nersc.gov/cfs/e3sm/e3sm_diags/complete_run/25-01-15-branch-907/ |
Description
zppy
test using Xarray/xCDAT codebase #906Bug Fixes:
cf_units.Unit
that replacedgenutil.udunits
, which previously caused incorrect results.Performance Improvements:
.load(scheduler="sync")
first, as theconvert_units()
function requires in-memory access forcf_units.Unit
operations -- affectedMERRA2
datasetsOMI-MLS
datasets_get_slice_from_time_bounds()
to load time bounds into memory if they already exist using.load(scheduler="sync")
to avoid hanging -- affectedstreamflow
datasetsH2OLNZ
via Convert H2OLNZ units to ppm by volume #874 to address large differencesTodo
arm_diags
due to chunking issue with dataset described abovearm_diags
due to chunking issue with dataset described aboveconvert_units()
changeconvert_units()
change.Q
variable diffs -- the unit conversion changed fromg/kg
tog/kg
by volume via 4cda0a8, and I didn't re-runzonal_mean_2d_stratosphere
complete run to update/main
results. This is good to go.arm_diags
with v2 and v3 data with correct chunkingopen_dataset()
when only a single file is needed.streamflow
plotsOpening time series files: ['/global/cfs/cdirs/e3sm/e3sm_diags/postprocessed_e3sm_v2_data_for_e3sm_diags/20210528.v2rc3e.piControl.ne30pg2_EC30to60E2r2.chrysalis/time-series/rgr/RIVER_DISCHARGE_OVER_LAND_LIQ_005101_006012.nc']
time_bounds
variable in the dataset without calling.load(scheduler="sync"
) first..load(scheduler="sync"
) if time bounds underlying array is a Dask Array object.AttributeError: module 'scipy.signal' has no attribute 'cwt'
(deprecated, use PyWavelets instead) #912scipy <1.15
Regression testing with v2 data and arm_diags
Link here: #907 (comment)
Checklist
If applicable: