Investigate Xmip for preprocessing CMIP6 data prior to data treatment #138

Open
Zeitsperre opened this issue Jun 30, 2023 · 1 comment
Labels
data bug Bug with data cleaned by miranda enhancement New feature or request

Comments

@Zeitsperre
Collaborator

Proposal

CMIP6 data sometimes requires additional cleaning or treatment to remove known issues (e.g. extra weeks of data, specific errors in values/metadata, inconsistent naming of coordinates, etc.). Issues in our existing CMIP6 data stores are difficult to track and annoying to correct, and Miranda's existing data cleaning approach is ill-suited to handling these sparse errors.

While other tools should be explored for collecting CMIP6 data (such as esgpull), we shouldn't be trying to reinvent the wheel, especially for a project as large and well-supported as CMIP6.

Approach

Xmip should be leveraged for this step. This could be built into Miranda as another method or submodule specifically for preprocessing (miranda.preprocessing.cmip?).
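
To make the idea concrete, here is a minimal sketch of what such a wrapper could look like. The function name `preprocess_cmip6` and its signature are hypothetical (nothing like this exists in Miranda yet); the only xmip call assumed is `combined_preprocessing` from `xmip.preprocessing`, imported lazily so that xmip could stay an optional dependency.

```python
from typing import Any, Callable, Optional


def preprocess_cmip6(ds: Any, preprocess: Optional[Callable[[Any], Any]] = None) -> Any:
    """Hypothetical entry point for something like miranda.preprocessing.cmip.

    ds: raw CMIP6 data, expected to be an xarray.Dataset in practice.
    preprocess: callable applied to the dataset. When omitted, xmip's
        combined_preprocessing is used.
    """
    if preprocess is None:
        # Lazy import: xmip is only required when no custom callable is given.
        # Treating xmip as an optional dependency is an assumption of this sketch.
        from xmip.preprocessing import combined_preprocessing

        preprocess = combined_preprocessing
    return preprocess(ds)
```

Keeping the callable injectable would let us chain Miranda-specific fixes before or after the xmip step, and the same callable could presumably be passed as `preprocess=` to `xarray.open_mfdataset` so that it is applied per file.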

Xmip provides a post-processing module that might be of interest to xscen for building scenarios. To be determined.

@Zeitsperre Zeitsperre added enhancement New feature or request data bug Bug with data cleaned by miranda labels Jun 30, 2023
@juliettelavoie
Collaborator

Definitely a lot of interesting features in xmip!
I think a lot of the hard-coded issues and fixes in pre-processing are for oceanography, so not for variables/experiments that we use often. But it makes sense to contribute to xmip and have miranda wrap it instead of doing it separately, directly in miranda.

For the post-processing, I think we already solve a lot of the combination problems with extract_dataset and .to_dataset. I'm not convinced we should add it to xscen until we really need it.
We also already handle grids using xesmf.
