Re-structuring this repository #6

Open
agstephens opened this issue Sep 16, 2021 · 7 comments

@agstephens

agstephens commented Sep 16, 2021

We should restructure this repository so that all the releases live on the master branch. Let's use the release we are currently working on, r4_decadal, as the prototype. It lives under:

https://github.com/cp4cds/c3s_34g_qc_results/tree/master/QC_Results/r4_decadal

Here is my proposed new directory structure:

QC_Results / 
    <release_id> / QC_template.json
        / <check_type> / <json or csv/gz>
        / final / <csv/gz>
        / final / <json>

So an example would be:

QC_Results / 
    r4_decadal / QC_template.json
        / time_checks / r4_decadal_time_checks.csv.gz
        / final / r4_decadal_qc_results.csv.gz
        / final / r4_decadal_qc_results.json
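
As a rough illustration of the proposed layout, the paths for a release could be generated from its release ID and its check types. This is only a sketch: the function name `release_layout` is hypothetical, and the `<release_id>_<check_type>.csv.gz` naming is inferred from the single `r4_decadal` example above rather than agreed anywhere.

```python
from pathlib import PurePosixPath


def release_layout(release_id, check_types, root="QC_Results"):
    """Return the file paths expected for one release under the proposed layout.

    Assumption: per-check files are named <release_id>_<check_type>.csv.gz,
    as in the r4_decadal example; JSON outputs per check are not covered here.
    """
    base = PurePosixPath(root) / release_id
    paths = [base / "QC_template.json"]
    for check in check_types:
        paths.append(base / check / f"{release_id}_{check}.csv.gz")
    # Combined results for the release, in both proposed formats.
    paths.append(base / "final" / f"{release_id}_qc_results.csv.gz")
    paths.append(base / "final" / f"{release_id}_qc_results.json")
    return [str(p) for p in paths]


for path in release_layout("r4_decadal", ["time_checks"]):
    print(path)
```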

Once we have finalised a version, I think we should create a release of the entire repository and tag it with the release name, e.g. "r4_decadal".
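
The tagging step could be scripted roughly as follows. This is a sketch under assumptions: the helper names, the tag message, and the remote name `origin` are all hypothetical, and the GitHub release itself would still be created on top of the pushed tag. By default the helper only prints the commands.

```python
import subprocess


def tag_release_commands(release_id, remote="origin"):
    """Build the git commands that tag the repository with the release name."""
    message = f"QC results for release {release_id}"  # hypothetical message
    return [
        ["git", "tag", "-a", release_id, "-m", message],  # annotated tag
        ["git", "push", remote, release_id],  # publish the tag
    ]


def tag_release(release_id, remote="origin", dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in tag_release_commands(release_id, remote):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `tag_release("r4_decadal")` prints the two commands without touching the repository.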

@glevava @wachsylon: what do you think? I am happy for an alternative approach if you have one.

@wachsylon

I agree. I suppose we should have a total c3s-34g_pids-datasets.txt in catalogs on the master branch, listing all datasets ever provided to CDS. We can also keep the individual releases, but we should rename them in a similar way.

In quality_control, we could also have a total QC_ .json file with all results. Then we might have merge requests for new releases?

@wachsylon

wachsylon commented Sep 16, 2021

QC_Results / release /

I think these directories would be good for our organization. But I think we also need a total file covering all releases; that will be more useful in hindsight.

@agstephens

Hi @wachsylon, I like your suggestions. Since you are the first to be ready to deposit some results in the repository, I am happy for you to create the start of this structure and push your changes. Let me know when you have done it and I will take a look.

Regarding your thought about having a copy of the super-set of releases in one place - I agree that we need this.

Should we have a top-level directory called releases that has a single complete version of each release that includes the current release along with all previous content?

@wachsylon

So right now we have catalogs and QC_results. If we did not have catalogs, we would also not need a top-level directory QC_results, because the catalogs could live inside the releases. I.e., we would change QC_results to a top-level directory releases, correct? I suggest:

c3s-34g_pids-datasets.csv.gz
c3s-34g_qc-results.json
releases / 
    <release_id> / QC_template.json
        / <check_type> / <json or csv/gz>
        / final / <csv/gz>
        / final / <json>
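
One way the top-level total file could be rebuilt from the per-release results is sketched below. It rests on assumptions that are not settled in this thread: each `final/*.json` file is taken to be a JSON object keyed by dataset ID, releases are merged in sorted path order, and later releases overwrite earlier entries for the same key. The function name `build_total_results` is hypothetical.

```python
import json
from pathlib import Path


def build_total_results(repo_root, total_name="c3s-34g_qc-results.json"):
    """Merge every releases/<release_id>/final/*.json into one total file.

    Assumption: each per-release file holds a JSON object (dict) keyed by
    dataset ID. Files are merged in sorted path order, so entries from later
    releases replace earlier ones with the same key.
    """
    repo_root = Path(repo_root)
    total = {}
    for json_path in sorted(repo_root.glob("releases/*/final/*.json")):
        with json_path.open() as fh:
            total.update(json.load(fh))
    out_path = repo_root / total_name
    with out_path.open("w") as fh:
        json.dump(total, fh, indent=2, sort_keys=True)
    return out_path
```

Regenerating the total file from the release directories, rather than editing it by hand, would keep the two from drifting apart.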

What file is

QC_Results / r4_decadal / time_checks / r4_decadal_time_checks.csv.gz

again? Is this a catalog of all datasets that one institute provides? Would it be better to have r4_decadal_dkrz.csv.gz then?

@wachsylon

wachsylon commented Sep 21, 2021

From Slack, translated into directory names:

  • A clear separation of each release (i.e. directory per release)
    releases /
    releases / <release_id>

  • all datasets/files to be QC'd (per release)
    releases / <release_id> / QC_template.json

  • an intake catalog (per release)
    releases / <release_id> / final / QC_<release_id>_catalog.csv.gz
    releases / <release_id> / final / QC_<release_id>_collection.json

  • results for each part of the QC (per release)
    releases / <release_id> / <check_type> / json

  • a combined result of the QC (per release)*
    releases / <release_id> / final / QC_results.json
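
The layout listed above could be checked automatically before a release is tagged. The following is a minimal sketch, assuming that "json" under each check type means at least one `*.json` file in that directory; the function name `missing_release_files` is hypothetical.

```python
from pathlib import Path


def missing_release_files(repo_root, release_id, check_types):
    """Return the expected paths that are absent for one release."""
    repo_root = Path(repo_root)
    release_dir = repo_root / "releases" / release_id
    expected = [
        release_dir / "QC_template.json",
        release_dir / "final" / f"QC_{release_id}_catalog.csv.gz",
        release_dir / "final" / f"QC_{release_id}_collection.json",
        release_dir / "final" / "QC_results.json",
    ]
    missing = [p for p in expected if not p.is_file()]
    for check in check_types:
        check_dir = release_dir / check
        # Assumption: each check type must contain at least one JSON file.
        if not (check_dir.is_dir() and any(check_dir.glob("*.json"))):
            missing.append(check_dir / "*.json")
    return [p.relative_to(repo_root).as_posix() for p in missing]
```

An empty list means the release directory matches the layout; anything else names what still has to be added.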

And if we create a c3s_34g main repo, we could create a QC_results top level directory in that.

@wachsylon

My desired total file is an intake catalog in the manifest repo. I would leave it there so that there is no double versioning. If we have a main repo, we would have a services top-level directory next to QC_results.

@wachsylon

@agstephens #7
