(1) Creating reproduction scripts
$ cd /home/ac.forsyth2/ez/e3sm_data_docs/utils
$ emacs update_reproduction_scripts.bash
# Edit the `for case_name` lines to contain only the relevant cases (replace `hist-GHG_0151`):
# for case_name in hist-GHG_0151; do
$ ./update_reproduction_scripts.bash
That will run ./generate_reproduction_script.bash for each case specified and place the generated reproduction scripts in run_scripts/v2/reproduce (i.e., https://github.com/E3SM-Project/e3sm_data_docs/tree/main/run_scripts/v2/reproduce will show them after they are merged into main). The generate_reproduction_script.bash script does a few things:
1. It uses the patch command to apply diff_patch to the original run script (which we know is in ../run_scripts/v2/original/). diff_patch is the set of changes we know need to be made to the original run script in order to generate the reproduction script.
2. The patch process is not perfect. Some lines won't match up exactly. So, python patch_helper.py will be run to apply patches that may have been rejected.
3. The rejects from the patch command will be displayed. In most cases, the previous step should have addressed all the rejections.
Caution
It's still possible that not all patches were properly applied at this point. That would mean the reproduction script may be missing an important change from the original script. (One downside of using ./update_reproduction_scripts.bash is that it can run on many cases and, to avoid needing user input multiple times, it simply assumes no changes from diff_patch were missed.)
Tip
If the reproduction script isn't in run_scripts/v2/reproduce, then ./generate_reproduction_script.bash must have failed for that script.
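A quick way to act on this tip is to check for the expected files before moving on. The following is only a minimal sketch: the helper name `missing_scripts` is hypothetical, and it assumes the reproduction scripts keep the `run.<case>.sh` naming used by the original run scripts, which I have not confirmed against the repository.

```shell
#!/usr/bin/env bash
# Print each case whose reproduction script is missing from a directory.
# Assumption: scripts are named run.<case>.sh, like the original run scripts.
missing_scripts() {
    local dir="$1"
    shift
    local case_name
    for case_name in "$@"; do
        if [ ! -f "${dir}/run.${case_name}.sh" ]; then
            echo "${case_name}"
        fi
    done
}

# Example usage (run from utils/):
# missing_scripts ../run_scripts/v2/reproduce v2.LR.hist-GHG_0151
```

Any case name printed would be one to revisit with ./generate_reproduction_script.bash before continuing to the next step.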
(2) Creating test output to compare with the original test output
$ cd /home/ac.forsyth2/ez/e3sm_data_docs/utils
$ emacs test_reproduction_scripts.bash
# Edit the `for simulation_name` lines to contain only the relevant cases (replace `hist-GHG_0151`):
# for simulation_name in hist-GHG_0151; do
$ sbatch test_reproduction_scripts.bash
Important
This can take a long time to run. The allowed wall time is 24 hours. If it is still running after that time, it will be stopped and some cases won't be finished.
For each case, that script will do the following:
1. It will copy the reproduction script from run_scripts/v2/reproduce (and not from the GitHub repo, since I have $use_wget = false because the reproduction script hasn't been merged to main yet). This copy will be visible in /home/ac.forsyth2/E3SMv2_test/scripts.
Tip
If the copied reproduction script isn't in /home/ac.forsyth2/E3SMv2_test/scripts, then there must have been nothing to copy from run_scripts/v2/reproduce (i.e., ./generate_reproduction_script.bash must have failed for that script).
2. Initial conditions are retrieved from NERSC HPSS using zstash. They are placed in /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test, specifically in the <case-name including the v2.>/init subdirectory.
Tip
If that init directory doesn't exist or is empty, then there must not have been any initial conditions archived on NERSC HPSS for that simulation.
3. From /home/ac.forsyth2/E3SMv2_test/scripts, the reproduction script is run. The script is set up to run the XS_1x10_ndays test. This will generate a /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/<case-name including the v2.>/tests subdirectory.
Tip
If that tests directory doesn't exist or is empty, then the reproduction script failed to produce output.
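The tips above come down to checking whether the init and tests subdirectories exist and are non-empty. A small sketch of that check follows; the function names `dir_has_content` and `check_case_dirs` are hypothetical helpers of mine, not part of the repository's scripts.

```shell
#!/usr/bin/env bash
# Return success only if the directory exists and contains at least one entry.
dir_has_content() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Report the state of the init/ and tests/ subdirectories for one case.
check_case_dirs() {
    local root="$1" case_name="$2" sub
    for sub in init tests; do
        if dir_has_content "${root}/${case_name}/${sub}"; then
            echo "${sub}: ok"
        else
            echo "${sub}: missing or empty"
        fi
    done
}

# Example usage:
# check_case_dirs /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test v2.LR.hist-GHG_0151
```

A "missing or empty" init points at the initial conditions not being archived, while a "missing or empty" tests points at the reproduction script itself failing.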
(3) Finding the expected checksum
This will be the checksum from the original script. How do I get that though?
If the checksum isn't already available, we might be able to use the tests/ subdirectory archived on NERSC HPSS. E.g., for v2.NARRM.historical_0151:
a. Log into Globus. Authenticate for both the NERSC HPSS endpoint and the LCRC endpoint.
b. Run:
$ cd /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/zstash_extractions
$ mkdir v2.NARRM.historical_0151
$ cd v2.NARRM.historical_0151
$ source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh
$ rm ~/.globus-native-apps.cfg # Not doing this can cause Globus issues.
$ zstash extract --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151 "tests/*"
c. If there is no tests/ directory to extract, skip to step 4 of this list.
# Now, we've extracted just the `tests/` subdirectory.
$ cd tests
# We're now in /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/zstash_extractions/v2.NARRM.historical_0151/tests
$ for test in *_*_ndays
do
    gunzip -c "${test}"/run/atm.log.*.gz | grep '^ nstep, te ' | uniq > "atm_${test}.txt"
done
$ md5sum atm_*_ndays.txt
668fb58e3da9070640cf1ec907ac66c0 atm_XL2_1x5_ndays.txt
d. If the test listed does not include 10_ndays, as in this case, then go to step 4 of this list. If it does, then I have the expected checksum.
4. At this point, the original script's test output will have to be re-generated from scratch. E.g., for run.v2.LR.hist-GHG_0151.sh:
$ cd /home/ac.forsyth2/E3SMv2_test/data_docs_scripts
$ cp /home/ac.forsyth2/ez/e3sm_data_docs/run_scripts/v2/original/run.v2.LR.hist-GHG_0151.sh run.v2.LR.hist-GHG_0151.sh
# I need to change a few lines in this copied original script:
# readonly RUN_REFDIR="/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.hist-GHG_0151/init" # Changed
# readonly run='XS_1x10_ndays' # Changed (we want this to match the reproduction script's test length of 10 days!)
# do_fetch_code=true # Changed
# do_case_build=true # Changed
$ ./run.v2.LR.hist-GHG_0151.sh
Important
This will take an hour or more to run, because of do_fetch_code=true and do_case_build=true.
Tip
The RUN_REFDIR directory may be deleted if sbatch test_reproduction_scripts.bash is running, because that job clears the directory space to start fresh. To keep a separate copy of the initial conditions, using v2.LR.hist-aer_0151 as an example:
$ cd /lcrc/group/e3sm/ac.forsyth2/E3SMv2
$ mkdir -p v2.LR.hist-aer_0151
$ cd v2.LR.hist-aer_0151
$ source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh
$ rm ~/.globus-native-apps.cfg
$ zstash extract -v --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/LR/v2.LR.hist-aer_0151 "init/*"
# May need to enter the auth code twice. If so, run `rm -rf zstash/` and rerun the zstash command above.
$ rm -rf zstash # We only need the init directory
$ cd /lcrc/group/e3sm/ac.forsyth2/E3SMv2/v2.NARRM.amip_0101/tests
# Once the script finishes, run the `sq` alias for:
# squeue -o "%8u %.7a %.4D %.9P %7i %.2t %.10r %.10M %.10l %j" --sort=P,-t,-p -u ${USER}
# Once the job finishes, I can get the checksum.
$ for test in *_*_ndays
do
    gunzip -c "${test}"/run/atm.log.*.gz | grep '^ nstep, te ' | uniq > "atm_${test}.txt"
done
$ md5sum atm_*_ndays.txt
c9aff4fd826f18d0872135b845090a6b atm_XS_1x10_ndays.txt
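Since the same checksum pipeline is used for both the archived tests and the regenerated ones, it can be wrapped in a function. This is a sketch under the assumption that each test directory has a run/ subdirectory holding one or more atm.log.*.gz files, as in the loops above; the name `atm_checksum` is my own. It pipes straight into md5sum instead of writing the intermediate atm_<test>.txt file, which yields the same hash because the bytes are identical.

```shell
#!/usr/bin/env bash
# Extract the "nstep, te" lines from a test's gzipped atmosphere logs,
# collapse adjacent duplicate lines, and print the md5 checksum of the result.
atm_checksum() {
    local test_dir="$1"
    gunzip -c "${test_dir}"/run/atm.log.*.gz \
        | grep '^ nstep, te ' \
        | uniq \
        | md5sum \
        | cut -d ' ' -f 1
}

# Example usage, from inside a tests/ directory:
# atm_checksum XS_1x10_ndays
```

Only a checksum from a test of the same length (here, 10 days) is comparable with the reproduction script's XS_1x10_ndays output.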
(4) Actually comparing that test output with the original test output
$ cd /home/ac.forsyth2/ez/e3sm_data_docs/utils
$ emacs check_results.bash
# Add a check line for each new script, e.g. for v2.NARRM.historical_0101:
# check_test_results E3SMv2_test NARRM historical_0101 <checksum from original script's test>
$ ./check_results.bash
The script should not show up in the output. If it does, it will display Failed line count and/or Failed checksum. That means there is a problem in the reproduction script. It's not matching up with the output of the original script. The reproduction script needs to be fixed somehow. Return to Step (1) above.
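To make the "Failed checksum" case concrete, here is a simplified sketch of that kind of comparison. It is not the actual check_test_results implementation from check_results.bash, which I am not reproducing here; the function name `compare_checksum` is hypothetical, and the line-count check is left out because the expected count is not stated in this guide.

```shell
#!/usr/bin/env bash
# Compare an expected md5 checksum with the checksum of a file.
# Prints "Failed checksum" and returns 1 on mismatch; silent on success.
compare_checksum() {
    local expected="$1" file="$2" actual
    actual="$(md5sum "${file}" | cut -d ' ' -f 1)"
    if [ "${actual}" != "${expected}" ]; then
        echo "Failed checksum: ${file}"
        return 1
    fi
}

# Example usage:
# compare_checksum c9aff4fd826f18d0872135b845090a6b atm_XS_1x10_ndays.txt
```

Staying silent on success mirrors the behavior described above, where a passing script simply does not show up in the output.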
Assuming the script passed the test, it is ready to be added officially (i.e., on https://github.com/E3SM-Project/e3sm_data_docs/tree/main/run_scripts/v2/reproduce). Merge the script to main.
(5) Adding reproduction scripts to the official table
I do this step on Perlmutter rather than Chrysalis (this is because there is a direct hsi call in the script).
Once a reproduction script has been added to https://github.com/E3SM-Project/e3sm_data_docs/tree/main/run_scripts/v2/reproduce, I need to link it in the "Script" column of https://docs.e3sm.org/e3sm_data_docs/_build/html/v2/reproducing_simulations.html.
Now that the reproduction script exists, this will be accomplished by a code block in generate_tables.py (https://github.com/E3SM-Project/e3sm_data_docs/blob/main/utils/generate_tables.py).
I add the checksum to the simulation's tuple in generate_tables.py.
Then, I just follow the directions at https://github.com/E3SM-Project/e3sm_data_docs/tree/main/utils#generating-tables and merge the updated tables to main.
Note that https://github.com/E3SM-Project/e3sm_data_docs/blob/main/utils/README.md provides a more general overview. This is a step-by-step guide to my process, with more explanation provided. The original scripts used to actually run the v2 simulations, which Step (1) starts from, can be found in https://github.com/E3SM-Project/e3sm_data_docs/tree/main/run_scripts/v2/original.