Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shell_commands for tests with two testmods listed don't concatenate both together.. #2037

Closed
ekluzek opened this issue Jun 22, 2023 · 12 comments
Assignees
Labels
closed: wontfix We won't fix this issue, because it would be too difficult and/or isn't important enough to fix enhancement new capability or improved behavior of existing capability external issue needs to be addressed elsewhere (submodule); issue here for the sake of project tracking

Comments

@ekluzek
Copy link
Collaborator

ekluzek commented Jun 22, 2023

Brief summary of bug

The shell_commands file created for a test with two testmods such as...

SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Fates.izumi_nag.clm-NEON-FATES-SJER--clm-FatesPRISM

seems to execute the first test-mod, but create the shell_commands for the testcase that just contains the second.

General bug information

CTSM version you are using: ctsm5.1.dev129

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: Tests in the test list that include two testmods

This would be an issue in cime, rather than in CTSM. And possibly this behavior is OK, but it makes it hard to see what the test is actually doing.

@ekluzek ekluzek added type: -discussion bug something is working incorrectly next this should get some attention in the next week or two. Normally each Thursday SE meeting. labels Jun 22, 2023
@ekluzek
Copy link
Collaborator Author

ekluzek commented Jun 22, 2023

I also note that the order for the testmods FATES NEON tests is the opposite to the BGC NEON tests. This should be figured out...

@wwieder
Copy link
Contributor

wwieder commented Jun 22, 2023

Could one solution to this and #1949 be to use usermods in NEON/defaultshell_commands to key off of the compset being used and customize FATES namelist setting in? For example:

if [[ $compset =~ *Fates* ]]; then
    echo "fsurdat ='$DIN_LOC_ROOT/lnd/clm2/surfdata_map/NEON/16PFT_mixed/surfdata_1x1_NEON_${NEONSITE}_hist_16pfts_Irrig_CMIP6_simyr2000_c230120.nc'" >> user_nl_clm
    echo "hist_fincl2 = 'LIST','OF','FATES','HISTORY','FILES' " >> user_nl_clm 
fi

@ekluzek
Copy link
Collaborator Author

ekluzek commented Jun 22, 2023

@wwieder I like that suggestion for duplication. This issue here is pretty specific to how testmods work for us. And I probably need to file a specific issue with cime, but I need to investigate more.

@wwieder
Copy link
Contributor

wwieder commented Jun 28, 2023

Should this come in with #2044?

@billsacks
Copy link
Member

@ekluzek - I see the value in what you're saying: having a full list of all shell_commands for a test. This is more of a concern now that (1) we allow multiple testmods and (2) the shell_commands can be pretty complex (if they were simply a list of xmlchange commands, you could see what was done in the CaseStatus file).

I think you're going to run into trouble if you try to append to the shell_commands without removing it at the start of the application of each testmod (https://github.com/ESMCI/cime/blob/95e29f9736d2ef41b690f082afe8509caadecb08/CIME/user_mod_support.py#L29-L31), because you'll end up re-executing commands multiple times (which may often be okay, but in some cases may be problematic). My suggestion would be to introduce a new file like shell_commands_all_testmods into which you copy the contents of shell_commands after executing them (e.g., after this block: https://github.com/ESMCI/cime/blob/95e29f9736d2ef41b690f082afe8509caadecb08/CIME/user_mod_support.py#L118-L121)

@billsacks billsacks removed the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Jul 6, 2023
@samsrabin
Copy link
Collaborator

Just a note: As part of #1863, I had bumped the CIME version from cime6.0.108 to cime6.0.118. It turns out that the 5 tests like this—with multiple testmods listed—crash in the XML generation step under the latter. I'm not sure whether this is resolved in one of the subsequent CIME tags, but on a quick scan none of their commit messages suggest they'd have an effect.

@samsrabin
Copy link
Collaborator

samsrabin commented Jul 12, 2023

Hmm. I just rolled back to cime6.0.108 and I'm still getting the crashes. This happens with my code as well as with ctsm5.1.dev130.

Here's the crash log, from /scratch/cluster/samrabin/tests_0712-100716iz/SMS_L10d_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Bgc.izumi_nag.clm-default--clm-NEON-NIWO.C.0712-100716iz on Izumi:

2023-07-12 10:07:39: Test 'SMS_L10d_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Bgc.izumi_nag.clm-default--clm-NEON-NIWO' failed in phase 'XML' with exception ''NoneType' object has no attribute 'group''
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/test_scheduler.py", line 1093, in _run_catch_exceptions
    return run(test)
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/test_scheduler.py", line 885, in _xml_phase
    opt = match.group(1)

 ---------------------------------------------------
2023-07-12 10:07:41: Test 'SMS_L10d_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Bgc.izumi_nag.clm-default--clm-NEON-NIWO' failed in phase 'XML' with exception 'ERROR: locked: cannot make child group in file /scratch/cluster/samrabin/tests_0712-100716iz/SMS_L10d_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Bgc.izumi_nag.clm-default--clm-NEON-NIWO.C.0712-100716iz/env_test.xml'
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/test_scheduler.py", line 1093, in _run_catch_exceptions
    return run(test)
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/test_scheduler.py", line 790, in _xml_phase
    envtest.add_elements_by_group(files, {}, "env_test.xml")
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/XML/entry_id.py", line 448, in add_elements_by_group
    newgroup = self.make_child(name="group", attributes={"id": gname})
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/XML/generic_xml.py", line 311, in make_child
    "read_only" if self.read_only else "locked", name, self.filename
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/utils.py", line 166, in expect
    raise exc_type(msg)

 ---------------------------------------------------
2023-07-12 10:07:43: Test 'SMS_L10d_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Bgc.izumi_nag.clm-default--clm-NEON-NIWO' failed in phase 'XML' with exception 'ERROR: locked: cannot make child group in file /scratch/cluster/samrabin/tests_0712-100716iz/SMS_L10d_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Bgc.izumi_nag.clm-default--clm-NEON-NIWO.C.0712-100716iz/env_test.xml'
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/test_scheduler.py", line 1093, in _run_catch_exceptions
    return run(test)
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/test_scheduler.py", line 790, in _xml_phase
    envtest.add_elements_by_group(files, {}, "env_test.xml")
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/XML/entry_id.py", line 448, in add_elements_by_group
    newgroup = self.make_child(name="group", attributes={"id": gname})
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/XML/generic_xml.py", line 311, in make_child
    "read_only" if self.read_only else "locked", name, self.filename
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/utils.py", line 166, in expect
    raise exc_type(msg)

@samsrabin
Copy link
Collaborator

Never mind! Apparently @glemieux fixed this already.

@ekluzek ekluzek added closed: wontfix We won't fix this issue, because it would be too difficult and/or isn't important enough to fix enhancement new capability or improved behavior of existing capability external issue needs to be addressed elsewhere (submodule); issue here for the sake of project tracking and removed discussion bug something is working incorrectly labels Aug 14, 2024
@ekluzek
Copy link
Collaborator Author

ekluzek commented Aug 14, 2024

Closing this as a wontfix. This would need to be addressed in cime.

@samsrabin
Copy link
Collaborator

@ekluzek This is an important issue, though. Could you post it in the CIME repo?

@ekluzek ekluzek self-assigned this Aug 15, 2024
@ekluzek
Copy link
Collaborator Author

ekluzek commented Nov 26, 2024

One further thing I found is that the first two include directories don't get the shell_command files concatenated, but the ones after that do.

@ekluzek
Copy link
Collaborator Author

ekluzek commented Nov 26, 2024

This is now an issue in cime, so closing here.

@ekluzek ekluzek closed this as completed Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed: wontfix We won't fix this issue, because it would be too difficult and/or isn't important enough to fix enhancement new capability or improved behavior of existing capability external issue needs to be addressed elsewhere (submodule); issue here for the sake of project tracking
Projects
None yet
Development

No branches or pull requests

4 participants