Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added return codes to IOStream for better managing missing/skipped streams #184

Merged

Conversation

philipwjones
Copy link

This is a short-term fix to address problems with missing or skipped streams. It will be replaced once the Error Handler capabilities are implemented. It adds return codes so that calling routines can determine how to handle errors and removes the critical error when a stream is missing.

Checklist

  • Documentation:
    • Developer's Guide has been updated
  • Testing
    • A comment in the PR documents testing used to verify the changes including any tests that are added/modified/impacted.
    • CTest unit tests for new features have been added per the approved design.
    • Polaris tests for new features have been added per the approved design (and included in a test suite)
    • Unit tests have passed. Please provide a relevant CDash build entry for verification.
    • Polaris test suite has passed

Fixes #175 , Fixes #181

…d streams

  These specific return codes will eventually be replaced by the
  Error Handler codes and functionality.
@philipwjones philipwjones requested a review from xylar December 6, 2024 20:50
@philipwjones philipwjones self-assigned this Dec 6, 2024
@philipwjones
Copy link
Author

Unfortunately, I don't have a good way to test this in the current unit test infrastructure, but all Ctests pass except State which is currently failing with the latest master due to an unrelated bug. I will test again once that is fixed.

@xylar
Copy link

xylar commented Dec 9, 2024

Sorry, @philipwjones, I lost track of this one and forgot to test it today. I will get to it tomorrow.

@xylar
Copy link

xylar commented Dec 10, 2024

@philipwjones, I'm afraid I'm still seeing:

[* info] [TimeMgr.cpp:1218] Omega using calendar type: No Leap
[* warning] [IOStream.cpp:528] Stream InitialState has IO frequency of never and will be skipped
[* info] [Tracers.cpp:46] Tracers::init() called
[* critical] [OceanInit.cpp:192] Error initializing ocean variables from input streams
[* critical] [OceanInit.cpp:56] ocnInit: Error initializing Omega modules
[* error] [OceanDriver.cpp:32] Error initializing OMEGA
[* info] [Tracers.cpp:249] Tracers::clear() called
[* error] [OceanDriver.cpp:61] OMEGA terminating due to error

with this branch. My run directory on Chrysalis is:

/lcrc/group/e3sm/ac.xylar/polaris_0.5/chrysalis/test_20241210/omega_cosine_bell_restart_stream_init_fix/ocean/spherical/icos/cosine_bell/restart/restart_run

My omega.yml is:

Omega:
  TimeIntegration:
    CalendarType: No Leap
    TimeStepper: RungeKutta4
    TimeStep: 0000_00:24:00.000
    StartTime: 0001-01-01_00:24:00
    StopTime: none
    RunDuration: 0000_00:24:00.000
  Dimension:
    NVertLevels: 1
  Decomp:
    HaloWidth: 3
    DecompMethod: MetisKWay
  State:
    NTimeLevels: 2
  Advection:
    FluxThicknessType: Center
    FluxTracerType: Center
  Tendencies:
    ThicknessFluxTendencyEnable: false
    PVTendencyEnable: false
    KETendencyEnable: false
    SSHTendencyEnable: false
    VelDiffTendencyEnable: false
    ViscDel2: 1.0e3
    VelHyperDiffTendencyEnable: false
    ViscDel4: 1.2e11
    TracerHorzAdvTendencyEnable: true
    TracerDiffTendencyEnable: false
    EddyDiff2: 10.0
    TracerHyperDiffTendencyEnable: false
    EddyDiff4: 0.0
  Tracers:
    Base: [Temperature, Salinity]
    Debug: [Debug1, Debug2, Debug3]
  IOStreams:
    # InitialState should only be used when starting from scratch.
    # For restart runs, the frequency units should be changed from
    # "OnStartup" to "never" so that the initial state file is not read.
    InitialState:
      UsePointerFile: false
      Filename: init.nc
      Mode: read
      Precision: double
      Freq: 1
      FreqUnits: never
      UseStartEnd: false
      Contents:
      - State
      - Tracers
    RestartRead:
      UsePointerFile: false
      PointerFilename: ocn.pointer
      Mode: read
      Precision: double
      Freq: 1
      FreqUnits: OnStartup
      UseStartEnd: true
      StartTime: 0001-01-01_00:24:00
      EndTime: 99999-12-31_00:00:00
      Contents:
      - Restart
      Filename: ../restarts/rst.$Y-$M-$D_$h.$m.$s
    RestartWrite:
      UsePointerFile: false
      PointerFilename: ocn.pointer
      Filename: ../restarts/rst.$Y-$M-$D_$h.$m.$s
      Mode: write
      IfExists: replace
      Precision: double
      Freq: 1
      FreqUnits: seconds
      UseStartEnd: false
      Contents:
      - Restart
    History:
      UsePointerFile: false
      Filename: output.nc
      Mode: write
      IfExists: replace
      Precision: double
      Freq: 1440
      FreqUnits: seconds
      UseStartEnd: false
      Contents:
      - Tracers
      - LayerThickness
      - NormalVelocity

@xylar
Copy link

xylar commented Dec 10, 2024

I tested removing RestartRead when it's not needed and that worked fine. So it seems like maybe #175 is fixed but #181 not quite.

@xylar
Copy link

xylar commented Dec 10, 2024

It could also be that I messed something up with defining RestartRead in my restart test above.

@xylar
Copy link

xylar commented Dec 10, 2024

What is the expected syntax if I don't want to use the file pointer. Does this look right?

    RestartRead:
      UsePointerFile: false
      PointerFilename: ocn.pointer
      Mode: read
      Precision: double
      Freq: 1
      FreqUnits: OnStartup
      UseStartEnd: true
      StartTime: 0001-01-01_00:24:00
      EndTime: 99999-12-31_00:00:00
      Contents:
      - Restart
      Filename: ../restarts/rst.$Y-$M-$D_$h.$m.$s

(Filename comes last because it gets automatically appended after the contents that get read and replaced from Default.yml.)

xylar added a commit to xylar/polaris that referenced this pull request Dec 10, 2024
@xylar
Copy link

xylar commented Dec 10, 2024

It seems like this is my error. Should be UseStartEnd: false. Trying this now.

@xylar
Copy link

xylar commented Dec 10, 2024

Yep, that did it!

Copy link

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on testing in Polaris branch E3SM-Project/polaris#251. I am now able to successfully run both the full and restart runs. (The validation step is currently failing but I don't think that is Omega related.)

@xylar
Copy link

xylar commented Dec 10, 2024

Hmm, I spoke too soon. The mechanics of the restart run are working but the result is all NaNs. This could still every easily be my mistake but maybe we'd better hold off on merging until I investigate more. Today, I am not feeling well enough to debug this effectively so it will likely have to wait.

@xylar xylar self-requested a review December 10, 2024 15:11
@philipwjones
Copy link
Author

@xylar See my comments in #186 I suspect that's what's causing your latest issue. The standalone driver test apparently doesn't check for valid fields so it's not picking this up.

@xylar
Copy link

xylar commented Dec 10, 2024

@philipwjones, thanks! Depending on where things are by tomorrow, I can do a test merge and try my restart test in Polaris again.

Copy link

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked! Using a test merge of this branch with the current Omega develop, the cosine bell restart test from E3SM-Project/polaris#251 passes on Chrysalis!

@philipwjones philipwjones merged commit ee5333c into E3SM-Project:develop Dec 11, 2024
2 checks passed
xylar added a commit to xylar/polaris that referenced this pull request Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants