Flush the atm.log file at the end of init sequence #2962
Comments
Add homme log flushing too?
That's definitely important, I just don't know (yet?) how to do it. The homme log is created in Fortran, and I am not familiar with Fortran's flushing mechanisms. As far as I can tell, there is no direct way to flush a file in Fortran, short of closing it and reopening it. But I need to dig a bit more.
This came up while I was helping someone. To address the homme_atm.log case, I think I'm going to try inserting …
I think this may only work if the rank issuing the abort is the same rank holding the iulog handle (I think only masterproc opens the log file). Edit: nope, masterproc creates the file, but all ranks open it, so it should be good.
This can't fix every problem. EAM example: a non-root rank calls abortmp. iulog is attached to e3sm.log, and that rank has no knowledge of atm.log, so it flushes e3sm.log and then calls MPI_Abort. The other ranks get shut down without flushing, so the root rank never flushes atm.log.

This should fix the typical case, where the root rank does input checking. It should also handle the case where rank k prints to iulog and then calls abortmp: whatever log it printed to (atm or e3sm) will get flushed, showing the reason abortmp was called.

To cover every case, I think we need to introduce MPI error handlers (https://docs.open-mpi.org/en/main/man-openmpi/man3/MPI_Comm_create_errhandler.3.html) so ranks can clean things up before exiting.
For EAMxx's atm.log, I think flushing it in the catch block that was recently modified to call MPI_Abort would do the trick for most cases (any case that throws an exception that eventually reaches the top-level catch block). Another case is EAMxx's homme_atm.log when Hxx throws an exception. I'm not sure whether the AD has a handle to this log. If it does, it could be flushed in the top-level catch block, too.
I don't think the AD has a handle to that log, unfortunately. However, it should already get flushed. Here's what happens if the catch block executes:
But … Edit: no, calling …
This avoids confusion when the code hangs due to initialization issues: if atm.log exists and is non-empty, we can be sure EAMxx::init completed.
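As a rough illustration of the request, here is a minimal sketch of flushing at the end of an init sequence. It assumes a plain `std::ofstream` log; `init_with_flush`, `read_file`, and the log messages are hypothetical and do not reflect how EAMxx actually writes atm.log.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical init routine: writes setup messages, then flushes so the
// file on disk is complete as soon as initialization returns.
void init_with_flush(std::ofstream& log) {
  log << "init: reading inputs\n";
  log << "init: building grids\n";
  log << "init: done\n";
  log.flush();  // without this, the text may still sit in the stream buffer
}

// Helper for inspection: returns the file contents as currently on disk.
std::string read_file(const std::string& path) {
  std::ifstream in(path);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}
```

With the flush in place, the init messages are visible in the file while the stream is still open, which is what makes the log a usable signal that init finished even if the run hangs later.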