zppy debugging guide for developers #573
forsyth2 announced in Announcements
`zppy` has many complex inter-connected pieces. It can therefore be a challenging package to develop and debug (and debugging `zppy` could well mean debugging a package it calls). This guide aims to be a starting point for developers trying to debug `zppy`.

Check if the issue has come up before
If you have a specific line of output to search for (e.g., `Error: ...`), you can use the GitHub search bar in the upper right-hand corner of the `zppy` repo to search for that line. It may have appeared in previous issues, PRs, or discussions. For more complicated questions, you can also look through the discussions page.
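If you search often, you can build the search URL directly. The following is a minimal sketch that uses only the standard library. The `https://github.com/search?q=...&type=...` URL shape and the `repo:` qualifier reflect GitHub's web search as assumed here. The `type` value (`issues` by default, `discussions` for the discussions tab) is also an assumption about the current search UI. `github_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def github_search_url(text, repo="E3SM-Project/zppy", kind="issues"):
    """Build a GitHub web-search URL for an exact phrase within one repository."""
    # Wrapping the text in double quotes asks GitHub for an exact-phrase match.
    query = f'repo:{repo} "{text}"'
    # urlencode percent-encodes the query (spaces become '+', ':' becomes '%3A').
    return "https://github.com/search?" + urlencode({"q": query, "type": kind})
```

For example, `print(github_search_url("Error: <message copied from your .o file>"))` prints a link you can open in a browser. Pass `kind="discussions"` to land on the discussions results instead.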
`zppy` skipped a job it shouldn't have

`zppy` will report what dependency is missing when it skips a job. Look at your `cfg` to determine why `zppy` is looking for that dependency. See #544 (comment) for an in-depth look at `zppy` dependency handling.

A job may also be skipped because its status file says "RUNNING", "WAITING", or "OK". Usually that means this job really shouldn't re-run anyway. However, you may have found a bug for which `zppy` doesn't exit with an error code. In these cases, simply delete the status file and re-run `zppy`.
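Checking status files one at a time gets tedious when many jobs ran. The following minimal sketch makes several assumptions:

- As in the `zppy` tutorial's debugging section, the `.status` files and the batch output (`.o`) files live together in `<output>/post/scripts`.
- A status file starts with its state (`OK`, `ERROR`, `RUNNING`, `WAITING`).
- Each `.o` file is named after its job followed by `.o<job id>`.

The function names are hypothetical and are not part of `zppy`.

```python
from pathlib import Path


def read_statuses(script_dir):
    """Return {job_name: first word of its .status file}, e.g. "OK" or "ERROR"."""
    statuses = {}
    for status_file in sorted(Path(script_dir).glob("*.status")):
        words = status_file.read_text().split()
        statuses[status_file.stem] = words[0] if words else ""
    return statuses


def error_lines(script_dir, job_name, needle="error"):
    """Collect lines mentioning `needle` (case-insensitively) from the job's .o files."""
    hits = []
    # Assumption: batch output is named "<job_name>.o<job id>".
    for o_file in sorted(Path(script_dir).glob(f"{job_name}.o*")):
        for line in o_file.read_text(errors="replace").splitlines():
            if needle.lower() in line.lower():
                hits.append(f"{o_file.name}: {line}")
    return hits


def report(script_dir):
    """Print every job whose status is not OK, followed by its matching error lines."""
    for name, state in read_statuses(script_dir).items():
        if state != "OK":
            print(f"{name}: {state}")
            for hit in error_lines(script_dir, name):
                print("    " + hit)
```

For example, `report("/path/to/output/post/scripts")` lists the jobs worth inspecting. Once you've confirmed that a status is stale, delete that `.status` file (e.g., with `Path(...).unlink()`) before re-running `zppy`.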
Identifying errors in `.o` files

From #291 & https://e3sm-project.github.io/zppy/_build/html/main/tutorial.html#debugging-failures:
The error is in a prior task

For example, if you realize your error in `global_time_series` is really because of an error in `ts`, then you'll need to fix the `ts` task and then re-run `zppy`. It's recommended to either delete the old `output` and `www` directories or set them to a new path so you know you aren't re-using old output.

Reduce the number of jobs you have `zppy` launch by deleting or commenting out everything in your `cfg` that's not involved with the debugging. (E.g., if you're debugging `global_time_series`, you may need to re-run the `ts` task dependencies, but you don't need the `climo` task to re-run.)
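For a large `cfg`, trimming by hand is error-prone. The following minimal sketch writes a copy of the cfg that keeps only the top-level sections you list. It assumes the INI-style layout `zppy` cfgs use: top-level sections such as `[default]`, `[ts]`, and `[climo]`, with nested `[[ ... ]]` subsections that travel with their parent. `keep_sections` and `write_trimmed_cfg` are hypothetical helpers, not part of `zppy`.

```python
import re
from pathlib import Path

# Matches a top-level header like "[ts]" but not a nested "[[ atm_monthly ]]".
TOP_LEVEL = re.compile(r"^\s*\[\s*([^\[\]]+?)\s*\]\s*$")


def keep_sections(cfg_text, keep):
    """Return cfg text containing only the top-level sections named in `keep`."""
    out = []
    keeping = True  # keep comments/blank lines that precede the first section
    for line in cfg_text.splitlines():
        match = TOP_LEVEL.match(line)
        if match:
            keeping = match.group(1) in keep
        # Nested [[ ... ]] lines never match, so they follow their parent's state.
        if keeping:
            out.append(line)
    return "\n".join(out) + "\n"


def write_trimmed_cfg(src, dst, keep):
    """Write a trimmed copy of `src` to `dst`, leaving the original cfg untouched."""
    Path(dst).write_text(keep_sections(Path(src).read_text(), set(keep)))
```

For example, `write_trimmed_cfg("full.cfg", "debug.cfg", ["default", "ts", "global_time_series"])` produces a cfg for debugging `global_time_series` without `climo` (file names are illustrative). Writing a separate file, rather than editing in place, keeps the full configuration around for later.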
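The error is in a package `zppy` calls

#570 provides a chart of which tasks use which packages. If the bug is ultimately in another package, then that package needs to be updated. Then, you can use `environment_commands` (or `e3sm_to_cmip_environment_commands`) to set a different environment so you can test `zppy` with the fixed version of the underlying package. Directions on how to do this can also be found in #570.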
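When testing against a fixed package, it's worth confirming that the environment a task actually runs in picks up the fixed version. The following minimal sketch uses only the standard library (`importlib.metadata`, `importlib.util`). Run it after sourcing the same commands you put in `environment_commands`. `describe_package` is a hypothetical helper, and the `CONDA_DEFAULT_ENV` lookup assumes a conda-based environment.

```python
import importlib.metadata
import importlib.util
import os
import sys


def describe_package(dist_name, module_name=None):
    """Report which Python, conda env, version, and source file a package resolves to."""
    # The import name can differ from the distribution name; default to the same.
    module_name = module_name or dist_name
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None  # not installed as a distribution in this environment
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "version": version,
        "location": spec.origin if spec is not None else None,
    }
```

For example, `print(describe_package("zppy"))` shows which interpreter and which copy of the package are in use. An unexpected `location` suggests the environment isn't the one you intended.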
Could the problem be environment, data, or machine (rather than `zppy` itself)?

Does this problem resolve when...

- ... `zppy` tasks in the wrong environment by forgetting to set `environment_commands` accordingly or forgetting to run `pip install .` to apply the latest changes. So, double check your `cfg` to confirm environments are set correctly and/or try creating a new dev environment:
- ... `input`? Perhaps the data, not the code, is faulty. Or, if one dataset works and another doesn't, that may tell us something about where the code is broken.

Learning more about the data you have as input
You can run `ncdump -h <file-name>` to get a summary of the data in files underneath your `input` directory. `ncdump -h <file-name> | grep float` will show you the `float` variables defined in the file. Use `ncdump -h <file-name> | grep -E "float (var1|var2|...|varN)\("` to find specific `float` variables defined in the file.

It may be the case that the variables you're trying to process aren't even defined in your input file. In that case, the problem is with the data you're using. Either you need to find simulation output with the required variables, or you need to remove the variables from your processing list (e.g., `vars`, `plots_atm`).
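To check a whole processing list at once, the `grep` approach can be scripted. The following is a minimal sketch. It assumes `ncdump` is on your `PATH` and that variable declarations in `ncdump -h` output look like `float TS(time, lat, lon) ;`. The helper names are hypothetical.

```python
import re
import subprocess

# A variable declaration in `ncdump -h` output, e.g. "\tfloat TS(time, lat, lon) ;".
# Dimension lines ("time = ...") and attribute lines ("TS:units = ...") don't match.
DECLARATION = re.compile(r"^\s*(\w+)\s+([A-Za-z_][\w.\-]*)\s*[(;]")
NC_TYPES = {"byte", "ubyte", "char", "short", "ushort", "int", "uint",
            "int64", "uint64", "float", "double", "string"}


def declared_variables(header):
    """Map variable name -> netCDF type for every declaration in the header text."""
    found = {}
    for line in header.splitlines():
        match = DECLARATION.match(line)
        if match and match.group(1) in NC_TYPES:
            found[match.group(2)] = match.group(1)
    return found


def missing_variables(header, wanted):
    """Return the requested variables that the file does not define."""
    defined = declared_variables(header)
    return [name for name in wanted if name not in defined]


def ncdump_header(path):
    """Run `ncdump -h` on a file and return its text output."""
    result = subprocess.run(["ncdump", "-h", str(path)],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `missing_variables(ncdump_header("path/to/input/file.nc"), ["TS", "PRECC"])` returns the names from your `vars` list that the file lacks (the path is illustrative). An empty list means the data at least declares everything you asked for.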
Creating a MCVE

It can be helpful to reduce a problem to the smallest possible example: a minimal complete verifiable example (MCVE). This is helpful both to you as a debugger and to others you show the problem to.
For example, from the `zstash` Bug Report template (https://github.com/E3SM-Project/zstash/issues/new/choose):

This can be a real challenge in `zppy`, since the bug often arises out of the many inter-connected pieces (which is why the MCVE question isn't even on the `zppy` bug report template). Sometimes, though, it is possible, and in these cases creating a MCVE can be quite helpful.

For example, when debugging `global_time_series`, if you've identified that a problem is in `coupled_global.py`, you don't need to re-run `zppy` or even `global_time_series.bash`. Just look at `global_time_series.bash` to identify what parameters were used in the call to `coupled_global.py`, and run `coupled_global.py` with those parameters yourself.

In rare cases, it may even be possible to reduce the problem to a few lines of Python, in which case you can debug the problem in an interactive Python interpreter.
In most cases, the simplest way to make a MCVE is to create a minimal `cfg`: run on as few years as possible, run as few tasks as possible, and run on as few variables as possible. What specific parameter combination causes the problem?

Write a test
Once you find a bug, think about whether there's a test you can write that would catch this bug in the future. E.g., what combination of parameters or type of data causes this bug to appear? Getting a test for this bug into the test suite prevents future users from running into it too.
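One way to pin down "the combination of parameters that causes this bug" is a table-driven regression test. The following minimal sketch uses the standard `unittest` module. `year_label` is a purely hypothetical stand-in for whatever code had the bug, and its first test case stands in for the triggering parameter combination. In a real fix, you would import the function under test instead of defining it here.

```python
import unittest


def year_label(start, end):
    """Hypothetical function under test: format a year range as e.g. '0001-0020'."""
    return f"{start:04d}-{end:04d}"


class TestYearLabel(unittest.TestCase):
    # Each row is a (start, end, expected) combination. The first row stands in
    # for the parameter combination that originally triggered the (hypothetical) bug.
    CASES = [
        (1, 20, "0001-0020"),
        (1850, 1854, "1850-1854"),
    ]

    def test_cases(self):
        for start, end, expected in self.CASES:
            # subTest reports each failing combination separately.
            with self.subTest(start=start, end=end):
                self.assertEqual(year_label(start, end), expected)
```

Saved in a test module, this runs with `python -m unittest`. Adding a row for each newly discovered failing combination keeps the regression history in one place.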