Kill tasks during job prep #6535
base: 8.4.x
Tested as working, with a task somewhere in the preparing state: the killed preparing task goes to submit-failed, and gets a new submit number if retriggered.
Definitely nicer than my hack, if it covers all the bases. Should this work at any stage of the "job preparation pipeline"? (I don't recall why I tried my other approach, but I think I was worried about this aspect for some reason ... hopefully unnecessarily worried).
One thing: presumably the killed task should get held as well, to prevent automatic retries (until released).
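A toy model of the behaviour described above may help pin down what is being asked for. This is only a sketch: `ToyTask`, `is_held`, `prepare`, `kill` and `trigger` are hypothetical names for illustration, not Cylc's task proxy API, and whether a manual trigger should also release the hold is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ToyTask:
    """Hypothetical stand-in for a task proxy (not Cylc's real class)."""

    name: str
    state: str = "waiting"
    submit_num: int = 0
    is_held: bool = False

    def prepare(self) -> None:
        # Each new submission attempt gets a fresh submit number.
        self.submit_num += 1
        self.state = "preparing"

    def kill(self) -> bool:
        """Kill a preparing task; return True only if the kill applied."""
        if self.state != "preparing":
            return False
        self.state = "submit-failed"
        # Assumption: hold the task so automatic retries wait for a release.
        self.is_held = True
        return True

    def trigger(self) -> None:
        # A manual re-trigger starts a new preparation attempt.
        self.prepare()


task = ToyTask("foo")
task.prepare()                # submit number 1, state "preparing"
killed = task.kill()          # state "submit-failed", task held
task.trigger()                # submit number 2, state "preparing" again
```

Returning a bool from `kill` makes it easy to assert that non-preparing tasks are left alone.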
c16df7d to 50798ed
This doesn't actually kill the job preparation (e.g. if killed during remote-init, we won't actually kill the remote-init command itself). This is probably the best we can do, as truly "recalling" a preparation operation is tricky and may have unwanted consequences (e.g. a partially remote-installed workflow).

The orphaned operation (e.g. remote-init) is still tracked by the subprocpool and may still be logged. Example setup to reproduce:

    # global.cylc
    [platforms]
        [[myplatform]]
            rsync command = whatevs
            job runner = background

    # ~/bin/whatevs
    sleep 20
    exit 42

    $ cylc vip myworkflow
    $ sleep 2
    $ cylc kill 'myworkflow//*'
Strangely, in this situation, if I trigger a downstream task, it goes straight to submit-failed with no attempt to re-run the failed remote-fileinstall operation (bug). But if I re-trigger the downstream task a second time, then it repeats both the remote-init and remote-fileinstall (bug?)!?
From testing it appears to abort at any stage 👍 (though I'm not entirely sure how this is guaranteed). We could do with some tests to ensure that this works at each stage and that the following stages are correctly canceled:
Hopefully easy to do with integration tests where we can mock the results of each stage of the process.
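The kind of test suggested above could take roughly the following shape. This is a self-contained sketch, not Cylc code: the stage names in `STAGES`, and the helpers `run_pipeline` and `kill_during`, are assumptions made for illustration. A real integration test would instead patch the scheduler's own preparation steps.

```python
from typing import Callable, Dict, List

# Illustrative stage names only; the real pipeline may differ.
STAGES = ["remote-init", "remote-fileinstall", "job-submit"]


def run_pipeline(
    stage_funcs: Dict[str, Callable[[], None]],
    is_killed: Callable[[], bool],
) -> List[str]:
    """Run the stages in order, refusing to start a stage once killed."""
    completed = []
    for name in STAGES:
        if is_killed():
            break
        stage_funcs[name]()
        completed.append(name)
    return completed


def kill_during(stage: str) -> List[str]:
    """Mock every stage and simulate a kill arriving while `stage` runs."""
    killed = {"flag": False}

    def make_stage(name: str) -> Callable[[], None]:
        def stage_func() -> None:
            if name == stage:
                # The running stage is not interrupted; it just finishes.
                killed["flag"] = True
        return stage_func

    funcs = {name: make_stage(name) for name in STAGES}
    return run_pipeline(funcs, lambda: killed["flag"])


# A kill during remote-init lets that stage finish but cancels the rest.
after_init_kill = kill_during("remote-init")
after_install_kill = kill_during("remote-fileinstall")
```

The stage in progress still completes here, which matches the observation that the preparation process itself is not killed; only the stages after it are skipped.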
Strictly speaking this "kills the task" but doesn't kill any preparation process it has started. I think it works by setting I'll have a look at testing more thoroughly.
My approach, on the superseded PR, was to set a new flag on the preparing task and then, based on that flag, abort job submission once prep had completed. There must have been a reason why I did not just use
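For readers following along, the flag-based idea described above can be sketched as below. This is an illustration under assumptions, not the code from either PR: `PrepTask`, `kill_requested` and `prepare_and_submit` are hypothetical names, and `asyncio.sleep` stands in for the real preparation work.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PrepTask:
    """Hypothetical preparing task carrying a kill flag."""

    name: str
    state: str = "preparing"
    kill_requested: bool = False  # hypothetical flag set by a kill command


async def prepare_and_submit(task: PrepTask, prep_seconds: float = 0.01) -> bool:
    """Wait for prep to finish, then submit unless a kill was requested."""
    # Stand-in for preparation work such as remote-init or file install.
    await asyncio.sleep(prep_seconds)
    if task.kill_requested:
        # Prep ran to completion, but job submission is aborted.
        task.state = "submit-failed"
        return False
    task.state = "submitted"
    return True


async def demo():
    killed = PrepTask("killed")
    normal = PrepTask("normal")
    jobs = [
        asyncio.create_task(prepare_and_submit(t)) for t in (killed, normal)
    ]
    # The kill arrives while both tasks are still preparing.
    killed.kill_requested = True
    results = await asyncio.gather(*jobs)
    return results, killed.state, normal.state


outcome = asyncio.run(demo())
```

The flag is only checked after preparation has finished, so the preparation work itself is never interrupted in this sketch.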
The bash syntax check is blocking (not using the subproc pool) so I don't think it matters?
cylc-flow/cylc/flow/job_file.py, lines 80 to 93 in 26b12e4
Was returning True even if no tasks were submitted
@@ -346,7 +346,7 @@ def submit_livelike_task_jobs(
             bc_mgr = self.task_events_mgr.broadcast_mgr
             rtconf = bc_mgr.get_updated_rtconfig(itask)
             try:
-                platform = get_platform(
+                platform = get_platform(  # type: ignore[assignment]
Addressed in #6564
@@ -1278,7 +1278,7 @@ def _prep_submit_task_job(
                 workflow, itask, '(platform not defined)', exc)
             return False
         else:
-            itask.platform = platform
+            itask.platform = platform  # type: ignore[assignment]
Addressed in #6564
async def test_kill_preparing_pipeline(
    flow, scheduler, run, monkeypatch: pytest.MonkeyPatch
):
    """Test killing a preparing task through various stages of the preparing
    pipeline that involve submitting subprocesses and waiting for them to
    complete."""
@oliver-sanders Is this what you were after?
Supersedes #5749, closes #5746
This PR allows cylc kill [and cylc remove] to kill preparing tasks, resulting in the submit-failed state.

Check List
CONTRIBUTING.md and added my name as a Code Contributor.
?.?.x branch.