fix(runner): tweak runner idling retry logic
Currently the runner notifies `jobs` that it is idle every 10s.
However, 10s might not be enough for jobs/fleet to terminate the runner,
and the following request to /idle then fails with an error.
This commit increases the timeout after /idle has been called successfully to
give jobs/fleet enough time to terminate the runner.
In theory we could also simply stop the loop once /idle has been successful.
TBonnin committed Jan 16, 2025
1 parent f23df04 commit 7e0a821
Showing 1 changed file with 9 additions and 3 deletions.
12 changes: 9 additions & 3 deletions packages/runner/lib/monitor.ts
@@ -93,9 +93,10 @@ export class RunnerMonitor {
  }
  }

- private checkIdle(): NodeJS.Timeout | null {
+ private checkIdle(timeoutMs: number = 10000): NodeJS.Timeout | null {
  // eslint-disable-next-line @typescript-eslint/no-misused-promises
- return setInterval(async () => {
+ return setTimeout(async () => {
+ let nextTimeout = timeoutMs;
  if (this.idleMaxDurationMs > 0 && this.tracked.size == 0) {
  const idleTimeMs = Date.now() - this.lastIdleTrackingDate;
  if (idleTimeMs > this.idleMaxDurationMs) {
@@ -105,7 +106,11 @@ export class RunnerMonitor {
  const res = await idle();
  if (res.isErr()) {
  logger.error(`Failed to idle runner`, res.error);
+ nextTimeout = timeoutMs; // Reset to default on error
  }
+ // Increase the timeout to 2 minutes after a successful idle
+ // to give enough time to fleet to terminate the runner
+ nextTimeout = 120_000;
  } else {
  // TODO: DEPRECATE legacy /idle endpoint
  await httpFetch({
@@ -120,7 +125,8 @@ export class RunnerMonitor {
  this.lastIdleTrackingDate = Date.now();
  }
  }
- }, 10000);
+ this.checkIdle(nextTimeout);
+ }, timeoutMs);
  }
  }
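
The change above replaces a fixed `setInterval` with a `setTimeout` that re-arms itself with a delay chosen on each tick. A minimal standalone sketch of that pattern follows. All names here (`IdleResult`, `nextDelayMs`, `startIdleCheck`) are hypothetical and not part of `RunnerMonitor`; only the 10s and 2 minute values come from the diff. Note that in the diff as shown, `nextTimeout = 120_000` is not in an `else` branch, so it appears to apply after a failed call too; this sketch assumes the intent stated in the inline comments and keeps the default delay on error.

```typescript
// Hypothetical result type standing in for the value returned by the idle call.
type IdleResult = { ok: true } | { ok: false; error: Error };

const DEFAULT_DELAY_MS = 10_000; // default polling delay, as in the diff
const AFTER_IDLE_DELAY_MS = 120_000; // 2 minutes after a successful idle, as in the diff

// Pure helper: choose the delay before the next check.
// `result` is undefined when no idle notification was attempted on this tick.
function nextDelayMs(result: IdleResult | undefined, defaultMs: number = DEFAULT_DELAY_MS): number {
    if (result !== undefined && result.ok) {
        return AFTER_IDLE_DELAY_MS;
    }
    // Error or nothing attempted: keep the default delay.
    return defaultMs;
}

// Self-rescheduling loop: every tick arms the next one with setTimeout,
// so the delay can change between ticks. Returns a function that stops the loop.
function startIdleCheck(
    tick: () => Promise<IdleResult | undefined>,
    defaultMs: number = DEFAULT_DELAY_MS
): () => void {
    let timer: ReturnType<typeof setTimeout> | null = null;
    let stopped = false;

    const schedule = (delayMs: number): void => {
        timer = setTimeout(() => {
            void run();
        }, delayMs);
    };

    const run = async (): Promise<void> => {
        let result: IdleResult | undefined;
        try {
            result = await tick();
        } catch (err) {
            // Treat a thrown error like a failed idle call.
            result = { ok: false, error: err instanceof Error ? err : new Error(String(err)) };
        }
        if (!stopped) {
            schedule(nextDelayMs(result, defaultMs));
        }
    };

    schedule(defaultMs);

    return () => {
        stopped = true;
        if (timer !== null) {
            clearTimeout(timer);
        }
    };
}
```

Because the latest timer handle is kept in one place, the returned function can cancel whichever timeout is currently pending. Calling it right after a successful tick would give the alternative mentioned in the commit message, where the loop simply stops once /idle has succeeded.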
