Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runner: report high memory usage #1943

Merged
merged 1 commit into from
Apr 3, 2024
Merged

Conversation

TBonnin
Copy link
Collaborator

@TBonnin TBonnin commented Apr 2, 2024

Describe your changes

The goal is to report the runner high memory usage so users can understand better when a script is failing.
This is not a perfect solution, for example if memory spikes and runner crashes within a check interval no activity log would be emitted.

@bastienbeurier this is how it looks like
Screenshot 2024-04-02 at 20 51 34

Issue ticket number and link

https://linear.app/nango/issue/NAN-665/make-runner-out-of-memory-error-more-explicit

Checklist before requesting a review (skip if just adding/editing APIs & templates)

  • I added tests, otherwise the reason is:
  • I added observability, otherwise the reason is:
  • I added analytics, otherwise the reason is:

Copy link

linear bot commented Apr 2, 2024

@@ -5,6 +5,7 @@ import type { Request, Response, NextFunction } from 'express';
import timeout from 'connect-timeout';
import type { NangoProps, RunnerOutput } from '@nangohq/shared';
import { getLogger } from '@nangohq/utils/dist/logger.js';
import * as Usage from './usage.js';

This comment was marked as outdated.

Copy link
Member

@khaliqgant khaliqgant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non blocking feedback

Copy link
Collaborator

@bodinsamuel bodinsamuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a hard topic but could work 🙏🏻
Minor comments except for the DB access

packages/runner/lib/server.ts Show resolved Hide resolved
packages/runner/lib/usage.ts Outdated Show resolved Hide resolved
packages/runner/lib/usage.ts Outdated Show resolved Hide resolved
packages/runner/lib/usage.ts Outdated Show resolved Hide resolved
@TBonnin TBonnin force-pushed the tbonnin/nan-665/runner-report-oom branch 4 times, most recently from 01b97c3 to b9f6a71 Compare April 3, 2024 00:34
@TBonnin TBonnin force-pushed the tbonnin/nan-665/runner-report-oom branch from b9f6a71 to 9d3fbf3 Compare April 3, 2024 00:42
@TBonnin TBonnin force-pushed the tbonnin/nan-665/runner-report-oom branch 5 times, most recently from 41b77c7 to cddbc26 Compare April 3, 2024 01:27
@TBonnin
Copy link
Collaborator Author

TBonnin commented Apr 3, 2024

@khaliqgant @bodinsamuel thank you for your feedback. I have refactored the PR quite a bit. Would you mind giving it another look?

private idleMaxDurationMs = parseInt(process.env['IDLE_MAX_DURATION_MS'] || '') || 0;
private lastIdleTrackingDate = Date.now();
private lastMemoryReportDate: Date | null = null;
private idleInterval: NodeJS.Timeout | null = null;
Copy link
Collaborator Author

@TBonnin TBonnin Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit out of scope but I renamed this entire logic RunnerMonitor and moved the idle logic into it, since it consist in monitoring the runner idling time

// see: https://github.com/nodejs/node/issues/51095
// process.constrainedMemory() is supposed to return the memory limit of the container but it doesn't work on Render
// so we need to use a workaround to get the memory limit of the container on Render
return process.constrainedMemory() || getRenderTotalMemoryInBytes() || os.totalmem();
Copy link
Collaborator Author

@TBonnin TBonnin Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incredibly fragile but I don't have a better solution to make it work in Render since os.totalmem doesn't return the container memory limit but the vm's one :( Let me know what you think

@TBonnin TBonnin force-pushed the tbonnin/nan-665/runner-report-oom branch from cddbc26 to 4c230ba Compare April 3, 2024 03:07
Copy link
Collaborator

@bodinsamuel bodinsamuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works great locally 👍🏻
Minor comments

packages/runner/lib/monitor.ts Outdated Show resolved Hide resolved
packages/runner/lib/monitor.ts Outdated Show resolved Hide resolved
packages/jobs/lib/server.ts Outdated Show resolved Hide resolved
@TBonnin TBonnin force-pushed the tbonnin/nan-665/runner-report-oom branch 3 times, most recently from 3731fc8 to 0a27a58 Compare April 3, 2024 13:29
Copy link
Collaborator

@bodinsamuel bodinsamuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very nice 🥰 sorry for all the feedbacks and thanks for the patience!

I have an other, running locally I get this error.

[2]         logger.warn(`Error reading memory limit from ${memoryMaxFile}: ${stringifyError(error)}`);
[2]                ^
[2]
[2] TypeError: logger.warn is not a function
[2]     at getRenderTotalMemoryInBytes (file:///Users/samuelbodin/code/nango/packages/runner/dist/monitor.js:114:16)
[2]     at getTotalMemoryInBytes (file:///Users/samuelbodin/code/nango/packages/runner/dist/monitor.js:123:43)
[2]     at Timeout._onTimeout (file:///Users/samuelbodin/code/nango/packages/runner/dist/monitor.js:52:27)

Raises two questions why it throwed and why logger do not have warn 🤔
edit: I have checked, we have configured the logger with syslog.levels which has warning not warn. Type is not good unfortunately.
Might not exists in macOs

  syscall: 'open',
[2]   code: 'ENOENT',
[2]   path: '/sys/fs/cgroup/memory.max'

@TBonnin TBonnin force-pushed the tbonnin/nan-665/runner-report-oom branch from 0a27a58 to fdaf59c Compare April 3, 2024 13:53
@TBonnin
Copy link
Collaborator Author

TBonnin commented Apr 3, 2024

@bodinsamuel I have decided to remove the logging. It is super noisy when running locally

@TBonnin TBonnin force-pushed the tbonnin/nan-665/runner-report-oom branch from fdaf59c to 8371a79 Compare April 3, 2024 14:46
@TBonnin TBonnin merged commit 1a20ef6 into master Apr 3, 2024
22 checks passed
@TBonnin TBonnin deleted the tbonnin/nan-665/runner-report-oom branch April 3, 2024 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants