Improve worker event loop lag monitoring #6720
Comments
I would expect perf_hooks to work inside worker threads. Where did you read this?
This isn't true, not in the way it's suggested here. The PR linked in that issue does this too: it has the main thread communicate with the worker to get data from the worker and include it in the metric registry of the main thread. This is exactly what we do in lodestar. See https://github.com/ChainSafe/lodestar/blob/unstable/packages/beacon-node/src/node/nodejs.ts#L277
worker_threads defines eventLoopUtilization in addition to the perf_hooks one, with this comment:
Its implementation itself required low-level changes. There is no equivalent to My main feeling is that eventLoopUtilization provides extra interesting information, and explicitly supports
It would be really bad if Node.js did this without a note in the docs; they usually do a great job of documenting limitations. This should be simple to reproduce as well, and if it is true we might also want to report it upstream to Node. Based on previous data (#5604), the event loop lag seems to be reported correctly.
This is another case where "supports worker_threads" means "automagically handles communication between a worker and main thread". Would agree with @nflaig that based on the data we've seen and used in the past, we can assume that monitorEventLoopDelay works properly in workers (just that the data must be manually communicated to the main thread).
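To illustrate the point above, here is a minimal sketch of a worker measuring its own event loop delay with monitorEventLoopDelay and manually sending the result to the thread that spawned it. The function name sampleWorkerEventLoopDelay, the inline worker source, and the message shape are hypothetical and chosen for illustration only; this is not how lodestar wires its metrics.

```typescript
import {Worker} from "node:worker_threads";

// Worker source as plain JavaScript, evaluated inline to keep the sketch in one file.
// Assumption: a real setup would load a worker file and reuse its existing message channel.
const workerSource = `
const {parentPort} = require("node:worker_threads");
const {monitorEventLoopDelay} = require("node:perf_hooks");

// The histogram samples the event loop of the thread that creates it (here, the worker).
const histogram = monitorEventLoopDelay({resolution: 10});
histogram.enable();

// Block the worker's event loop for roughly 50 ms so there is some lag to observe.
setTimeout(() => {
  const end = Date.now() + 50;
  while (Date.now() < end) {}
}, 20);

// Report once, then stop sampling so the worker can exit on its own.
setTimeout(() => {
  histogram.disable();
  // Histogram values are in nanoseconds; convert to seconds before sending.
  parentPort.postMessage({maxSeconds: histogram.max / 1e9, meanSeconds: histogram.mean / 1e9});
}, 200);
`;

type WorkerLagReport = {maxSeconds: number; meanSeconds: number};

// Spawns the worker and resolves with the single report it posts back.
function sampleWorkerEventLoopDelay(): Promise<WorkerLagReport> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {eval: true});
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}
```

The receiving thread would then write the reported values into its own metric registry, since nothing is forwarded automatically.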
Yeah, agree there, could be worth adding as an additional metric.
By this I mean that there are two distinct APIs: performance.eventLoopUtilization and worker.performance.eventLoopUtilization
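A rough sketch of the difference between the two calls: performance.eventLoopUtilization() from node:perf_hooks reports on the thread that calls it, while worker.performance.eventLoopUtilization() is called on a Worker instance from the spawning thread and reports on that worker. The helper name compareEventLoopUtilization and the idle inline worker are hypothetical.

```typescript
import {performance} from "node:perf_hooks";
import {Worker} from "node:worker_threads";

type Elu = {idle: number; active: number; utilization: number};

// Reads both values from the spawning thread, shortly after the worker comes online.
function compareEventLoopUtilization(): Promise<{caller: Elu; worker: Elu}> {
  return new Promise((resolve, reject) => {
    // Hypothetical inline worker that only stays alive on a timer.
    const worker = new Worker("setInterval(() => {}, 50);", {eval: true});
    worker.once("error", reject);
    worker.once("online", () => {
      setTimeout(() => {
        // Utilization of the calling thread's own event loop.
        const caller = performance.eventLoopUtilization();
        // Utilization of the worker's event loop, read without any message passing.
        const workerElu = worker.performance.eventLoopUtilization();
        void worker.terminate();
        resolve({caller, worker: workerElu});
      }, 100);
    });
  });
}
```

Note that the second form only reaches workers the calling thread spawned directly, which is one reason it gets awkward when workers spawn other workers.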
I don't think we want to use this feature (it gets more complicated when workers spawn other workers, like we currently do). Rather, we can just use the current pattern of having each worker collect its own metrics (and, as a byproduct, reuse the existing mechanism for communicating those to the reporting thread). Recommend appending new metrics collection here, which will automatically be applied to all our workers: https://github.com/ChainSafe/lodestar/blob/unstable/packages/beacon-node/src/metrics/nodeJsMetrics.ts
collectNodeJSMetrics relies on prom-client to collect eventLoopMonitoring. This in turn relies on monitorEventLoopDelay from node:perf_hooks, which only works for the main thread or a worker from the cluster module (as documented here). worker_threads support is unclear, although we rely on this for worker_threads monitoring (network and discv5).

Proposed action items

Improve event loop monitoring, especially for workers. It looks preferable not to rely on prom-client for this specific monitoring and to add support directly in lodestar.
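As a starting point for the action item, monitoring without prom-client could look roughly like the sketch below: each thread owns a monitorEventLoopDelay histogram, periodically summarizes it, and hands the summary to a callback. The names LagSummary, summarizeLag, and startLagSampler are hypothetical, and how the summary reaches the metric registry is left open.

```typescript
import {monitorEventLoopDelay} from "node:perf_hooks";

type LagSummary = {minSeconds: number; maxSeconds: number; meanSeconds: number; p99Seconds: number};

// Minimal structural view of the histogram returned by monitorEventLoopDelay.
type LagHistogram = {min: number; max: number; mean: number; percentile(p: number): number};

// Histogram values are reported in nanoseconds; convert them to seconds.
function summarizeLag(histogram: LagHistogram): LagSummary {
  const toSeconds = (ns: number): number => ns / 1e9;
  return {
    minSeconds: toSeconds(histogram.min),
    maxSeconds: toSeconds(histogram.max),
    meanSeconds: toSeconds(histogram.mean),
    p99Seconds: toSeconds(histogram.percentile(99)),
  };
}

// Starts sampling the event loop of the calling thread and returns a stop function.
function startLagSampler(intervalMs: number, onSample: (summary: LagSummary) => void): () => void {
  const histogram = monitorEventLoopDelay({resolution: 10});
  histogram.enable();
  const timer = setInterval(() => {
    onSample(summarizeLag(histogram));
    // Reset so each sample covers only the latest interval.
    histogram.reset();
  }, intervalMs);
  // Do not keep the thread alive just for metrics.
  timer.unref();
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

Calling startLagSampler inside each worker would keep the existing pattern of every thread collecting its own metrics. An interval with no recorded samples yields a non-finite mean, so a real implementation would want to guard against that before exporting.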