[IDEA] prometheus-node-exporter-ucode-async #457

Open
spolack opened this issue Dec 17, 2024 · 0 comments

Comments

@spolack
Member

spolack commented Dec 17, 2024

The proof of concept (POC) with prometheus-node-exporter-lua in our network a few years ago taught us a few things:

  • The scrape model is compute-intensive on older embedded hardware; with short scrape intervals it noticeably raises the load.
  • Scraping via IPv6 from outside the mesh is not a problem. The only problem is that we need a way to discover the nodes/exporters.
  • It would be good to have some metrics even when the mesh is misbehaving or the monitoring cannot reach the exporter.

Brief idea:

  • As Prometheus can nowadays ingest out-of-order samples, we can simply collect the metrics asynchronously and cache them in RAM on the node until the next scrape.
  • Staying compatible with the plugins of prometheus-node-exporter-ucode might be a good idea, so that we profit from its growing ecosystem.
  • Add a local relabeling feature to add/drop/rewrite labels, e.g. on interface metrics.
  • Distribute the collection process of the various collectors evenly over a node-local, configurable collection interval. That enables tighter monitoring on more powerful nodes, while still allowing a longer interval on lower-end nodes.

Design of the Exporter

  • Main process: runs within uhttpd, caches the prepared metrics and handles requests to the scrape endpoint. It also forks the collection child process.
  • Collection child process: measures the duration of every collection plugin in order to schedule the collector calls, then enters an endless collection loop that distributes the calls evenly over the given interval. Every run is timed again and the schedule readjusted. We probably want to run the actual metrics gathering in a further child process, so that a timeout can be enforced.
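The collection loop in the design can be sketched as follows. This is a Python sketch, not ucode, and it assumes each collector can be launched as an external command. That is a hypothetical simplification, since the real plugins would be ucode modules. The names `plan_offsets`, `run_collector` and `collection_loop` are invented for illustration. The idle time left in the interval is shared equally between collectors, each run happens in a child process with a timeout, and each measured duration feeds the next cycle's plan.

```python
import subprocess
import sys
import time


def plan_offsets(durations, interval):
    """Spread collectors over one interval, sharing idle time equally.

    durations: collector name -> last measured run time in seconds.
    Returns collector name -> start offset (seconds after cycle start).
    If the collectors together take longer than the interval, they simply
    run back to back (gap 0) and the cycle overruns.
    """
    if not durations:
        return {}
    idle = max(interval - sum(durations.values()), 0.0)
    gap = idle / len(durations)
    offsets, t = {}, 0.0
    for name, duration in durations.items():
        offsets[name] = t
        t += duration + gap
    return offsets


def run_collector(cmd, timeout):
    """Run one collector in a child process; return (stdout or None, seconds)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        output = proc.stdout if proc.returncode == 0 else None
    except subprocess.TimeoutExpired:
        output = None  # subprocess.run() has already killed the child
    return output, time.monotonic() - start


def collection_loop(collectors, interval, timeout, store, cycles=None):
    """Endless (or `cycles`-bounded) loop that re-plans after every cycle."""
    durations = {name: 0.0 for name in collectors}  # first cycle: even spread
    done = 0
    while cycles is None or done < cycles:
        cycle_start = time.monotonic()
        for name, offset in plan_offsets(durations, interval).items():
            delay = cycle_start + offset - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            output, took = run_collector(collectors[name], timeout)
            durations[name] = took  # measured time feeds the next plan
            if output is not None:
                store(name, output, int(time.time() * 1000))
        remaining = cycle_start + interval - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        done += 1


if __name__ == "__main__":
    demo = {
        "uptime": [sys.executable, "-c", "print('node_uptime_seconds 1')"],
        "stuck": [sys.executable, "-c", "import time; time.sleep(5)"],
    }
    collection_loop(demo, interval=2.0, timeout=0.5,
                    store=lambda n, out, ts: print(n, ts, out.strip()), cycles=1)
```

A collector that hits the timeout is recorded with a duration of roughly `timeout`, so a hanging plugin still gets a bounded slot in the next plan. Using the raw last duration reacts quickly but may jitter. Smoothing it, for example with a moving average, would be one possible refinement.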