[IDEA] prometheus-node-exporter-ucode-async #457

spolack · 2024-12-17T20:45:17Z

A proof of concept with prometheus-node-exporter-lua in our network a few years ago taught us a few things:

The scrape model is compute-intensive on older embedded hardware; with short scrape intervals it noticeably increases the load.
Scraping over IPv6 from outside the mesh is not a problem. The only open problem is finding a way to discover the nodes/exporters.
It would be good to have metrics even for periods when the mesh is misbehaving or monitoring can't reach the exporter.

Brief idea:

Since Prometheus can nowadays ingest out-of-order samples, we can simply collect the metrics asynchronously and cache them in RAM on the node until the next scrape.
Staying compatible with prometheus-node-exporter-ucode plugins might be a good idea, so we can profit from its growing ecosystem.
Add a local relabeling feature to add, drop, or rewrite labels, e.g. on interface metrics.
Distribute the collection runs of the various collectors evenly over a node-local, configurable collection interval. That enables tighter monitoring on more powerful nodes while still allowing lower-end nodes to fall back to a longer interval.
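A minimal sketch of the caching and relabeling ideas, written in Python purely for illustration (the real exporter would presumably be ucode). It assumes collectors report each sample together with its collection time. The cache keeps only the latest sample per series, applies local add/drop/rewrite rules when storing it, and renders the Prometheus text exposition format with an explicit millisecond timestamp per sample, so Prometheus records the collection time rather than the scrape time. `RelabelRule`, `MetricCache`, `relabel` and the rule action names are hypothetical, and HELP/TYPE lines are left out for brevity.

```python
import re
import time
from dataclasses import dataclass


@dataclass
class RelabelRule:
    """Hypothetical local relabel rule (not an existing exporter API)."""
    action: str        # "add", "drop" or "rewrite"
    label: str         # label the rule operates on
    regex: str = ".*"  # must fully match the current value (drop/rewrite)
    value: str = ""    # new value ("add") or template with \1 groups ("rewrite")


def relabel(labels, rules):
    """Apply the rules in order to a copy of the label dict."""
    out = dict(labels)
    for rule in rules:
        current = out.get(rule.label)
        if rule.action == "add":
            out[rule.label] = rule.value
        elif current is None:
            continue  # drop/rewrite only touch labels that exist
        elif rule.action == "drop" and re.fullmatch(rule.regex, current):
            del out[rule.label]
        elif rule.action == "rewrite":
            match = re.fullmatch(rule.regex, current)
            if match:
                out[rule.label] = match.expand(rule.value)
    return out


def escape(value):
    """Escape a label value as required by the text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class MetricCache:
    """Keeps the latest sample per series together with its collection time."""

    def __init__(self, rules):
        self.rules = rules
        self.samples = {}  # (name, sorted label pairs) -> (value, timestamp_ms)

    def update(self, name, labels, value, ts_ms=None):
        if ts_ms is None:
            ts_ms = int(time.time() * 1000)  # collection time, not scrape time
        labels = relabel(labels, self.rules)
        self.samples[(name, tuple(sorted(labels.items())))] = (value, ts_ms)

    def render(self):
        """Render all cached samples, each with its explicit timestamp."""
        lines = []
        for (name, pairs), (value, ts_ms) in sorted(self.samples.items()):
            label_str = ",".join(f'{k}="{escape(v)}"' for k, v in pairs)
            series = f"{name}{{{label_str}}}" if label_str else name
            lines.append(f"{series} {value} {ts_ms}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cache = MetricCache([
        RelabelRule("rewrite", "device", r"wlan(\d+)", r"radio\1"),
        RelabelRule("drop", "mac"),
        RelabelRule("add", "site", value="hq"),
    ])
    cache.update("node_network_receive_bytes_total",
                 {"device": "wlan0", "mac": "aa:bb"}, 1234, ts_ms=1700000000000)
    print(cache.render(), end="")
    # node_network_receive_bytes_total{device="radio0",site="hq"} 1234 1700000000000
```

Keeping only the latest sample per series is deliberate: the text format expects each metric name and label combination to appear only once per scrape. On the Prometheus side, accepting samples older than what is already stored for a series requires enabling `out_of_order_time_window` under `storage.tsdb` in the configuration.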

Design of the Exporter

Main process: runs within uhttpd, caches the prepared metrics and handles requests to the scrape endpoint. It also forks the collection child process.
Collection child process: measures the duration of every collection plugin in order to schedule the collector calls, then enters an endless collection loop that spreads the calls evenly over the given interval. Every run is timed and the schedule readjusted. We probably want to run the actual metrics gathering in a further child process so that a timeout can be enforced.
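The collection loop could look roughly like the following Python sketch. It is only illustrative and not the exporter's code. Assumptions: collectors are zero-argument callables returning their samples; `cache_update` is a hypothetical callback that hands results to the main process; and each collector call runs in a forked child (via `multiprocessing`'s `fork` context, so Linux-like systems only) that is killed when it exceeds the timeout. `Scheduler` and `run_with_timeout` are hypothetical names. Each cycle spreads the idle time evenly between collectors, based on the durations measured in the previous cycle.

```python
import multiprocessing as mp
import time

# Assumption: Linux-like target where fork() is available (not Windows).
FORK = mp.get_context("fork")


def run_with_timeout(collector, timeout_s):
    """Run one collector in a forked child; return its result, or None on timeout/crash."""
    recv_end, send_end = FORK.Pipe(duplex=False)
    proc = FORK.Process(target=lambda: send_end.send(collector()))
    proc.start()
    send_end.close()  # parent keeps only the receiving end
    result = None
    try:
        if recv_end.poll(timeout_s):
            result = recv_end.recv()
    except EOFError:
        pass  # child exited without sending (e.g. the collector raised)
    if proc.is_alive():
        proc.kill()  # enforce the timeout
    proc.join()
    recv_end.close()
    return result


class Scheduler:
    """Hypothetical scheduler: spreads collectors over one interval using measured run times."""

    def __init__(self, collectors, interval_s, timeout_s):
        self.collectors = collectors  # name -> zero-argument callable returning samples
        self.interval_s = interval_s
        self.timeout_s = timeout_s
        self.durations = {name: 0.0 for name in collectors}  # unknown until first run

    def plan(self):
        """Return (start offset, name) pairs with the idle time split evenly between runs."""
        names = sorted(self.collectors)
        if not names:
            return []
        busy = sum(self.durations[n] for n in names)
        gap = max(self.interval_s - busy, 0.0) / len(names)
        offsets, t = [], 0.0
        for name in names:
            offsets.append((t, name))
            t += self.durations[name] + gap
        return offsets

    def run_cycle(self, cache_update):
        """Run every collector once at its planned offset, then wait out the interval."""
        cycle_start = time.monotonic()
        for offset, name in self.plan():
            delay = cycle_start + offset - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            started = time.monotonic()
            result = run_with_timeout(self.collectors[name], self.timeout_s)
            self.durations[name] = time.monotonic() - started  # feeds the next plan()
            if result is not None:
                cache_update(name, result)
        remaining = cycle_start + self.interval_s - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)

    def run_forever(self, cache_update):
        """The endless collection loop of the collection child process."""
        while True:
            self.run_cycle(cache_update)
```

Because the plan is recomputed every cycle from the last measured durations, a collector that got slower (or hit its timeout) simply shrinks the idle gaps. If the total work exceeds the interval, the gaps drop to zero and the cycle overruns instead of running collectors concurrently, which is probably the safer failure mode on low-end hardware.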
