
[WIP] Feature - rework comms utils / add fluxd zmq #1361

Draft · wants to merge 16 commits into development from feature/rework_comms_utils

Conversation

MorningLightMountain713
Contributor

Work in progress: the code is dirty, tests will be broken, and it is untested on old versions.

I am running this on a node now.

What this pull does

  • Full overhaul of the fluxCommunicationsUtils module - renamed to networkStateService
  • Enables zmq message queues on fluxd
  • Updates the fullnode inline package to be more flexible (and parse / write the config)
  • Simplifies some of the fluxCommunication stuff
  • Removes one of the major challenges in the current system - polling fluxd for updates (polling is now only a fallback)
  • This pull is a precursor to reworking how fluxOS uses websockets (and how they are stored)

This pull kind of got away on me a little. It's too big. I will split out the zmq / fluxd stuff so we can enable it in stages, as I need to make sure I get the fluxd config file stuff correct - for obvious reasons.

Background

I'll do some diagrams and stuff to make it easier to show what is going on, both before and after.

Historically, fluxOS polls fluxd for various data, mainly the deterministicfluxnodelist, which I am now calling the networkState (I think that is a more appropriate identifier). The other heavy user of the fluxd API is the blockProcessor, which polls every 5 seconds. The blockProcessor is out of scope here - but it could benefit greatly from using the pub / sub zmq feature in the future.

The deterministicfluxnodelist is currently 8.22 MB, so every call to fetch this list is a lot of I/O.

fluxOS has multiple levels of caching. The first is at the RPC level (daemonFluxNodeRpcs), where a 20-second LRU cache holds both the entire networkState and, if a specific pubkey is searched, that filtered result.

There was a second level of caching in fluxCommsUtils - this cached the same thing, but for 4 minutes instead. However, when pulling the full list, it wasn't indexed, so any pubkey searches had to go out to the API again and were subsequently cached.

fluxOS would then go out every 2 minutes and update the full list. Due to the way blocks can come in, an LRU cache isn't really a good fit for what we're trying to achieve here; if a bunch of blocks come in over a short period, the list can get stale (and invalid) quite quickly.
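For context, a minimal sketch of the kind of TTL-based LRU caching described above, assuming the lru-cache package; the cache key and the fetchNetworkState helper are hypothetical, not the actual fluxOS code:

```js
// Illustrative only: a 20-second TTL LRU cache in front of an RPC call,
// similar in spirit to the caching described above.
const { LRUCache } = require('lru-cache');

const rpcCache = new LRUCache({ max: 500, ttl: 20 * 1000 });

// fetchNetworkState is a hypothetical helper that calls the fluxd RPC.
async function getNetworkState(fetchNetworkState) {
  const cached = rpcCache.get('networkState');
  if (cached) return cached;

  // Cache miss: go out to fluxd and cache the full list for 20 seconds.
  const state = await fetchNetworkState();
  rpcCache.set('networkState', state);
  return state;
}
```

The problem is exactly what the paragraph above describes: the TTL is time-based, not block-based, so several blocks arriving inside the TTL window leave the cached list stale.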

In this update, we subscribe to the fluxd zmq endpoint for updates. Now, whenever a block is generated, we get notified immediately and fetch the full list. There is no more polling for the list, except as a fallback if there is an issue with the zmq socket.
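A rough sketch of the subscription side, using the zeromq v6 API; the endpoint address and the 'hashblock' topic are assumptions based on standard bitcoind-style zmq notifications, and onNewBlock is a hypothetical callback:

```js
// Sketch: subscribe to fluxd block notifications over zmq instead of polling.
// fluxd would need to be started with a matching zmqpubhashblock endpoint.
const zmq = require('zeromq');

async function watchBlocks(onNewBlock) {
  const sock = new zmq.Subscriber();
  sock.connect('tcp://127.0.0.1:28332');
  sock.subscribe('hashblock');

  for await (const [topic, message] of sock) {
    // A block was generated: invalidate local state and refetch the
    // full node list immediately, instead of waiting for a poll interval.
    await onNewBlock(message.toString('hex'));
  }
}
```

If the socket errors or goes quiet, the service falls back to the existing polling path.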

So now we get up-to-date data. We don't use any LRU caching at all (it's enabled by default in daemonFluxNodeRpcs, but I've added an override feature). As soon as a block is generated, the old data is invalidated - i.e. old nodes may have been removed, etc.

We don't need caching because we are building our own pubkey and endpoint indexes. This means you can look up a pubkey or ip:port locally without having to go out to the RPC. During testing, with ~13k nodes, it takes approx 8 ms to build the indexes. Normally this would block the event queue, which is bad. However, I've partitioned the index building so each pass does at most 1000 nodes' worth of work, instead of O(n) in one go. We build the index in chunks of 1000 nodes, yield to the event queue, then move on to the next chunk. This way, no matter how big the node count gets, it won't block the event queue. I was initially using worker threads for this, but that was overkill for our application.
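A minimal sketch of the chunked index building, assuming Node's timers/promises setImmediate for the yield; the node field names (pubkey, ip) are assumptions about the list shape rather than the exact implementation:

```js
// Sketch: build pubkey and endpoint indexes in chunks of 1,000 nodes,
// yielding to the event queue between chunks so a large list never
// blocks it for long.
const { setImmediate: yieldToEventQueue } = require('timers/promises');

async function buildIndexes(nodes, chunkSize = 1000) {
  const byPubkey = new Map();   // pubkey -> [nodes] (one pubkey can own several nodes)
  const byEndpoint = new Map(); // "ip:port" -> node

  for (let start = 0; start < nodes.length; start += chunkSize) {
    const chunk = nodes.slice(start, start + chunkSize);

    for (const node of chunk) {
      if (!byPubkey.has(node.pubkey)) byPubkey.set(node.pubkey, []);
      byPubkey.get(node.pubkey).push(node);
      byEndpoint.set(node.ip, node);
    }

    // Yield so sockets and timers can run between chunks.
    await yieldToEventQueue();
  }

  return { byPubkey, byEndpoint };
}
```

Lookups by pubkey or ip:port then become local Map reads instead of RPC round trips.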

Old way:

  • Poll every 2 minutes for the full list.
  • Go and RPC-fetch a nodelist filtered by pubkey every time a message comes in and we don't have it in the cache.

New way:

  • Only fetch the list when a block comes in.
  • Build our own indexes.

Will add more detail here soon.

@MorningLightMountain713 MorningLightMountain713 marked this pull request as draft June 30, 2024 17:43
@MorningLightMountain713 MorningLightMountain713 force-pushed the feature/rework_comms_utils branch from a3c6259 to fb42807 Compare July 4, 2024 12:10