
[WIP] Feature - rework comms utils / add fluxd zmq #1361

Draft · wants to merge 16 commits into development from feature/rework_comms_utils

Conversation

MorningLightMountain713
Contributor

Work in progress: the code is dirty, tests will be broken, and it is untested on old versions.

I am running this on a node now.

What this pull does

  • Full overhaul of the fluxCommunicationsUtils module - renamed to networkStateService
  • Enables zmq message queues on fluxd
  • Updates the fullnode inline package to be more flexible (and parse / write the config)
  • Simplifies some of the fluxCommunication stuff
  • Removes one of the major challenges in the current system - polling fluxd for updates (polling is now only a fallback)
  • This pull is a precursor to reworking how fluxOS uses websockets (and how they are stored)

This pull kind of got away on me a little. It's too big. I will split out the zmq / fluxd stuff so we can enable it in stages, as I need to make sure I get the fluxd config file stuff correct - for obvious reasons.

Background

I'll do some diagrams and stuff to make it easier to show what is going on, both before and after.

Historically, fluxOS polls fluxd for various data, mainly the deterministicfluxnodelist, which I am now calling the networkState (I think that is a more appropriate identifier). The other heavy user of the fluxd API is the blockProcessor, which polls every 5 seconds. The blockProcessor is out of scope here - but it could benefit greatly from using the pub / sub zmq feature in the future.

The deterministicfluxnodelist is currently 8.22 MB, so every call to fetch this list is a lot of I/O.

fluxOS has multiple levels of caching. The first is at the RPC level (daemonFluxNodeRpcs), where a 20-second LRU cache holds both the entire networkState and, if a specific pubkey is searched, that filtered result.

There was a second level of caching in fluxCommsUtils - this cached the same thing, but for 4 minutes instead. However, when pulling the full list, it wasn't indexed, so any pubkey searches had to go out to the API again and were subsequently cached.

fluxOS would then go out every 2 minutes and update the full list. Due to the way blocks can come in, an LRU cache isn't really a good fit for what we're trying to achieve here; if a bunch of blocks come in over a short period, the list can get stale (and invalid) quite quickly.
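For context, a minimal sketch of the kind of TTL-based LRU caching described above, assuming the lru-cache package; the cache key and the fetchNetworkState helper are hypothetical, not the actual fluxOS code:

```js
// Illustrative only: a 20-second TTL LRU cache in front of an RPC call,
// similar in spirit to the caching described above.
const { LRUCache } = require('lru-cache');

const rpcCache = new LRUCache({ max: 500, ttl: 20 * 1000 });

// fetchNetworkState is a hypothetical helper that calls the fluxd RPC.
async function getNetworkState(fetchNetworkState) {
  const cached = rpcCache.get('networkState');
  if (cached) return cached;

  // Cache miss: go out to fluxd and cache the full list for 20 seconds.
  const state = await fetchNetworkState();
  rpcCache.set('networkState', state);
  return state;
}
```

The problem is exactly what the paragraph above describes: the TTL is time-based, not block-based, so several blocks arriving inside the TTL window leave the cached list stale.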

In this update, we subscribe to the fluxd zmq endpoint for updates. Now, whenever a block is generated, we get notified immediately and fetch the full list. There is no more polling for the list, except as a fallback if there is an issue with the zmq socket.
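A rough sketch of the subscription side, using the zeromq v6 API; the endpoint address and the 'hashblock' topic are assumptions based on standard bitcoind-style zmq notifications, and onNewBlock is a hypothetical callback:

```js
// Sketch: subscribe to fluxd block notifications over zmq instead of polling.
// fluxd would need to be started with a matching zmqpubhashblock endpoint.
const zmq = require('zeromq');

async function watchBlocks(onNewBlock) {
  const sock = new zmq.Subscriber();
  sock.connect('tcp://127.0.0.1:28332');
  sock.subscribe('hashblock');

  for await (const [topic, message] of sock) {
    // A block was generated: invalidate local state and refetch the
    // full node list immediately, instead of waiting for a poll interval.
    await onNewBlock(message.toString('hex'));
  }
}
```

If the socket errors or goes quiet, the service falls back to the existing polling path.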

So now we get up-to-date data. We don't use any LRU caching at all (it's enabled by default in daemonFluxNodeRpcs, but I've added an override feature). As soon as a block is generated, the old data is invalidated - i.e. old nodes may have been removed, etc.

We don't need caching because we are building our own pubkey and endpoint indexes. This means you can look up a pubkey or ip:port locally without having to go out to the RPC. During testing, with ~13k nodes, it takes approx 8 ms to build the indexes. Normally this would block the event queue, which is bad. However, I've partitioned the index building so each pass does at most 1000 nodes' worth of work, instead of O(n) in one go. We build the index in chunks of 1000 nodes, yield to the event queue, then move on to the next chunk. This way, no matter how big the node count gets, it won't block the event queue. I was initially using worker threads for this, but that was overkill for our application.
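A minimal sketch of the chunked index building, assuming Node's timers/promises setImmediate for the yield; the node field names (pubkey, ip) are assumptions about the list shape rather than the exact implementation:

```js
// Sketch: build pubkey and endpoint indexes in chunks of 1,000 nodes,
// yielding to the event queue between chunks so a large list never
// blocks it for long.
const { setImmediate: yieldToEventQueue } = require('timers/promises');

async function buildIndexes(nodes, chunkSize = 1000) {
  const byPubkey = new Map();   // pubkey -> [nodes] (one pubkey can own several nodes)
  const byEndpoint = new Map(); // "ip:port" -> node

  for (let start = 0; start < nodes.length; start += chunkSize) {
    const chunk = nodes.slice(start, start + chunkSize);

    for (const node of chunk) {
      if (!byPubkey.has(node.pubkey)) byPubkey.set(node.pubkey, []);
      byPubkey.get(node.pubkey).push(node);
      byEndpoint.set(node.ip, node);
    }

    // Yield so sockets and timers can run between chunks.
    await yieldToEventQueue();
  }

  return { byPubkey, byEndpoint };
}
```

Lookups by pubkey or ip:port then become local Map reads instead of RPC round trips.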

Old way:

  • Poll every 2 minutes for the full list.
  • Go and RPC-fetch a nodelist filtered by pubkey every time a message comes in and we don't have it in the cache.

New way:

  • Only fetch the list when a block comes in.
  • Build our own indexes.

Will add more detail here soon.

@MorningLightMountain713 MorningLightMountain713 marked this pull request as draft June 30, 2024 17:43
@MorningLightMountain713 MorningLightMountain713 force-pushed the feature/rework_comms_utils branch from a3c6259 to fb42807 Compare July 4, 2024 12:10