code: Add jitter before emitting RPC requests #585

romac · 2024-11-20T15:55:32Z

To avoid a synchronized swarm of client requests hitting the same nodes, we should add some jitter to outbound requests, for example by sleeping for a random amount of time before sending a request.

ameya-deshmukh · 2024-12-22T19:58:44Z

Hey @romac! Thinking of diving deep into Malachite over the winter break. Can I take this up?

romac · 2024-12-23T11:36:20Z

Of course, have at it! :)

cason · 2025-01-06T12:31:47Z

But this would slow down clients even in the absence of multiple concurrent requests to the same service. Shouldn't the server be the one to handle this situation?

ancazamfir · 2025-01-20T12:32:43Z

> But this would slow down clients even in the absence of multiple concurrent requests to the same service. Shouldn't the server be the one to handle this situation?

The delay would be small, and it should only apply to voteSet and Value requests, where it wouldn't matter. It would also help when a single client sends multiple requests to different servers. But indeed, the server should also handle it.

nenadmilosevic95 · 2025-01-21T15:28:16Z

Hey @romac , @ancazamfir, could you please provide more context about this issue?

romac · 2025-01-21T16:49:04Z

For synchronization purposes, we sometimes need to send requests to other nodes, and we would like to add some jitter before sending them, to avoid overwhelming a node when multiple nodes fall behind at the same time and all pick the same node to send sync requests to. This could happen if only a few nodes managed to move to the next height and all others were left behind.

Here is the place in the code where we send those requests and where the jitter should be added:

malachite/code/crates/sync/src/behaviour.rs

Line 88 in 1947b34

self.rpc.send_request(&peer, RawRequest(data))
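As a rough illustration of the idea, the sketch below computes a bounded random delay and waits it out before running a send operation. It is a sketch under assumptions, not Malachite code: `random_u64`, `jitter_delay` and `send_with_jitter` are hypothetical names, and the standard library's `RandomState` is used as a stand-in entropy source only to keep the example dependency-free (a real implementation would more likely use the `rand` crate).

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::time::Duration;

/// Hypothetical helper: returns a pseudo-random `u64` using only the standard library.
/// Every `RandomState` carries randomly seeded keys, so the hash of an empty
/// input varies from one call to the next.
fn random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Hypothetical helper: maps a random value to a delay in `[0, max)`.
/// Returns zero when `max` is zero.
fn jitter_delay(max: Duration, random: u64) -> Duration {
    let max_nanos = max.as_nanos() as u64;
    if max_nanos == 0 {
        return Duration::ZERO;
    }
    Duration::from_nanos(random % max_nanos)
}

/// Hypothetical wrapper: sleeps for a random delay below `max`, then runs `send`.
/// The blocking sleep is for illustration only.
fn send_with_jitter<T>(max: Duration, send: impl FnOnce() -> T) -> T {
    std::thread::sleep(jitter_delay(max, random_u64()));
    send()
}
```

A caller would wrap the send in something like `send_with_jitter(Duration::from_millis(50), || ...)`, where the 50 ms bound is an arbitrary example value. In an event-driven network behaviour a blocking sleep would stall the event loop, so the delay would probably have to be scheduled with a timer instead.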

nenadmilosevic95 · 2025-01-21T17:06:51Z

Thanks, @romac! How do nodes decide who to send the request to? Was this issue raised based on a situation you’ve already encountered in some experiments?

ancazamfir · 2025-01-21T17:34:28Z

> Hey @romac , @ancazamfir, could you please provide more context about this issue?

IIRC it started when @romac and I had a discussion about blocksync, where for each request we also have a retry mechanism. I recalled then that at my past workplace it was mandatory that all timeouts be randomized, messages be jittered, timeouts be adaptive, etc., depending on the situation. All this was to avoid nodes sending messages in sync and to avoid bursts of traffic. There was a very good write-up about this, something about positive feedback loops (??), but I could not find it quickly. And I'm sure you are all familiar with it.
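To make the "randomized timeouts on retries" idea concrete, here is a minimal sketch of one common approach: an exponentially growing retry timeout with a random component. It is not taken from Malachite or from the write-up mentioned above. `backoff_ceiling` and `randomized_timeout` are hypothetical names, and the caller is assumed to supply the `random` value from whatever random source it uses.

```rust
use std::time::Duration;

/// Hypothetical helper: upper bound on the timeout for retry number `attempt`
/// (0-based). Doubles from `base` on every attempt and never exceeds `cap`.
fn backoff_ceiling(base: Duration, cap: Duration, attempt: u32) -> Duration {
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(cap).min(cap)
}

/// Hypothetical helper: picks a timeout in `[ceiling / 2, ceiling)` from a
/// caller-supplied random value, so that nodes retrying at the same moment
/// do not all fire again at the same instant.
fn randomized_timeout(ceiling: Duration, random: u64) -> Duration {
    let half = ceiling / 2;
    let span_nanos = (ceiling - half).as_nanos() as u64;
    if span_nanos == 0 {
        return ceiling;
    }
    half + Duration::from_nanos(random % span_nanos)
}
```

Keeping the lower half of the interval fixed keeps a minimum wait between retries while still spreading them out. Drawing from the whole interval `[0, ceiling)` would spread retries more at the cost of occasionally retrying almost immediately.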

> Was this issue raised based on a situation you’ve already encountered in some experiments?

Not really, although when testing with multiple nodes syncing I remember seeing that at some point a node would get many requests at the same time. This was when we were not picking random peers.
The problem is that we haven't done any QA since the retreat, and this was before the sync implementations.

I also believe that we might see this in the consensus network.

In general we should maybe do some proper testing and analysis before we implement a solution.

romac · 2025-01-21T17:35:49Z

> How do nodes decide who to send the request to?

They just randomly pick a peer who is known to be at a higher height.
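A minimal sketch of that selection rule follows, assuming a simplified peer record. `PeerInfo` and `pick_sync_peer` are hypothetical names rather than Malachite types, and the `random` value is assumed to come from the caller's random source.

```rust
/// Hypothetical peer record: an identifier and the last height the peer reported.
#[derive(Debug, Clone, PartialEq)]
struct PeerInfo {
    id: String,
    height: u64,
}

/// Picks one peer at random among those strictly above `our_height`.
/// Returns `None` when no known peer is ahead of us.
fn pick_sync_peer(peers: &[PeerInfo], our_height: u64, random: u64) -> Option<&PeerInfo> {
    let candidates: Vec<&PeerInfo> = peers
        .iter()
        .filter(|peer| peer.height > our_height)
        .collect();
    if candidates.is_empty() {
        return None;
    }
    // Reduce the random value to a valid index into the candidate list.
    let index = (random % candidates.len() as u64) as usize;
    Some(candidates[index])
}
```

When only a few peers are ahead, the candidate list is short, so many lagging nodes can still land on the same peer even with random selection. That is the case this issue is concerned with.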

nenadmilosevic95 · 2025-01-22T08:52:10Z

Thanks, @ancazamfir and @romac, for the clarifications! During my experiments with Byzantine attacks on BFT consensus in a WAN setup, I observed that the success of an attack often depended on the timing of its launch (i.e., when specific messages were sent). Introducing additional jitter before sending messages didn’t have a significant impact. The reason was that the network latencies between nodes already introduced natural jitter, causing nodes to reach the same execution point at different times.

This is why I initially believed that additional jitter might not be necessary and that random peer selection would suffice. However, this was a different setup, and I agree with Anca that conducting prior testing and analysis would be the best way to determine whether it is truly necessary or not.

ancazamfir · 2025-01-22T09:20:36Z

@nenadmilosevic95 good point about latencies in a WAN setup. Maybe we can close this issue and re-open it if needed? wdyt @romac ?

romac added the good first issue and code labels Dec 3, 2024

romac added this to the Phase 5 milestone Dec 4, 2024

romac added the phase5 label Dec 4, 2024

romac removed the phase5 label Dec 19, 2024

romac assigned ameya-deshmukh Dec 23, 2024

romac unassigned ameya-deshmukh Jan 16, 2025

romac closed this as not planned Jan 22, 2025
