Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

code: Add jitter before emitting RPC requests #585

Closed
romac opened this issue Nov 20, 2024 · 11 comments
Closed

code: Add jitter before emitting RPC requests #585

romac opened this issue Nov 20, 2024 · 11 comments
Labels
code Code/implementation related good first issue Good for newcomers
Milestone

Comments

@romac
Copy link
Member

romac commented Nov 20, 2024

To avoid a synchronized swarm of client requests hitting the same nodes, we should add some jitter to outbound requests, for example by sleeping for a random amount of time before sending a request.

@romac romac added good first issue Good for newcomers code Code/implementation related labels Dec 3, 2024
@romac romac added this to the Phase 5 milestone Dec 4, 2024
@romac romac added the phase5 label Dec 4, 2024
@romac romac removed the phase5 label Dec 19, 2024
@ameya-deshmukh
Copy link

Hey @romac! Thinking of diving deep into Malachite over the winter break. Can I take this up?

@romac
Copy link
Member Author

romac commented Dec 23, 2024

Of course, have at it! :)

@cason
Copy link
Contributor

cason commented Jan 6, 2025

But this would slow down clients even in the absence of multiple concurrent requests to the same service. Shouldn't be the server to handle this situation?

@ancazamfir
Copy link
Collaborator

But this would slow down clients even in the absence of multiple concurrent requests to the same service. Shouldn't be the server to handle this situation?

The delay would be small. And this should only apply for voteSet and Value requests where it wouldn't matter. Would also work for a single client sending multiple requests to different servers. But, indeed the server should also handle it.

@nenadmilosevic95
Copy link
Contributor

Hey @romac , @ancazamfir, could you please provide more context about this issue?

@romac
Copy link
Member Author

romac commented Jan 21, 2025

For synchronization purposes, we sometimes need to send requests to other nodes, and would like to add some jitter before sending those avoid overwhelming a node in the case where multiple nodes are falling behind at the same time and are all picking the same node to send sync requests to. This could happen if only a few nodes managed to move to the next height and all others are left behind.

Here is the place in the code where we send those requests and where the jitter should be added:

self.rpc.send_request(&peer, RawRequest(data))

@nenadmilosevic95
Copy link
Contributor

Thanks, @romac! How do nodes decide who to send the request to? Was this issue raised based on a situation you’ve already encountered in some experiments?

@ancazamfir
Copy link
Collaborator

ancazamfir commented Jan 21, 2025

Hey @romac , @ancazamfir, could you please provide more context about this issue?

iirc it started when @romac and i had a discussion about blocksync where for each request we also have a retry mechanism. And I recalled then that in my past workplace was mandatory that all timeouts were randomized, messages were jittered, timeouts were adaptive, etc. depending on the situation. All this to avoid synchronization of message sending at different nodes and avoid bursts of traffic. There was a very good writeup about this, something about positive feedback loops (??) but could not find it quickly. And I'm sure you are all familiar.

Was this issue raised based on a situation you’ve already encountered in some experiments?

Not really although when testing with multiple nodes syncing I remember seeing that at some point a node would get many requests in the same time. This was when we were not picking random peers.
The problem is that we haven't done any QA since the retreat and this was before sync implementations.

I also believe that we might see this in consensus network.

In general we should maybe do some proper testing and analysis before we implement a solution.

@romac
Copy link
Member Author

romac commented Jan 21, 2025

How do nodes decide who to send the request to?

They just randomly pick a peer who is known to be at a higher height.

@nenadmilosevic95
Copy link
Contributor

Thanks, @ancazamfir and @romac, for the clarifications! During my experiments with Byzantine attacks on BFT consensus in a WAN setup, I observed that the success of an attack often depended on the timing of its launch (i.e., when specific messages were sent). Introducing additional jitter before sending messages didn’t have a significant impact. The reason was that the network latencies between nodes already introduced natural jitter, causing nodes to reach the same execution point at different times.

This is why I initially believed that additional jitter might not be necessary and that random peer selection would suffice. However, this was a different setup, and I agree with Anca that conducting prior testing and analysis would be the best way to determine whether it is truly necessary or not.

@ancazamfir
Copy link
Collaborator

@nenadmilosevic95 good point on latencies on WAN setup. Maybe we can close this issue and re-open if needed? wdyt @romac ?

@romac romac closed this as not planned Won't fix, can't repro, duplicate, stale Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code Code/implementation related good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

5 participants