Content that is available in the network "not found" depending on network key / node id #2901

Open
kdeme opened this issue Dec 3, 2024 · 4 comments

@kdeme
Contributor

kdeme commented Dec 3, 2024

When downloading a large number of blocks, it turns out that occasionally a block consistently fails to download.
However, when trying to get that same block from a Fluffy node started with a fresh network key (and thus a new node id), it usually works.

So the failure seems to be NodeId related, potentially pointing to an issue in the lookup mechanism, as the content is available on the network.

One such example is the block body for block number 47830, which has block hash 0x8a24f51c42f5c1e216351c6c2ab29d2ae25fc4f366ea690a4e13c640844412e7.

This results in a content id of: 0x345c7c1c31b50e20f6b320d0273690d89d0e2bc63a6d07b1b725e8b3ce5e819c
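For reference, here is a minimal sketch of how a block-body content id can be derived from a block hash. It assumes the Portal history network layout in use at the time: the content key is the selector byte `0x01` followed by the 32-byte block hash, and the content id is the SHA-256 of that serialized key. The helper names are hypothetical; if the assumptions match the spec, the printed id can be compared with the content id quoted above.

```python
import hashlib

# Assumption: history network content key for a block body is the SSZ union
# selector 0x01 followed by the 32-byte block hash.
BLOCK_BODY_SELECTOR = 0x01


def block_body_content_key(block_hash_hex: str) -> bytes:
    """Build the (assumed) serialized content key for a block body."""
    block_hash = bytes.fromhex(block_hash_hex.removeprefix("0x"))
    if len(block_hash) != 32:
        raise ValueError("block hash must be 32 bytes")
    return bytes([BLOCK_BODY_SELECTOR]) + block_hash


def content_id(content_key: bytes) -> bytes:
    """Content id as the SHA-256 digest of the serialized content key (assumed)."""
    return hashlib.sha256(content_key).digest()


key = block_body_content_key(
    "0x8a24f51c42f5c1e216351c6c2ab29d2ae25fc4f366ea690a4e13c640844412e7"
)
print("content key:", "0x" + key.hex())
print("content id: ", "0x" + content_id(key).hex())
```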

We managed to retrieve it from the node with id 0x3514da5e6fae802b62dbd381813a4fdd24d208f78506e10e7d94eaac0045354f or from the node with id 0x30c1ceb0ccc2448a674b73d573924743ce7622ae112b6fe74666fc01ca0cea4f

It works when our node id is 0xb75e9ccdc42ce5191c754f0ad4aacebc17ffa822dd4613f46c12ba4993628f4c, but not, for example, when our node id is 0x91239acd7248819228c893f3d10930b39a2a1ed7a999796a8845d47ebc9285b3
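Since the suspicion here is the lookup, it can help to look at the ids in terms of the XOR metric that Portal routing (like Discovery v5) uses: a lookup converges on nodes whose ids are XOR-close to the content id, and the log distance is the bit length of that XOR. The sketch below only computes these distances for the ids from this issue; the labels and helper names are illustrative.

```python
# Content id and node ids copied from this issue.
CONTENT_ID = int("345c7c1c31b50e20f6b320d0273690d89d0e2bc63a6d07b1b725e8b3ce5e819c", 16)

NODES = {
    "has content (1)": int("3514da5e6fae802b62dbd381813a4fdd24d208f78506e10e7d94eaac0045354f", 16),
    "has content (2)": int("30c1ceb0ccc2448a674b73d573924743ce7622ae112b6fe74666fc01ca0cea4f", 16),
    "requester, works": int("b75e9ccdc42ce5191c754f0ad4aacebc17ffa822dd4613f46c12ba4993628f4c", 16),
    "requester, fails": int("91239acd7248819228c893f3d10930b39a2a1ed7a999796a8845d47ebc9285b3", 16),
}


def xor_distance(a: int, b: int) -> int:
    """XOR distance between two 256-bit ids."""
    return a ^ b


def log_distance(a: int, b: int) -> int:
    """Log distance: bit length of the XOR distance (0 for identical ids)."""
    return xor_distance(a, b).bit_length()


# Print nodes ordered from closest to furthest from the content id.
for label, node_id in sorted(NODES.items(), key=lambda kv: xor_distance(CONTENT_ID, kv[1])):
    print(f"{label:18} log distance {log_distance(CONTENT_ID, node_id)}")
```

Both nodes holding the content share the `0x3` prefix with the content id (log distances 249 and 251), while both requesting ids sit at log distance 256, so the log distance of the requester alone does not separate the working case from the failing one.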

@kdeme
Contributor Author

kdeme commented Dec 3, 2024

After further debugging, it seems that the issue is not in the lookup mechanism, as the nodes that have the data do get contacted. However, the uTP connection setup fails each time:

uTP timeout while trying to connect to (nodeId: 3514da5e6fae802b62dbd381813a4fdd24d208f78506e10e7d94eaac0045354f, address: 143.244.168.133:9009)

The strange thing is that the moment I change the network key, the same request works each time. So it seems that the other node simply fails to respond depending on the network key / node id of the requesting node.

After checking the ENRs of two of the nodes that show this behaviour, they appear to be Trin nodes.

@kdeme
Contributor Author

kdeme commented Dec 3, 2024

The node is hitting specifically this error: https://github.com/status-im/nim-eth/blob/aa92ad4f42d772c53ce4a2d9c76cf760b4450031/eth/utp/utp_socket.nim#L587

It hits this with retransmitCount=2, so it appears that no ACK is ever received during the uTP connection setup.
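To illustrate why a peer that never answers leads to this timeout, here is a simplified model of a connect attempt that retransmits its SYN with exponential backoff and gives up once a retransmit limit is reached. The initial timeout, the doubling, and the limit of 2 are assumptions chosen for illustration; this is not nim-eth's actual implementation, only a sketch of the general mechanism.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConnectAttempt:
    send_times: List[float]      # times at which a SYN (original or retransmit) was sent
    gave_up_at: Optional[float]  # time the connect failed, or None if an ACK arrived


def simulate_connect(ack_arrives_at: Optional[float],
                     initial_timeout: float = 1.0,
                     max_retransmits: int = 2) -> ConnectAttempt:
    """Simulate a SYN retransmit loop (hypothetical constants).

    ack_arrives_at: time the remote ACK arrives, or None if it never does,
    which is what the failing requests in this issue look like.
    """
    t = 0.0
    timeout = initial_timeout
    retransmit_count = 0
    sends: List[float] = []
    while True:
        sends.append(t)
        deadline = t + timeout
        if ack_arrives_at is not None and ack_arrives_at <= deadline:
            return ConnectAttempt(sends, None)   # connected
        if retransmit_count >= max_retransmits:
            return ConnectAttempt(sends, deadline)  # connect timeout
        # Timer expired: retransmit the SYN and double the timeout.
        t = deadline
        timeout *= 2
        retransmit_count += 1


print(simulate_connect(None))  # peer never answers
print(simulate_connect(2.0))   # ACK arrives after the first retransmit
```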

@kdeme
Contributor Author

kdeme commented Dec 3, 2024

I can reproduce the exact same issue with a Trin node on the requesting side instead of Fluffy, so I created an issue there: ethereum/trin#1596

@kdeme
Contributor Author

kdeme commented Dec 18, 2024

So there seems to be an issue with Trin nodes when a node is not reachable from the outside and has no IP in its ENR: see this issue ethereum/trin#1596 (comment)

I was hitting this because of the default random UDP port (value set to 0) for nimbus_execution_client.
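As background on the port setting: binding a UDP socket to port 0 asks the operating system to pick an ephemeral port, which generally differs from run to run, so it cannot be forwarded or advertised in advance. The sketch below shows only this generic OS behaviour with Python's standard `socket` module; it does not reproduce how nimbus_execution_client configures its socket.

```python
import socket


def bind_udp(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Bind a UDP socket; port 0 lets the OS choose an ephemeral port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


sock = bind_udp()
assigned_port = sock.getsockname()[1]  # the port the OS actually picked
print(f"requested port 0, OS assigned port {assigned_port}")
sock.close()
```

A fixed, explicitly configured port is what makes it possible to set up forwarding and advertise a reachable address in the ENR.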

If the network is properly configured to be reachable from the outside (and the ENR is configured accordingly), this issue no longer occurs. The next issue, however, appears to be content that is actually missing.
