AcceleratedDHTClient causes modem crash, revert to experimental and discourage from using #10192

Closed
3 tasks done
markg85 opened this issue Nov 1, 2023 · 8 comments
Labels
kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization

Comments

@markg85
Contributor

markg85 commented Nov 1, 2023

Checklist

Installation method

third-party binary

Version

❯ ipfs version --all
Kubo version: 0.23.0-3a1a0413a
Repo version: 15
System version: amd64/linux
Golang version: go1.21.1

Config

Irrelevant.
I followed this doc https://github.com/ipfs/kubo/blob/master/docs/config.md to enable it and to subsequently disable it. The only change is enabling AcceleratedDHTClient and disabling it again.
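For reference, the toggle described in that doc comes down to a single config key. A minimal sketch, assuming a Kubo version where the option lives at `Routing.AcceleratedDHTClient` (as the anchor in config.md suggests); the daemon has to be restarted for a change to take effect:

```shell
# Enable the accelerated DHT client (opt-in), then restart the daemon.
ipfs config --json Routing.AcceleratedDHTClient true

# Disable it again to go back to the standard DHT client.
ipfs config --json Routing.AcceleratedDHTClient false

# Print the value currently stored in the config.
ipfs config Routing.AcceleratedDHTClient
```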

I'm refraining from posting a config because that tends to turn into a discussion about the number of connections, low/high water marks and so on. That is pointless, because low/high water does not cap the number of connections (it determines how many are kept once the configured grace period has passed).
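The difference can be observed by comparing the live connection count with the configured water marks. A rough sketch using standard Kubo commands; whether a `Swarm.ConnMgr` section shows up in the output depends on whether it was ever set explicitly, which is an assumption here:

```shell
# Count the peers the node is connected to right now.
ipfs swarm peers | wc -l

# Show the full config; look for Swarm.ConnMgr (LowWater, HighWater, GracePeriod).
ipfs config show
```

During bursts the first number can sit above HighWater, since trimming only happens periodically and skips connections that are still inside their grace period.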

Description

This was tested on a modem that had no problems without AcceleratedDHTClient. IPFS worked just fine and my network was stable.

Because of my repo size, the IPFS logs suggested that I enable AcceleratedDHTClient.

❯ ipfs repo stat -H
NumObjects: 4179172
RepoSize:   2.5 GB
StorageMax: 10 GB
RepoPath:   /home/mark/.ipfs
Version:    fs-repo@15

Note that I'm using the nocopy filestore; that repo serves 3.4 TB of data.

However, enabling this gave me a subtle new bug.
My network became sluggish, the internet would occasionally drop out, and even my wireless and wired connections would disconnect completely a couple of times a day.
In other words, the router kill issue found its way back into my setup.

I verified that specifically enabling AcceleratedDHTClient makes my network go bananas. Disabling it makes the network behave again.

About AcceleratedDHTClient
I don't know exactly how it works internally. I've caught some discussions in the past and read about it in various places. Its purpose (or one of them), getting content advertised faster, is great and well intended. But the way it works amplifies an issue that is already impossible to debug (router kill).

With all that in mind, I'd request that:

  • Discourage the use of AcceleratedDHTClient, as it clearly has harmful side effects. Reverting "Move AcceleratedDHTClient from being experimental" (#9703) would seem appropriate, putting this feature back in the experimental category.
  • Do not communicate or suggest using AcceleratedDHTClient until it is, at the very least, no longer making things worse.
  • Re-evaluate how this feature works, potentially reverting it completely if it can't be properly fixed [1].

In my opinion Kubo should not ship features that are known to cause harm to a subset of users. It was known while this was still experimental that this feature turned the router kill bug into a quick router kill. While this probably hasn't been widely known, it was known to @Jorropo - we chatted about this one too when we were debugging the router kill issue - who is a key developer of this feature. This feature should not have been pushed out of experimental.

Moral of this bug? Kubo kills modems. Some are more sensitive than others. This needs to be properly understood, debugged and fixed.

cc @Jorropo @lidel @BigLep

[1] I know this is a bit of a hard statement. The tradeoff here is between a potential router kill (which certainly makes your content unreachable) and reprovides not happening fast enough (which also makes your content unreachable). I consider mitigating a router kill to be far more important, so if this feature can't be fixed then reverting it is the only sensible option.

@markg85 markg85 added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Nov 1, 2023
@Jorropo
Contributor

Jorropo commented Nov 1, 2023

I don't think it is sane to revert #9703. We can't be asked to write application protocols while keeping in mind how a router is going to react 5 layers down.
This needs to be handled in the network layer because applications have no control over packet pacing, resource limits, congestion control, or anything really. It's the job of libp2p to provide streams with reliable TCP semantics, and since somewhere between October 1986 and August 1988 part of those semantics is that they won't cause network collapses.
This also looks like a limited issue on some routers; if I hear of more people having issues I might change my mind.

If you want to help on this I would like you to fill in a report on #9998 using the template (it's best effort). And this issue should be closed. thx

@markg85
Contributor Author

markg85 commented Nov 1, 2023

> This also looks like a limited issue on some routers, if I hear more people having issues I might change my mind.

I'm sorry, but you really should reconsider how you think about this.
The fact that you don't have an issue with it doesn't mean that there is none. There very clearly are fundamental issues in IPFS that you cannot and should not discard as a limited case.

To make my point even clearer: I've run IPFS on 2 completely different internet connections, with 2 different modems.
On both connections I can now make IPFS crash my modem.
The ISPs, both in the Netherlands, are "Vodafone/Ziggo" and "T-Mobile".
For reference, these two together serve about 50% of the broadband connections in the Netherlands.

Granted, the T-Mobile one (~7% of the broadband usage here in NL) requires AcceleratedDHTClient to be on before it finally gives in and crashes too.

Now to put it even more into perspective.
I was using default IPFS configurations and following suggestions that Kubo itself gives me, and that made it crash.

If you go by just the default configuration, then every user on "Vodafone/Ziggo" (say about 42% of all broadband users here in NL) will crash their modem. If you include following Kubo's suggestions (thus eventually enabling AcceleratedDHTClient if you host a lot of data), then that 42% goes up.

Or in other words, half of all the people with broadband (you might as well say half of the population) can't possibly ever run IPFS because it will crash their modem.

So cut it out with your nonsense argument that this is a one-off modem issue. It's widespread. The still fairly limited use of IPFS gives the impression that it's a rare issue. The usage of IPFS in datacenters (which is a substantial amount!) skews the picture too, as those "modems" work very differently. Just google how often router kill issues show up with IPFS involved; you'd be shocked.

> If you want to help on this I would like you to fill in a report on #9998 using the template (it's best effort). And this issue should be closed. thx

No and no.
I'm not going to help with that anymore. I've spent my time being annoyed with that and getting nowhere. And yesterday I spent another full day reconfiguring my modem (which is super painful if all smart lights/appliances run on it!) just to discover that I didn't have to do any of that and should instead just have disabled AcceleratedDHTClient. If IPFS/Kubo can't get it fixed then I'm better off not using it anymore and going for Iroh instead.

This issue should not be closed. It is certain that this feature increases the chance of your modem crashing, so this feature should be disabled. It is entirely possible (likely even) that a true fix in libp2p would make AcceleratedDHTClient just work as intended too. But we're not at that point yet, so for now this feature is bad and should be disabled and discouraged (with a warning that it can crash modems!) as a temporary mitigation. Once a true fix is found it can probably be turned on safely again. What is important now is to finally act upon this and start actually investigating with people who know the network layers (not me!) to figure this issue out, properly understand it and fix it.

@Jorropo
Contributor

Jorropo commented Nov 2, 2023

I haven't read your latest message yet. If you were happy with the situation before turning on AcceleratedDHTClient, feel free to turn it off.
Your content won't be properly announced in the DHT.
Alternatively, you can change your reprovider strategy to something lighter, but that makes partial and intersecting queries less reliable.
https://github.com/ipfs/kubo/blob/master/docs/config.md#reproviderstrategy
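
Switching the strategy is a one-line config change. A minimal sketch, assuming the `all`, `pinned` and `roots` values documented at the link above for this Kubo generation; the daemon needs a restart afterwards:

```shell
# Announce only pinned content (including its child blocks).
ipfs config Reprovider.Strategy pinned

# Or announce only the root CIDs of pinned content, the lightest option.
ipfs config Reprovider.Strategy roots

# Go back to announcing every block in the repo.
ipfs config Reprovider.Strategy all
```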

@markg85
Contributor Author

markg85 commented Nov 2, 2023

"Vodafone/Ziggo" has about 4 million users in the Netherlands. All getting the same modem. 0 are able to run Kubo, it would crash their modem within 30 minutes.
Enabling AcceleratedDHTClient crashes their modem in mere minutes. I know because I tested this about a year ago.

You can disagree with the TCP stack implementations, with all the protocols in the world and push for your idealistic approach. Pushing for ideals is commendable!! But at some point idealistic approaches and reality collide and it turns into an uphill battle you can't win. You can't make hardware vendors fix the TCP stack to IPFS's needs. And even if you could, you'd still have to wait decades for that new implementation to be so widely used that you can ignore the broken ones. You need to adapt to what is on the market whether you like it or not.

You can opt to stay idealistic and thus ignore reality. It does mean that the Kubo implementation of IPFS will always be a niche project. It literally can't scale, as it crashes a large subset of the modems out there.

I should not need to try and convince you that this is bad....

@sukunrt
Contributor

sukunrt commented Nov 2, 2023

@markg85 is it possible for you to run this with

ipfs config --json Swarm.Transports.Network.TCP false
ipfs config --json Swarm.Transports.Network.Websocket false

If the router sluggishness is because of anything other than bandwidth usage by kubo, go-libp2p should handle it.
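
With TCP and WebSocket disabled, the node should be left with its UDP-based transports (QUIC in particular), where go-libp2p rather than the kernel's TCP stack paces the traffic. A sketch of how the experiment might be checked and undone afterwards; the `grep` pattern is only an illustrative filter on the multiaddr output:

```shell
# After restarting the daemon, list connections that still use TCP (ideally none).
ipfs swarm peers | grep '/tcp/'

# Undo the experiment by re-enabling both transports, then restart again.
ipfs config --json Swarm.Transports.Network.TCP true
ipfs config --json Swarm.Transports.Network.Websocket true
```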

@markg85
Contributor Author

markg85 commented Nov 2, 2023

Thank you for your suggestion @sukunrt!

I just tried it, sadly it solves nothing.
Restarting IPFS with AcceleratedDHTClient enabled along with your suggestions gives me immediate network connectivity issues: connections time out and internet music stops playing. It's not crashing yet but it's nearly unusable, so I didn't even leave it on for more than a minute before reverting.

Yes, I am aware this feature has a high initial startup load. It might settle down after that, but I'm not willing to even wait for that, because it means that an accidental IPFS crash or reboot has the unintended side effect of making the network effectively unusable at seemingly random times.

@aschmahmann
Contributor

  1. The feature is opt-in
  2. The feature has been useful to various groups
  3. The text telling you you're falling behind on provides asks the user to consider the accelerated DHT client:
    "💡 Consider enabling the Accelerated DHT to enhance your system performance. See:
    https://github.com/ipfs/kubo/blob/master/docs/config.md#routingaccelerateddhtclient"
    and links to the config: https://github.com/ipfs/kubo/blob/f17a06419355afbae3cdb1675675500fa5e17b9d/docs/config.md#routingaccelerateddhtclient.
  4. The config docs contain a list of caveats

Your issue seems to have been covered by "Users that are limited in the number of parallel connections their machines/networks can perform will likely suffer" however if you'd like to add a sub-bullet like "some users have reported consumer routers having degraded behavior during connection bursts" that seems fine (if redundant).


Aside: Obviously we'd all like to get to a world where we're not falling behind on provides by default and providing isn't eating many resources in the process. That's not this issue though; there are other ones about this. If this is the area you're interested in you can comment on one of the existing issues (in here, boxo, or go-libp2p-kad-dht) or create a new one.

@markg85
Contributor Author

markg85 commented Nov 6, 2023

Thank you for your suggestions @aschmahmann. I have a lot of interests, but figuring out how network layers function at the protocol level is one I'm going to skip. That stuff is hard and should be left to people passionate about that tech.

I'm disappointed in the conclusion here. In my view I'm exposing a symptom that amplifies ill behavior. If the root cause can't be mitigated, the symptoms should be mitigated. This is going to bite back as Kubo gains adoption on user equipment. Oh well, I tried.


4 participants