IPIP-359: Multi gateway client #359
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,175 @@ | ||
# IPIP-359: Multi gateway client | ||
|
||
- Start Date: 2022-12-16 | ||
- Related Issues: | ||
- https://github.com/ipfs/specs/pull/359 | ||
- Relies on specs: | ||
- IPIP-0280 | ||
|
||
## Summary | ||
|
||
A defined way for getting a list of usable gateways. | ||
|
||
## Motivation | ||
|
||
When developing an application with IPFS functionality, you ideally want more than one gateway so that requests can be distributed among N gateways. This spec relies on IPIP-0280 (gateways file). | ||
|
||
## Detailed design | ||
|
||
The starting point for any application wanting to use this spec is to first take care of the `gateways` file (IPIP-0280). Either that file already exists, meaning that an IPFS implementation is running and exposes its gateway in that file, or the implementer of this spec must, through other means, fill that file with 1 or more gateways (the more the better). | ||
|
||
The assumption beyond this point is that there is a gateways file and it contains 1 or more gateways. | ||
|
||
### Finding new gateways | ||
|
||
The `gateways` file is parsed to know the initial - bootstrap - gateways. Each line in this file is a single gateway. This list of gateways should be stored internally in this `multi gateway client` implementation. | ||
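A minimal sketch of reading the bootstrap list, assuming only what the paragraph above states: one gateway per line. The function name `parse_gateways` is hypothetical, and skipping blank lines and trimming surrounding whitespace are assumptions that IPIP-0280 may or may not require.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns one entry per non-empty line of the gateways file.
// Assumption: blank lines are skipped and surrounding whitespace is trimmed.
std::vector<std::string> parse_gateways(std::istream& in) {
    std::vector<std::string> gateways;
    std::string line;
    while (std::getline(in, line)) {
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // blank line
        const auto last = line.find_last_not_of(" \t\r");
        gateways.push_back(line.substr(first, last - first + 1));
    }
    return gateways;
}
```

Any `std::istream` works as input, so the same function can read from an `std::ifstream` opened on the gateways file or from an in-memory `std::istringstream` in tests.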
|
||
|
||
Your internal gateway list might now look like: | ||
|
||
``` | ||
http://localhost:8080 | ||
https://ipfs.io | ||
``` | ||
|
||
From this point on the client should iterate over those gateways and request each of them to give a list of [gateways that it knows](#Gateway-returns-list-of-gateways-it-knows). Based on the return, this should result in a vastly bigger list of potentially usable gateways: | ||
Flagging that there is no protocol for this atm. FYSA there is a vaguely similar proposal for ambient discovery of HTTP content routers (IPIP-342), we also talk about HTTP transport based on gateway MTB5. Unless you plan to wait with this IPIP until we have something, consider removing "gateway discovery" and limit scope to manual management done by client implementations. |
||
|
||
``` | ||
http://localhost:8080 | ||
N+1 | ||
N+2 | ||
... | ||
https://ipfs.io | ||
N+1 | ||
N+2 | ||
... | ||
``` | ||
|
||
Important to note here is that this request for more gateways will only be done on the initial list of provided gateways! It's non-recursive and therefore won't be executed on newly found gateways. | ||
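The non-recursive rule can be sketched as follows. Since no gateway protocol for listing known gateways exists yet, the lookup is passed in as a hypothetical callback (`FetchKnown`). The name `discover` and the de-duplication of results are likewise assumptions for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

using GatewayList = std::vector<std::string>;
// Hypothetical callback: asks one gateway for the gateways it knows.
using FetchKnown = std::function<GatewayList(const std::string&)>;

// Expands the bootstrap list by exactly one level.
GatewayList discover(const GatewayList& bootstrap, const FetchKnown& fetch_known) {
    GatewayList result;
    std::unordered_set<std::string> seen;  // assumption: duplicates are dropped
    auto add = [&](const std::string& gw) {
        if (seen.insert(gw).second) result.push_back(gw);
    };
    // The loop runs over `bootstrap` only, never over newly found gateways,
    // which is what keeps discovery non-recursive.
    for (const auto& gw : bootstrap) {
        add(gw);
        for (const auto& found : fetch_known(gw)) add(found);
    }
    return result;
}
```

Because the callback is only ever invoked for bootstrap entries, the number of discovery requests is bounded by the size of the gateways file.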
|
||
### Validating gateways to be potentially used | ||
|
||
Each gateway will be tasked to execute a query. The results of this query and the time it took determine if a gateway can potentially be used. A hard requirement for a gateway is to support [trustless](https://docs.ipfs.tech/reference/http/gateway/#trusted-vs-trustless) data retrieval! | ||
|
||
```mermaid | ||
graph TD; | ||
A[Supports trustless?] --> B{Response}; | ||
B --> |200 Ok - Response within 200ms| C[CAR request]; | ||
B --> |200 Ok - Response 200+ms| D; | ||
B --> |Anything else| D[Discard gateway]; | ||
C --> E{Validate CAR}; | ||
E --> |Invalid| D; | ||
E --> G[Valid]; | ||
G --> H[Store gateway]; | ||
``` | ||
|
||
The `200ms` threshold here is arbitrarily picked. From a decentralized point of view, 200ms allows you to go roughly halfway across the globe, assuming your internet connection is stable. From a data retrieval point of view, 200ms can be slow but can also be just fine. For example, if a site loads over 1 connection at a time and each connection has a 200ms latency, the site will feel slow to load. But if you load the same site over multiple concurrent connections where only some hit the 200ms threshold, you won't see much difference. | ||
A big part of the planet dreams to have latency this low. |
||
|
||
Once a gateway reaches the `Store gateway` step in the flow above, the `multi gateway client` implementation should store it internally in its list of usable gateways. | ||
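The decision in the flowchart can be sketched as a single predicate. This is a simplification under stated assumptions: `ProbeResult` and `should_store_gateway` are hypothetical names, the two steps (trustless probe, then CAR request) are flattened into one result record, and a response at exactly 200ms is treated as being within the threshold.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical summary of probing one gateway.
struct ProbeResult {
    int http_status;                    // status of the trustless probe
    std::chrono::milliseconds latency;  // time until the response arrived
    bool car_valid;                     // result of validating the CAR response
};

// The arbitrarily picked threshold from the text.
constexpr std::chrono::milliseconds kLatencyThreshold{200};

// Returns true only for the path that ends in "Store gateway".
bool should_store_gateway(const ProbeResult& p) {
    if (p.http_status != 200) return false;           // "Anything else"
    if (p.latency > kLatencyThreshold) return false;  // "Response 200+ms"
    return p.car_valid;                               // "Validate CAR"
}
```

In a real client the CAR request would only be issued after the first two checks pass; the flattened record is used here to keep the sketch short.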
|
||
### Keeping the usable gateway list fresh in the background | ||
|
||
Getting this list of gateways and keeping track of whether each one should still be used can take quite some time. The advised approach here is to run each request asynchronously, with each async check following the same flow as the flowchart above. | ||
adviced => advised* |
||
|
||
### Configuration options | ||
|
||
`concurrent requests` Defaults to 10. There must be a way to specify how many concurrent requests the `multi gateway client` does per IPFS request. | ||
I think default should be 6 requests here per => https://docs.diffusiondata.com/cloud/latest/manual/html/designguide/solution/support/connection_limitations.html#:~:text=Most%20modern%20browsers%20allow%20six,with%20any%20server%20or%20proxy. |
||
|
||
`max simultaneous cids` Defaults to 5. There must be a way to define how many simultaneous IPFS requests the `multi gateway client` can handle at any given time. | ||
Is there a reason "5" was chosen here? It may make sense to set to 6 also for the reasons above |
||
|
||
`max total gateways in use` Defaults to 25. There must be a way to specify how many total gateways can be used for the `multi gateway client` as a whole. | ||
What is the reason for having a limit here? Seems to me that more would always be better |
||
|
||
`racing` Defaults to false. There must be a way to specify if `racing` should be used. Racing means the `multi gateway client` will ask at most `concurrent requests` gateways to all download the same data. The gateway that delivers it first is the one whose output is used; the rest is ignored. | ||
|
||
`verify raw` Defaults to true. This tells the `multi gateway client` implementation to verify RAW data as well as CAR data. Setting this option to true (the default) means the `multi gateway client` is guaranteed to only give back valid data. If this option is set to false then raw data is returned as-is, unverified. | ||
typo: wel => well |
||
|
||
These options are set on a `multi gateway client` level and apply to each request. If no options are specified the defaults as listed should be applied. | ||
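The options and their listed defaults could be grouped in a plain struct. The struct and field names below are hypothetical; only the default values come from the text above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical client-level options; each default mirrors the value listed above.
struct MultiGatewayClientOptions {
    std::size_t concurrent_requests = 10;        // `concurrent requests`
    std::size_t max_simultaneous_cids = 5;       // `max simultaneous cids`
    std::size_t max_total_gateways_in_use = 25;  // `max total gateways in use`
    bool racing = false;                         // `racing`
    bool verify_raw = true;                      // `verify raw`
};
```

A default-constructed `MultiGatewayClientOptions` then matches the "no options specified" case, and a caller overrides individual fields before handing the struct to the client.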
|
||
### Request method | ||
|
||
There must be a method to allow IPFS data retrieval. The input for this method must be an IPFS url in these forms: `ipfs://<cid>` and `ipns://<cid>`. | ||
If we introduce
|
||
|
||
The data retrieval for a given CID must adhere to the configuration options. | ||
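As an illustration of the accepted input forms, the sketch below splits an `ipfs://` or `ipns://` URL into its scheme and the identifier that follows. `IpfsUrl` and `parse_ipfs_url` are hypothetical names, and treating everything after `://` as the identifier (without path handling or CID validation) is an assumption.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical result type for a parsed request URL.
struct IpfsUrl {
    std::string scheme;  // "ipfs" or "ipns"
    std::string id;      // everything after "://"
};

// Returns true and fills `out` when `url` uses one of the two accepted schemes.
bool parse_ipfs_url(const std::string& url, IpfsUrl& out) {
    for (const char* scheme : {"ipfs", "ipns"}) {
        const std::string prefix = std::string(scheme) + "://";
        // Require the prefix plus at least one character of identifier.
        if (url.size() > prefix.size() && url.compare(0, prefix.size(), prefix) == 0) {
            out.scheme = scheme;
            out.id = url.substr(prefix.size());
            return true;
        }
    }
    return false;
}
```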
|
||
There must be an async way to get the data represented by that CID. While the `multi gateway client` can handle any CID data, in its default settings all data is verified. If `verify raw` is set to false then raw data is passed back as-is. CAR data is always verified. | ||
Is there any usecase to have CAR data returned without verifying? Probably not, but if so we should include an option for that as well
I would remove this, and clearly state the spec should always verify received bytes against expected CIDs. |
||
|
||
As an aside, a request is what the user puts in. For example: | ||
|
||
```cpp | ||
<client>::request("ipfs://bafyA"); | ||
<client>::request("ipfs://bafyB"); | ||
``` | ||
|
||
This is 2 requests. These count toward `max simultaneous cids`, where the default maximum is 5. If there are more than `max simultaneous cids` requests, those that can't be handled yet are put on a queue and handled as soon as a slot becomes available. | ||
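The slot-and-queue behaviour described above can be sketched with a small bookkeeping class. `RequestSlots` is a hypothetical name, and serving queued requests in first-in, first-out order is an assumption; the text only says that they wait for a free slot.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical bookkeeping for `max simultaneous cids`.
class RequestSlots {
public:
    explicit RequestSlots(std::size_t max_simultaneous_cids)
        : max_(max_simultaneous_cids) {}

    // Returns true if the request starts now, false if it was queued.
    bool submit(const std::string& url) {
        if (active_.size() < max_) {
            active_.push_back(url);
            return true;
        }
        queued_.push_back(url);
        return false;
    }

    // Marks a request as finished and promotes the oldest queued one, if any.
    void finish(const std::string& url) {
        for (auto it = active_.begin(); it != active_.end(); ++it) {
            if (*it == url) { active_.erase(it); break; }
        }
        if (!queued_.empty() && active_.size() < max_) {
            active_.push_back(queued_.front());
            queued_.pop_front();
        }
    }

    const std::vector<std::string>& active() const { return active_; }
    std::size_t queued() const { return queued_.size(); }

private:
    std::size_t max_;
    std::vector<std::string> active_;
    std::deque<std::string> queued_;
};
```

The class only tracks which requests hold a slot; starting the actual download when a request becomes active is left to the surrounding client.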
|
||
Internally that CID is represented by N different CIDs (each block). Say `bafyA` consists of 100 blocks (simplified depiction): | ||
Does this mean the client always sends the first request for a single block, deserializes it, and then sends a CAR request for its branches? This is fine for MVP I guess, but it is hard to make a good decision when to switch from block to CAR for a deeper DAG. Hannah made a demo during MTB5 and had some good ideas about adding an option to fetch a CAR with non-leaf blocks first (metadata), and then fetching leaves with actual data at the end. I wrote some notes in #348 (comment). It also included byte range requests, which are important for use cases like video seeking. I feel we should strongly consider adding these parameters to CAR requests before this IPIP is finalized. |
||
|
||
``` | ||
bafyA | ||
| -- bafyA001 | ||
| -- bafyA002 | ||
| -- bafyA003 | ||
| -- ... | ||
| -- bafyA100 | ||
``` | ||
|
||
Now the internals of the request method should download 10 of those blocks at the same time in an async way. | ||
|
||
The same would be true for `bafyB`. | ||
|
||
There is still the `max total gateways in use` limit, 25 by default, which in this setup means that 3 user requests saturate the entire budget. In this case the naive approach would cause an imbalance in gateway usage: | ||
|
||
```cpp | ||
<client>::request("ipfs://bafyA"); // handled by 10 gateways | ||
<client>::request("ipfs://bafyB"); // handled by 10 gateways | ||
<client>::request("ipfs://bafyC"); // handled by 5 gateways | ||
``` | ||
|
||
The `multi gateway client` must detect that this N-th request is going to cause a gateway allocation imbalance and must rebalance the gateways. It should allocate an equal number of gateways to each request. If those equal shares together stay below `max total gateways in use` and there are requests with fewer than `concurrent requests` gateways, then the remainder of the budget should be allocated to one of those requests. A rebalanced example would look like this: | ||
|
||
```cpp | ||
<client>::request("ipfs://bafyA"); // handled by 9 gateways | ||
<client>::request("ipfs://bafyB"); // handled by 8 gateways | ||
<client>::request("ipfs://bafyC"); // handled by 8 gateways | ||
``` | ||
|
||
This rebalancing must always happen, including for requests that are already running. | ||
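The rebalancing rule can be sketched as a pure function that recomputes the allocation whenever the set of running requests changes. `allocate_gateways` is a hypothetical name, and giving the remainder to the first request is an assumption; the text only says it goes to one of the requests that are below `concurrent requests`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns how many gateways each of `requests` running requests may use.
std::vector<std::size_t> allocate_gateways(std::size_t requests,
                                           std::size_t concurrent_requests,
                                           std::size_t max_total_gateways) {
    std::vector<std::size_t> shares;
    if (requests == 0) return shares;
    // Equal share per request, capped at `concurrent requests`.
    const std::size_t equal =
        std::min(concurrent_requests, max_total_gateways / requests);
    shares.assign(requests, equal);
    const std::size_t leftover = max_total_gateways - equal * requests;
    // Assumption: the remainder goes to the first request, up to the cap.
    if (equal < concurrent_requests && leftover > 0) {
        shares[0] += std::min(leftover, concurrent_requests - equal);
    }
    return shares;
}
```

With the defaults (10 concurrent requests, 25 gateways in total), three requests yield the 9, 8, 8 split from the example above, while two requests each stay at the cap of 10.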
|
||
## Test fixtures | ||
|
||
N/A | ||
|
||
## Design rationale | ||
|
||
### User benefit | ||
|
||
Users, applications using IPFS, get a defined way to find gateways to use. | ||
|
||
### Compatibility | ||
|
||
N/A | ||
Refer to
Maybe mention https://github.com/ipfs/specs/blob/main/http-gateways/PATH_GATEWAY.md#only-if-cached-head-behavior as mechanism for prioritizing gateways which already have the data? Shotgunning fetch request to 5 gateways and getting same data 5 times back is super wasteful. |
||
|
||
### Security | ||
|
||
N/A | ||
|
||
|
||
### Alternatives | ||
|
||
N/A | ||
|
||
### Copyright | ||
|
||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). | ||
|
||
## Open questions | ||
|
||
### CAR verification file | ||
|
||
Besides verifying response headers, we should also define which blob we actually expect. Like a "Hello world" or "Hello IPFS". | ||
for a quick heartbeat check, a CAR with single root for a zero-length block will be enough, and won't waste much bandwidth |
||
|
||
### Gateway returns list of gateways it knows? | ||
|
||
Is this feature intended to exist on gateways? Right now it does not. | ||
|
||
### API? | ||
|
||
This spec is intentionally written without an API definition like WebIDL. The intention here is to just describe the behavior and leave the implementation entirely to the party implementing it. Does this work or should the spec be updated to include an interface too? |
Leaving a note to remind you to link to IPIP-0280 once it is merged