IPIP-504: `provider` query parameter as hint for HTTP Gateways #504
First read-through: makes sense, and it seems like it could really benefit the ecosystem. One problem I can imagine is that people start encoding providers with dynamic IP addresses in URLs, and those links quickly become useless, so we should probably call that out.

The CID is the core of a Provider-Hinted URI. Clients MUST extract the CID before evaluating any hints. The format is designed to be compatible with current IPFS like URIs, while explicitly defining how to locate the CID and interpret `provider` query parameters.

#### CID Extraction Rules
I feel like the details below should link to another spec that I'm sure we have written somewhere.

#### Query Parameter: `provider`

- Name: `provider`
- Type: URI Query Parameter (repeating allowed)
why repeating parameter instead of comma-delimited?
Probably to make the spec simpler to implement. Some reasons to repeat instead of comma-separating:
- Future-proof in case future values may include `,`?
- URL-escaping: in many standard libraries, e.g. in Golang, `items=apple,banana,orange` gets turned into a request for `https://example.com/?items=apple%2Cbanana%2Corange`. This means your server code needs to URL-decode before splitting at `,`. Going with repeated parameters avoids this complexity.
- Prior art of Magnet links, which do not support comma-separated values, and `magnet:?tr=udp://tracker1:80&tr=udp://tracker2:80` is how one specifies multiple trackers
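
To make the URL-escaping point concrete, here is a minimal Go sketch using only the standard `net/url` package. The helper names (`encodeCommaSeparated`, `encodeRepeated`, `parseRepeated`) and the multiaddr-looking values in `main` are made up for illustration and are not part of the IPIP.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// encodeCommaSeparated packs all values into one comma-delimited parameter.
// url.Values.Encode percent-encodes the commas, so a receiver has to decode
// the value before it can split on ",".
func encodeCommaSeparated(key string, values []string) string {
	v := url.Values{}
	v.Set(key, strings.Join(values, ","))
	return v.Encode()
}

// encodeRepeated emits one key=value pair per value, keeping their order.
func encodeRepeated(key string, values []string) string {
	v := url.Values{}
	for _, s := range values {
		v.Add(key, s)
	}
	return v.Encode()
}

// parseRepeated returns every `provider` value in order of appearance.
func parseRepeated(rawQuery string) ([]string, error) {
	v, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	return v["provider"], nil
}

func main() {
	items := []string{"apple", "banana", "orange"}
	fmt.Println(encodeCommaSeparated("items", items)) // items=apple%2Cbanana%2Corange
	fmt.Println(encodeRepeated("items", items))       // items=apple&items=banana&items=orange

	// Hypothetical provider hints, for illustration only.
	hints, err := parseRepeated("provider=/dns/a.example/tcp/443/https&provider=/ip4/192.0.2.1/tcp/8080/http")
	if err != nil {
		panic(err)
	}
	for i, h := range hints {
		fmt.Println(i, h)
	}
}
```

With repeated parameters the standard parser already hands back an ordered slice, so no custom splitting or extra decoding step is needed.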
- Ignore all `provider` parameters (if unsupported).
- Evaluate hints in order of appearance (left-to-right).
- Evaluate hints in parallel.
- Apply their own prioritization or fallback strategies. If all hints fail, clients SHOULD fall back to default discovery strategies (e.g., DHT/IPNI), if available. Or even rely on discovery strategies in parallel.
> Or even rely on discovery strategies in parallel.

I feel like if there are provider hints, we SHOULD attempt to process them before IPNI/DHT to prevent any negative cascading network effects.

- Evaluate hints in parallel.
- Apply their own prioritization or fallback strategies. If all hints fail, clients SHOULD fall back to default discovery strategies (e.g., DHT/IPNI), if available. Or even rely on discovery strategies in parallel.

Note that the `multiaddr` string should point to the `origin` server where given CID is provided, and not include the actual CID in the Hint multiaddr as a subdomain/path.
Are CIDs valid in a multiaddr at all (besides Peer ID)? Might be worth linking to the multiaddr spec: https://github.com/libp2p/specs/blob/master/addressing/README.md#multiaddr-in-libp2p

TODO

How will end users benefit from this work?
Things to add here:
- Empower user with ability to exfiltrate / migrate data from providers that do not announce data on Amino DHT and/or during IPNI (`cid.contact`) outages
- Improved initial seeding: opportunistically append `/p2p/peerid` hint to links generated by IPFS Desktop (if publicly diallable)

## Motivation

Content-addressable systems, such as IPFS, allow data to be identified by the hash of its contents (CID), enabling verifiable, immutable references. However, retrieving content typically relies on side content discovery systems (e.g. DHT, IPNI), even when a client MAY know one (or more) provider of the bytes. A provider in this context is any node, peer, gateway, or service that can serve content identified by a CID.
While I understand that there's a latency improvement to be had here by hard-coding a provider into the URL, in practice, when I've seen this come up in the past, it's been due to people not wanting to use the "mainnet" routing systems while sort of pretending that the content is available via mainnet (e.g. a pinning service not wanting to advertise their data to the Amino DHT or IPNI, but instead having their users use URIs like `ipfs://bafyfoo?provider=<the-pinning-service>`). In this light, this proposal seems more likely to harm than help the IPFS ecosystem.

Some examples:
- URIs become ephemeral. While encoding `ipfs://bafyfoo` used to not be ephemeral, `ipfs://bafyfoo?provider=<pinning-service-that-had-the-cid-when-the-link-was-made>` is now ephemeral. Yes, you could fall back to ignoring the provider, but:
  - If the ecosystem has come to rely on them for routing, then the link is just broken
  - If the ecosystem has come to rely on them for a performance boost, then users encoding `ipfs://bafyfoo?provider=<pinning-service-that-had-the-data-when-the-link-was-made>` into their applications, smart contracts, etc. will now need to figure out how to update the `provider` part of the URI vs previously when they could just be ephemeral
- This incentivizes pinning service lock-in, where moving data off a given provider means latency goes up for all related links unless all the places the link has been shared are updated

Some alternatives to this approach that seem like they resolve many of the problems for users:
- Invest in improving the routing system(s) for mainnet, which have received very little investment over the past several years and which could really use it
- Application developers can hard-code additional routing systems into their applications. For example, if the developers of a given dApp are already hosting all their data and/or paying a pinning service to host the data, saying to check that endpoint first seems fine and is in line with how many of them already operate, by hard-coding a gateway endpoint provided by the pinning service they pay for storage

It'd be useful to understand why the benefits of this outweigh the associated ecosystem risks.

- Name: `provider`
- Type: URI Query Parameter (repeating allowed)
- Value: Multiaddr string (`?provider=multiaddr`).
Independent of my main objection to the idea of `provider`, I'm not sure multiaddr is a great idea here. Maybe it's the best we have, but my suspicion is that importing the not-really-existent multiaddr spec (due to many years of neglect from the libp2p side of things here) into the gateway spec is a pretty unfortunate dependency.

In Go we saw so many badly written multiaddr parsers that there's an in-progress regex-like library to try to make them easier to work with.
Based on initial exploration https://github.com/vasco-santos/provider-hinted-uri/blob/main/EXPLORATION.md