Implement index update rework #785

pietroalbini · 2020-05-29T14:11:11Z

Implementation work needs to be done to add the things proposed in this docs.rs RFC.

jyn514 · 2020-05-29T14:37:32Z

Some preliminary things can be done without switching completely:

storing the current commit in the database is pretty easy
we can implement the webhook endpoint in addition to the current index updates and run them both in parallel for a while to make sure they work
synchronization needs to be implemented before fully switching over to webhooks, because there is no analogue of peek_changes for webhooks (there's no way to ensure durability in case of a crash)

pietroalbini · 2020-05-29T14:55:37Z

> synchronization needs to be implemented before fully switching over to webhooks, because there is no analogue of peek_changes for webhooks (there's no way to ensure durability in case of a crash)

Yes there is: if we update the hash of the last visited commit in the same database transaction as adding crates to the queue, we'd get the same level of consistency we have today.
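
A minimal sketch of that idea, assuming a `queue` table and a `config` row keyed `last_seen_commit` (both names are hypothetical, and SQLite only stands in for the real database so the example is self-contained). Either both writes land or neither does:

```python
import sqlite3


def record_index_update(conn, new_commit, releases):
    """Queue the releases and advance the stored commit in one transaction."""
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so a crash or error can never leave the commit hash ahead of the queue.
    with conn:
        conn.executemany(
            "INSERT INTO queue (name, version) VALUES (?, ?)", releases
        )
        conn.execute(
            "UPDATE config SET value = ? WHERE key = 'last_seen_commit'",
            (new_commit,),
        )


def last_seen_commit(conn):
    row = conn.execute(
        "SELECT value FROM config WHERE key = 'last_seen_commit'"
    ).fetchone()
    return row[0]


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (name TEXT NOT NULL, version TEXT NOT NULL, "
    "UNIQUE (name, version))"
)
conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO config VALUES ('last_seen_commit', 'aaa111')")
conn.commit()

record_index_update(conn, "bbb222", [("foo", "1.0.0"), ("bar", "0.2.0")])
```

If the insert fails partway through (here, for example, on a duplicate release), the commit hash stays at its previous value, so the next run sees the same range of index changes again.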

jyn514 · 2020-05-29T15:15:36Z

> last visited commit

I don't think this makes sense in the context of webhooks, because we might not receive the hooks in the order the crates were released. If crates A and B release and we get the webhook for B before A, updating the commit to point to B would mean we could skip A.

Nemo157 · 2020-05-29T15:20:04Z

The webhook is just a trigger to "check what crates were released between the last commit and now"; it calls the exact same update code as the time-based trigger does.
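
To illustrate why hook ordering stops mattering under this model, here is a rough sketch. `INDEX_LOG`, `Watcher`, and the handler names are all made up for the example, and the index history is simplified to an in-memory list:

```python
# Hypothetical, simplified index history: (commit, releases in that commit).
INDEX_LOG = [
    ("c1", ["a 1.0.0"]),
    ("c2", ["b 1.0.0"]),
]


class Watcher:
    def __init__(self):
        self.last_seen = 0  # number of index commits already processed
        self.queued = []

    def update(self):
        """Queue everything released since the last processed commit."""
        for _commit, releases in INDEX_LOG[self.last_seen:]:
            self.queued.extend(releases)
        self.last_seen = len(INDEX_LOG)

    def on_webhook(self, _payload):
        # The payload is deliberately ignored: the hook only says
        # "something changed", the index itself is the source of truth.
        self.update()

    def on_timer(self):
        # The time-based trigger runs exactly the same code path.
        self.update()


watcher = Watcher()
watcher.on_webhook({"crate": "b"})  # hook for B arrives before the one for A
watcher.on_webhook({"crate": "a"})  # late hook for A finds nothing new
```

Even though the hook for B arrives first, A is not skipped, because the update walks the index from the last processed commit rather than trusting the hook payload.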

syphar · 2024-10-21T04:51:44Z

The future container-based setup for docs.rs will have:

web workers
build servers
one single registry watcher

This complicates the webhook setup, since the webhook would be received by a different physical server than the one running the registry watcher.

syphar · 2025-05-19T05:58:53Z

some notes from this Zulip thread (#t-crates-io > trigger docs.rs rebuilds from crates.io), cc @Mark-Simulacrum @Turbo87

if we want to drop the registry watcher from here we could go with two approaches:

to keep in mind

we're not only talking about new releases. All event types we handle are:

new release
version-delete
crate-delete
yank
unyank

For the new/delete and yank/unyank events, order is important.

queue between crates.io & docs.rs

We would use a queue system that guarantees order (SQS FIFO?). crates.io would enqueue all changes when it gets them, in order. Docs.rs would then, one by one, handle these.

Thinking about scaling: order is only important at the crate level; between crates it doesn't matter. A single-queue design might be limiting then.
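
One way to picture per-crate ordering without a single global queue is sketched below. This is an in-memory illustration only, with hypothetical names; as far as I know, SQS FIFO queues offer a comparable concept through message group IDs, where ordering is preserved within a group.

```python
from collections import defaultdict, deque


class PerCrateQueues:
    """Keeps events in order per crate; crates are independent of each other."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def enqueue(self, crate, event):
        self._queues[crate].append(event)

    def next_event(self, crate):
        """Pop the oldest pending event for one crate, or None if it has none."""
        queue = self._queues.get(crate)
        return queue.popleft() if queue else None

    def crates_with_work(self):
        return sorted(crate for crate, queue in self._queues.items() if queue)


queues = PerCrateQueues()
queues.enqueue("foo", "yank")
queues.enqueue("bar", "new release")
queues.enqueue("foo", "unyank")
```

Workers could then drain different crates in parallel, while the yank and unyank for `foo` are still handled in the order they were enqueued.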

simple "ping" for all events, docs.rs fetches sparse index.

for all the events, crates.io would "ping" a web endpoint (for example), or we use a queue for these again.
docs.rs would then fetch the crate file from the sparse index, and diff the local database with the index. That diff algorithm has to correctly determine all the events between the docs.rs database state, and the index state.
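
A rough sketch of what such a diff could look like, assuming the local state is reduced to a mapping of version to yanked flag. The function names and event tuples are hypothetical; the `vers` and `yanked` fields are the ones found in each JSON line of an index file. A crate whose index file is gone is modelled here as an empty mapping.

```python
import json


def parse_index_file(text):
    """Turn the newline-delimited JSON of an index file into {version: yanked}."""
    versions = {}
    for line in text.splitlines():
        if line.strip():
            entry = json.loads(line)
            versions[entry["vers"]] = entry["yanked"]
    return versions


def diff_crate(local, index):
    """Events that move the local state to the index state for one crate."""
    if local and not index:
        return [("crate-delete", None)]
    events = []
    # Note: versions are sorted as plain strings here, not by semver.
    for vers in sorted(index.keys() - local.keys()):
        events.append(("new release", vers))
        if index[vers]:
            events.append(("yank", vers))
    for vers in sorted(local.keys() - index.keys()):
        events.append(("version-delete", vers))
    for vers in sorted(local.keys() & index.keys()):
        if local[vers] != index[vers]:
            events.append(("yank" if index[vers] else "unyank", vers))
    return events


local_state = {"0.9.0": True, "1.0.0": False, "1.1.0": False}
index_text = "\n".join([
    '{"name": "foo", "vers": "0.9.0", "yanked": false}',
    '{"name": "foo", "vers": "1.0.0", "yanked": true}',
    '{"name": "foo", "vers": "2.0.0", "yanked": false}',
])
events = diff_crate(local_state, parse_index_file(index_text))
```

Because the diff only compares end states, a yank followed by an unyank between two pings produces no event at all, which may or may not be acceptable depending on what docs.rs needs to observe.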

There might be race conditions when we get a new ping while we are still handling an old one for the same crate.
That is why we would probably use database-level locking (at the crate level) to tell crates.io to back off or wait.
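
As an illustration of the "back off" behaviour, here is a small sketch where in-process locks stand in for database-level locks (in PostgreSQL, something like `pg_try_advisory_lock` might play that role). `handle_ping` and its return values are hypothetical:

```python
import threading
from collections import defaultdict

_registry_guard = threading.Lock()           # protects the lock table itself
_crate_locks = defaultdict(threading.Lock)   # one lock per crate name


def handle_ping(crate, sync):
    """Run `sync(crate)` unless that crate is already being handled."""
    with _registry_guard:
        lock = _crate_locks[crate]
    # Non-blocking acquire: if another ping holds the lock, tell the caller
    # to retry instead of running two syncs for the same crate at once.
    if not lock.acquire(blocking=False):
        return "retry-later"
    try:
        sync(crate)
        return "done"
    finally:
        lock.release()


overlapping = []


def slow_sync(_crate):
    # While "foo" is being handled, two more pings arrive.
    overlapping.append(handle_ping("foo", lambda c: None))  # same crate
    overlapping.append(handle_ping("bar", lambda c: None))  # different crate


outcome = handle_ping("foo", slow_sync)
```

The overlapping ping for the same crate is told to retry, while the ping for a different crate proceeds, matching the observation that ordering only matters per crate.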

jyn514 added the C-tracking-issue label (Category: Tracking issue for an MCP which has been accepted or a category of bugs) on Jul 6, 2020

syphar added the S-needs-design label (Status: There's a problem here, but no obvious solution; or the solution raises other questions) on Oct 24, 2023

syphar added the P-low label (Low priority issues) on Oct 21, 2024
