Implement index update rework · Issue #785 · rust-lang/docs.rs · GitHub

Implement index update rework #785

Open
pietroalbini opened this issue May 29, 2020 · 6 comments
Labels
C-tracking-issue Category: Tracking issue for an MCP which has been accepted or a category of bugs P-low Low priority issues S-needs-design Status: There's a problem here, but no obvious solution; or the solution raises other questions

Comments

@pietroalbini
Member
pietroalbini commented May 29, 2020

Implementation work needs to be done to add the things proposed in this docs.rs RFC.

@jyn514
Member
jyn514 commented May 29, 2020

Some preliminary things can be done without switching completely:

  • storing the current commit in the database is pretty easy
  • we can implement the webhook endpoint in addition to the current index updates and run them both in parallel for a while to make sure they work
  • synchronization needs to be implemented before fully switching over to webhooks, because there is no analogue of peek_changes for webhooks (there's no way to ensure durability in case of a crash)

@pietroalbini
Member Author

> synchronization needs to be implemented before fully switching over to webhooks, because there is no analogue of peek_changes for webhooks (there's no way to ensure durability in case of a crash)

Yes there is: if we update the hash of the last visited commit in the same database transaction as adding crates to the queue, we'd get the same level of consistency we have today.
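
To make the transaction idea concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not the docs.rs implementation: the table names (`queue`, `index_state`), the function names, and the use of SQLite are all assumptions made for illustration. The point it shows is that the queue inserts and the commit-hash update either both land or both roll back.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create a throwaway database with a build queue and a single-row state table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE queue (name TEXT NOT NULL, version TEXT NOT NULL, "
        "UNIQUE (name, version))"
    )
    conn.execute(
        "CREATE TABLE index_state (id INTEGER PRIMARY KEY CHECK (id = 1), "
        "last_commit TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO index_state (id, last_commit) VALUES (1, 'initial')")
    conn.commit()
    return conn


def record_index_update(conn, new_commit, releases):
    """Queue the releases and advance the last visited commit atomically.

    `with conn` commits if the body succeeds and rolls back if it raises,
    so a failure part-way through leaves both tables untouched.
    """
    with conn:
        conn.executemany(
            "INSERT INTO queue (name, version) VALUES (?, ?)", releases
        )
        conn.execute(
            "UPDATE index_state SET last_commit = ? WHERE id = 1", (new_commit,)
        )


def last_commit(conn):
    """Return the hash of the last index commit that was fully processed."""
    row = conn.execute("SELECT last_commit FROM index_state WHERE id = 1").fetchone()
    return row[0]
```

If the process dies between the inserts and the hash update, the transaction is never committed, so the next run starts again from the old commit and sees the same changes.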

@jyn514
Member
jyn514 commented May 29, 2020

> last visited commit

I don't think this makes sense in the context of webhooks, because we might not receive the hooks in the order the crates were released. If crates A and B release and we get the webhook for B before A, updating the commit to point to B would mean we could skip A.

@Nemo157
Member
Nemo157 commented May 29, 2020

The webhook is just a trigger for "check what crates were released between the last commit and now"; it will call the exact same update code as the time-based trigger does.
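
A small sketch of that "webhook as trigger" model, in Python. Everything here is hypothetical: the `Watcher` class, the list standing in for the index history, and the integer position standing in for a commit hash. It only illustrates that the webhook payload is ignored, so out-of-order hooks cannot cause a release to be skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Watcher:
    # Position in the index history; a stand-in for the last visited commit hash.
    last_commit: int = 0
    queue: list = field(default_factory=list)

    def update(self, index_log):
        """Queue every release recorded after last_commit, then advance it."""
        new_releases = index_log[self.last_commit:]
        self.queue.extend(new_releases)
        self.last_commit = len(index_log)
        return len(new_releases)


def on_timer(watcher, index_log):
    """Time-based trigger: run the shared update code."""
    return watcher.update(index_log)


def on_webhook(watcher, index_log, payload):
    """Webhook trigger: the payload is deliberately ignored.

    The hook only says "something changed"; what changed is read from the index.
    """
    return watcher.update(index_log)
```

With an index history of `["A 1.0", "B 1.0"]`, a hook for B that arrives first still queues both A and B, and the late hook for A finds nothing new to do.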

@jyn514 jyn514 added the C-tracking-issue Category: Tracking issue for an MCP which has been accepted or a category of bugs label Jul 6, 2020
@syphar syphar added the S-needs-design Status: There's a problem here, but no obvious solution; or the solution raises other questions label Oct 24, 2023
@syphar syphar added the P-low Low priority issues label Oct 21, 2024
@syphar
Member
syphar commented Oct 21, 2024

The future container-based setup for docs.rs will have:

  • web workers
  • build servers
  • one single registry watcher

This complicates the webhook setup, since the webhook would be received by a different physical server than the one running the registry watcher.

@syphar
Member
syphar commented May 19, 2025

Some notes from this Zulip thread (#t-crates-io > trigger docs.rs rebuilds from crates.io), cc @Mark-Simulacrum @Turbo87

If we want to drop the registry watcher from here, we could go with two approaches:

to keep in mind

we're not only talking about new releases. All event types we handle are:

  • new release
  • version-delete
  • crate-delete
  • yank
  • unyank

For the new/delete and yank/unyank events, order is important.

queue between crates.io & docs.rs

We would use a queue system that guarantees order (SQS FIFO?). crates.io would enqueue all changes when it gets them, in order. Docs.rs would then, one by one, handle these.

Thinking about scaling: order only matters at the crate level; between crates it doesn't matter. A single-queue design might therefore be limiting.
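
One way to picture per-crate ordering without a single global queue is to shard events by crate name, sketched below in Python. The function names and the shard count are assumptions for illustration. If SQS FIFO were chosen, its message group ID looks like the matching feature, but that mapping is not something this thread has decided.

```python
import zlib
from collections import defaultdict


def shard_for(crate, shard_count):
    """Map a crate name to a shard using a hash that is stable across processes."""
    return zlib.crc32(crate.encode("utf-8")) % shard_count


def partition(events, shard_count):
    """Split (crate, event_kind) pairs into shards, keeping arrival order per shard.

    All events for one crate land in the same shard, so their relative order
    is preserved, while different crates can be handled in parallel.
    """
    shards = defaultdict(list)
    for crate, kind in events:
        shards[shard_for(crate, shard_count)].append((crate, kind))
    return dict(shards)
```

Each shard can then be drained by its own worker: a yank followed by an unyank of the same crate is still seen in that order, while unrelated crates no longer wait on each other.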

simple "ping" for all events, docs.rs fetches sparse index.

  • for all the events, crates.io would "ping" a web endpoint (for example), or we use a queue for these again.
  • docs.rs would then fetch the crate file from the sparse index, and diff the local database with the index. That diff algorithm has to correctly determine all the events between the docs.rs database state, and the index state.
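
A rough sketch of what such a diff could look like, in Python. The representation is an assumption: each side is a mapping from version string to a yanked flag, and `None` stands for a crate whose index file no longer exists. The event names are the ones listed earlier in this comment.

```python
def diff_crate(local, index):
    """Derive events that move the local state to the index state.

    `local` and `index` map version -> yanked flag. `index` is None when the
    crate's index file is gone. Versions are ordered as plain strings here,
    not by semver, to keep the sketch dependency-free.
    """
    if index is None:
        return [("crate-delete", None)] if local else []

    events = []
    # Versions only in the index are new releases (possibly already yanked).
    for version in sorted(index.keys() - local.keys()):
        events.append(("new release", version))
        if index[version]:
            events.append(("yank", version))
    # Versions only in the local database were deleted upstream.
    for version in sorted(local.keys() - index.keys()):
        events.append(("version-delete", version))
    # Versions on both sides may differ in their yanked flag.
    for version in sorted(index.keys() & local.keys()):
        if index[version] != local[version]:
            events.append(("yank" if index[version] else "unyank", version))
    return events
```

A diff like this only recovers the net change between the two states. A yank followed by an unyank between two pings, for example, produces no event at all, which may or may not be acceptable.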

There might be race conditions when we get a new ping for a crate while we are still handling an older one for the same crate.
That is why we would probably use database-level locking (at the crate level) to tell crates.io to back off or wait.
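
As an illustration of crate-level locking, here is a Python sketch that uses a lock table in SQLite as a stand-in. The table name, the function names, and the "busy"/"done" return values are all hypothetical, and a real setup would more likely rely on whatever row-level or advisory locking the production database offers.

```python
import sqlite3


def open_db():
    """Create a throwaway database with one row per currently locked crate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crate_locks (name TEXT PRIMARY KEY)")
    return conn


def try_lock(conn, crate):
    """Try to take the lock for one crate; return False if it is already held."""
    try:
        with conn:
            conn.execute("INSERT INTO crate_locks (name) VALUES (?)", (crate,))
        return True
    except sqlite3.IntegrityError:
        # The primary key rejects a second lock row for the same crate.
        return False


def unlock(conn, crate):
    """Release the lock for one crate."""
    with conn:
        conn.execute("DELETE FROM crate_locks WHERE name = ?", (crate,))


def handle_ping(conn, crate, work):
    """Run `work(crate)` under the crate lock, or report "busy" to the caller."""
    if not try_lock(conn, crate):
        return "busy"  # the sender is expected to back off and retry
    try:
        work(crate)
    finally:
        unlock(conn, crate)
    return "done"
```

A ping that arrives while the same crate is being processed gets "busy" back, and pings for other crates are unaffected.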
