handle backoff and bundle requests for API timing by bmschmidt · Pull Request #81 · nomic-ai/ts-nomic · GitHub

handle backoff and bundle requests for API timing #81


Merged

Collaborator
@bmschmidt bmschmidt commented Oct 24, 2024

Nomic's embedding API is rate limited to 2 requests per second, but each request can include multiple embeddings. This PR does two things:

  1. Batches together all requests in the Embedder class into 510 ms groups to ensure that users are automatically kept within the rate limit.
  2. Respects the 429s newly sent from the API server with exponential backoff of up to 8 seconds.
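
As a rough illustration of the first point, the batching idea can be sketched as follows. This is a minimal sketch, not the code in this PR: `BatchingEmbedder`, `sendBatch`, and `pendingCount` are hypothetical names, and only the 510 ms interval and the batch size of 400 come from this page. It is also simplified in that it always waits one interval before flushing, whereas the discussion further down notes that the PR sends the first request immediately.

```typescript
// Hypothetical sketch: these names are illustrative, not the ts-nomic API.
type EmbedFn = (texts: string[]) => Promise<number[][]>;

interface Pending {
  text: string;
  resolve: (vector: number[]) => void;
  reject: (reason: unknown) => void;
}

class BatchingEmbedder {
  private queue: Pending[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private sendBatch: EmbedFn;
  private flushIntervalMs: number;
  private batchSize: number;

  // 510 ms and 400 are the values mentioned on this page; the rest is assumed.
  constructor(sendBatch: EmbedFn, flushIntervalMs = 510, batchSize = 400) {
    this.sendBatch = sendBatch;
    this.flushIntervalMs = flushIntervalMs;
    this.batchSize = batchSize;
  }

  get pendingCount(): number {
    return this.queue.length;
  }

  // Queue one text; the promise settles when its batch comes back.
  embed(text: string): Promise<number[]> {
    return new Promise<number[]>((resolve, reject) => {
      this.queue.push({ text, resolve, reject });
      this.scheduleFlush();
    });
  }

  // Arm at most one timer, so flushes are at least flushIntervalMs apart.
  private scheduleFlush(): void {
    if (this.timer !== null) return;
    this.timer = setTimeout(() => {
      this.timer = null;
      void this.flush();
    }, this.flushIntervalMs);
  }

  // Send up to batchSize queued texts as a single API request.
  async flush(): Promise<void> {
    const batch = this.queue.splice(0, this.batchSize);
    if (batch.length === 0) return;
    if (this.queue.length > 0) this.scheduleFlush();
    try {
      const vectors = await this.sendBatch(batch.map((p) => p.text));
      batch.forEach((p, i) => p.resolve(vectors[i]));
    } catch (err) {
      batch.forEach((p) => p.reject(err));
    }
  }
}
```

Callers simply `await embedder.embed(text)`; any calls that arrive within the same interval share one request, which is what keeps the request rate bounded regardless of how many texts are embedded.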

Important

Batch requests in Embedder class every 510 ms and handle 429 errors with exponential backoff to comply with API rate limits.

  • Behavior:
    • Batches requests in Embedder class every 510 ms to comply with API rate limit of 2 requests per second.
    • Implements exponential backoff up to 8 seconds for 429 errors in flushDeferredEmbeddings().
  • Constants:
    • Increases BATCH_SIZE from 32 to 400 in embedding.ts.
  • Error Handling:
    • Re-queues failed requests due to 429 errors in flushDeferredEmbeddings().
    • Throws error if embedQueue exceeds 100,000 items in embed().
  • Misc:
    • Adjusts setTimeout in periodicallyFlushCache() to 510 ms.
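
The backoff and re-queue behavior summarized above could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of `flushDeferredEmbeddings()`: the helper names, the 500 ms base delay, and the attempt limit are all hypothetical, and only the 8 second ceiling and the 100,000 item cap are taken from this page.

```typescript
const MAX_BACKOFF_MS = 8000; // from this page: backoff of up to 8 seconds
const MAX_QUEUE_LENGTH = 100_000; // from this page: embedQueue limit
const BASE_DELAY_MS = 500; // assumption: the starting delay is not stated here

// Delay before retry number `attempt` (0-based): doubles each time, then caps.
function backoffDelayMs(
  attempt: number,
  baseMs: number = BASE_DELAY_MS,
  maxMs: number = MAX_BACKOFF_MS,
): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Put a failed batch back at the front of the queue so order is preserved.
function requeue<T>(queue: T[], failed: T[], maxLength: number = MAX_QUEUE_LENGTH): void {
  if (queue.length + failed.length > maxLength) {
    throw new Error(`queue would exceed ${maxLength} items`);
  }
  queue.unshift(...failed);
}

// Retry `send` while `isRetryable(err)` holds, sleeping between attempts.
async function sendWithBackoff<T>(
  send: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts: number = 6, // assumption: the page does not give a retry limit
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      if (!isRetryable(err) || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

With these assumed defaults the delays run 500, 1000, 2000, 4000, 8000 ms and then stay at 8000 ms. The `sleep` parameter is injectable only so the loop can be exercised without real waiting.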

This description was created by Ellipsis for 2dcf449. It will automatically update as commits are pushed.

Collaborator Author

This stack of pull requests is managed by Graphite. Learn more about stacking.


@bmschmidt bmschmidt marked this pull request as ready for review October 24, 2024 21:49
@bmschmidt bmschmidt requested a review from apage43 October 24, 2024 21:49
Member
apage43 commented Oct 24, 2024

(1200 requests / 300 seconds) is actually 4 rps but this is probably fine

if there were multiple potential things doing embedding requests (I don't think there currently are?) then it should also be fine to burst if needed to keep things snappy - you could send an interactive request immediately and add what would have been the remaining delay to the next background flush to compensate (though I would not bother with this if it's responsive enough anyway)
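
To make the arithmetic in this comment concrete, here is a small sketch. The function names are hypothetical, and `nextFlushDelayAfterBurst` is only one possible reading of the suggestion (the interactive request uses up the upcoming slot early, so the following background flush waits a full interval plus whatever delay was skipped); nothing on this page confirms that the PR implements it this way.

```typescript
// Average request rate allowed by a "requests per window" limit.
function requestsPerSecond(requests: number, windowSeconds: number): number {
  return requests / windowSeconds;
}

// Smallest spacing between requests that stays within a given rate.
function minIntervalMs(rps: number): number {
  return 1000 / rps;
}

// One reading of the burst idea (an assumption, not confirmed by the PR):
// after sending immediately with `remainingMs` still left on the timer,
// wait a full interval plus the skipped remainder before the next flush.
function nextFlushDelayAfterBurst(intervalMs: number, remainingMs: number): number {
  return intervalMs + Math.max(0, remainingMs);
}

const limitRps = requestsPerSecond(1200, 300); // 4 requests per second
const tightestSpacing = minIntervalMs(limitRps); // 250 ms between requests
const actualRps = 1000 / 510; // roughly 1.96 requests per second at 510 ms
```

Under these numbers a 510 ms flush interval uses just under half of a 4 rps limit, which is consistent with the remark that the current setting is probably fine.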

Member
apage43 commented Oct 24, 2024

oh it looks like this already does first-request-immediately

Collaborator Author

Oh I thought the new limit was 600 requests, ok.

@bmschmidt bmschmidt force-pushed the 10-24-handle_backoff_and_bundle_requests_for_api_timing branch from fad8540 to 2dcf449 Compare October 24, 2024 23:19
  })
  .catch((err) => {
    // TODO: -- not the right way to test the error type!
    if (('' + err).match(/50[0-9]|429/)) {
Contributor


Consider using a more robust method to check for HTTP status codes, such as inspecting the error object directly if it contains the status code, instead of relying on string matching.
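
One way to act on this suggestion is sketched below. It assumes the rejected error carries a numeric `status` property, which is an assumption about the error shape and not something this page confirms; `statusOf` and `isRetryable` are hypothetical helpers. The retryable set mirrors the original regular expression (429 and 500 through 509).

```typescript
// Read a numeric `status` off an unknown error value, if it has one.
// Assumption: the API client attaches the HTTP status as `err.status`.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Same codes the regex /50[0-9]|429/ was aiming for, checked numerically.
function isRetryable(err: unknown): boolean {
  const status = statusOf(err);
  if (status === undefined) return false;
  return status === 429 || (status >= 500 && status <= 509);
}

// String matching can misfire on unrelated digits in a message:
const unrelated = new Error("processed 4290 rows before failing");
const stringMatchSaysRetry = ("" + unrelated).match(/50[0-9]|429/) !== null; // true
const statusCheckSaysRetry = isRetryable(unrelated); // false
```

The trade-off is that an error with no `status` field is never retried under this check, so it only helps if the code that throws the error records the status code on it.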

Collaborator Author
bmschmidt commented Oct 29, 2024

Merge activity

  • Oct 29, 4:05 PM EDT: A user started a stack merge that includes this pull request via Graphite.
  • Oct 29, 4:05 PM EDT: A user merged this pull request with Graphite.

@bmschmidt bmschmidt merged commit d5893f3 into main Oct 29, 2024
1 check passed
2 participants