handle backoff and bundle requests for API timing #81
(1200 requests / 300 seconds) is actually 4 rps, but this is probably fine if there were multiple potential things doing embedding requests (I don't think there currently are?). Then it should also be fine to burst if needed to keep things snappy: you could send an interactive request immediately and add what would have been the remaining delay to the next background flush to compensate (though I would not bother with this if it's responsive enough anyway).
Oh, it looks like this already does first-request-immediately.
Oh, I thought the new limit was 600 requests, ok.
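The burst-and-compensate idea from the first comment in this thread could be sketched roughly as below. This is an illustration only, not code from the PR: `PacerState`, `sendInteractive`, `backgroundDelayMs`, and `minIntervalMs` are hypothetical names, and the 510 ms interval is the one quoted in the PR description.

```typescript
// Hypothetical sketch, not the PR's implementation.
const FLUSH_INTERVAL_MS = 510; // flush interval quoted in the PR description

// Smallest spacing between requests that stays within `requests` per `windowSeconds`.
function minIntervalMs(requests: number, windowSeconds: number): number {
  return (windowSeconds * 1000) / requests;
}

interface PacerState {
  lastSendMs: number; // timestamp of the most recent request
  debtMs: number; // delay skipped by interactive bursts, owed by the next background flush
}

// Interactive request: send right away and record how much of the interval was skipped.
function sendInteractive(state: PacerState, nowMs: number): PacerState {
  const elapsed = nowMs - state.lastSendMs;
  const skipped = Math.max(0, FLUSH_INTERVAL_MS - elapsed);
  return { lastSendMs: nowMs, debtMs: state.debtMs + skipped };
}

// Background flush: wait out the normal interval plus any accumulated debt.
function backgroundDelayMs(state: PacerState, nowMs: number): number {
  const elapsed = nowMs - state.lastSendMs;
  return Math.max(0, FLUSH_INTERVAL_MS + state.debtMs - elapsed);
}
```

With a limit of 1200 requests per 300 seconds, `minIntervalMs(1200, 300)` gives 250 ms, so a 510 ms flush interval leaves room for an occasional burst. As the comment notes, this bookkeeping is probably unnecessary if the existing behavior is already responsive enough.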
fad8540 to 2dcf449
    })
    .catch((err) => {
      // TODO: -- not the right way to test the error type!
      if (('' + err).match(/50[0-9]|429/)) {
Consider using a more robust method to check for HTTP status codes, such as inspecting the error object directly if it contains the status code, instead of relying on string matching.
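One way to act on this suggestion is sketched below. It assumes the rejected error carries a numeric `status` or `response.status` field, which may not match what the client in this PR actually throws. `HttpLikeError`, `getStatus`, and `isRetryable` are hypothetical names.

```typescript
// Hypothetical error shape: assumes the thrown value exposes a numeric HTTP status.
interface HttpLikeError {
  status?: number;
  response?: { status?: number };
}

// Pull a status code off the error object, if one is present.
function getStatus(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const e = err as HttpLikeError;
  if (typeof e.status === "number") return e.status;
  const nested = e.response?.status;
  if (typeof nested === "number") return nested;
  return undefined;
}

// Retry on 429 and on 500-509, the same codes the regex /50[0-9]|429/ targets.
function isRetryable(err: unknown): boolean {
  const status = getStatus(err);
  return status !== undefined && (status === 429 || (status >= 500 && status <= 509));
}
```

Unlike the string match, this does not fire on an unrelated message that happens to contain `429` or `50x`, such as `new Error("port 4290 closed")`.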
Nomic's embedding API is rate limited to 2 requests per second, but these can include multiple embeddings. This PR does two things.
- Requests in the `Embedder` class are bundled into 510 ms groups to ensure that users are automatically kept within the rate limit.

Important

Batch requests in `Embedder` class every 510 ms and handle 429 errors with exponential backoff to comply with API rate limits.

- Batch requests in `Embedder` class every 510 ms to comply with API rate limit of 2 requests per second.
- … `flushDeferredEmbeddings()`.
- Changes `BATCH_SIZE` from 32 to 400 in `embedding.ts`.
- … `flushDeferredEmbeddings()`.
- … if `embedQueue` exceeds 100,000 items in `embed()`.
- Sets `setTimeout` in `periodicallyFlushCache()` to 510 ms.

This description was generated automatically for 2dcf449. It will automatically update as commits are pushed.
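As a rough illustration of the two behaviors summarized above (batching and retrying with exponential backoff), a minimal sketch might look like this. It is not the code in `embedding.ts`: only `BATCH_SIZE` and the 510 ms figure come from the description, while `chunk`, `backoffDelayMs`, `withBackoff`, the 30 second cap, and the retry count are assumptions made for the example.

```typescript
// Hypothetical sketch; only BATCH_SIZE and the 510 ms figure come from the PR description.
const BATCH_SIZE = 400;
const BASE_DELAY_MS = 510; // assumption: reuse the flush interval as the base backoff delay

// Split queued items into groups of at most `size`.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `capMs`.
function backoffDelayMs(attempt: number, baseMs = BASE_DELAY_MS, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Call `send`, retrying while `shouldRetry(err)` holds, sleeping between attempts.
async function withBackoff<T>(
  send: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  maxRetries = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (attempt >= maxRetries || !shouldRetry(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

`shouldRetry` is a caller-supplied predicate, for example one that returns true for 429 responses. Injecting `sleep` keeps the retry loop testable without real timers.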