Re-authenticate if the session is closed by a concurrent request #1031
Merged
Conversation
Confirmed to be fixing the issue (see the #980 comments). This change was suspected to cause memory leaks, but it was later proven that the leak is not related to this particular change and is reproduced with the main branch, too. Hence, this PR is good to go (after a final quick look, for certainty & safety).
Commits:
- Signed-off-by: Sergey Vasilyev <nolar@nolar.info>
- …closed. Signed-off-by: Sergey Vasilyev <nolar@nolar.info>
- … one by the handler name. Signed-off-by: Sergey Vasilyev <nolar@nolar.info>
- …fety. Otherwise, with the lock & toggle combined, it led to situations where the lock was released after the invalidation, leaving the "current" set empty and only then triggering the new re-authentication. At the same time, the lock was acquired by a parallel task that had already checked the readiness state and got it "on" (before the invalidation), but was waiting on the lock to actually yield the credentials. As a result, the `select()` call failed because the current set of credentials turned out to be empty by that moment. Now, all operations, both populating/invalidating the credentials and checking the readiness state, are done under the same lock/condition. This should ensure the lack of inconsistent states between the toggle & the sets of credentials at different moments in time. Signed-off-by: Sergey Vasilyev <nolar@nolar.info>
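The race this last commit message describes can be sketched roughly like this; the `RacyVault` name and the exact `Event`/`Lock` split are illustrative assumptions, not kopf's actual classes:

```python
import asyncio
from typing import Any, Dict

class RacyVault:
    """The pre-fix shape, illustratively: the readiness toggle and the
    credentials set are guarded by different primitives."""

    def __init__(self) -> None:
        self._ready = asyncio.Event()      # the boolean readiness toggle
        self._lock = asyncio.Lock()        # guards the credentials set
        self._current: Dict[str, Any] = {}

    async def invalidate(self, key: str) -> None:
        async with self._lock:
            self._current.pop(key, None)
            self._ready.clear()            # flipped under the lock...

    async def select(self) -> Any:
        await self._ready.wait()           # ...but checked outside of it!
        # A parallel task can invalidate right here, after the readiness
        # check but before the lock is acquired, emptying the "current" set.
        async with self._lock:
            if not self._current:
                raise LookupError("No credentials left by the moment of selection")
            return next(iter(self._current.values()))
```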
Would it be possible to cut a new release once this is merged? Thanks again for your great work :)
Several concurrent requests can step on each other's toes while catching HTTP 401s: one of them catches the 401 and causes the re-auth normally (by invalidating the APIContext via the Vault), while the others are left with the closed session and cannot even attempt their retried requests to get that 401. Instead, they get a generic RuntimeError("Session is closed") from aiohttp.
Concurrent requests are highly likely if the API previously returned some other errors, so that the requests went into a back-off sleep for a few seconds before retrying.
A proper fix would be to retry with a new session, but we currently have no mechanism to replace the session in a context object (safely). Hence, escalate and try again with new credentials (losing the retry counter as a side effect).
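As a rough illustration of this escalation path (a minimal, hypothetical sketch: `request_with_reauth`, `vault.select()`, `vault.invalidate()`, and `context.credentials` are assumed names, not kopf's actual API):

```python
import aiohttp

async def request_with_reauth(vault, method: str, url: str) -> aiohttp.ClientResponse:
    """Retry a request, re-authenticating on 401s AND on closed sessions."""
    while True:
        key, context = await vault.select()   # wait for usable credentials
        try:
            response = await context.session.request(method, url)
        except RuntimeError as exc:
            if "Session is closed" not in str(exc):
                raise
            # A concurrent request caught the 401 first and closed the shared
            # session; escalate as if this request saw the 401 itself, and
            # retry with new credentials (losing the retry counter).
            await vault.invalidate(key, context.credentials)
            continue
        if response.status == 401:
            # The normal path: this request observed the 401 on its own.
            await vault.invalidate(key, context.credentials)
            continue
        return response
```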
Under the hood, this concurrency caused conflicts in the vault over the state of the current and invalid credentials:
The credentials invalidation happens by the key only, which is the login handler name. It does not pass in the actual credentials object that failed with the 401 and thus must be remembered as broken. As a result, the 2nd, 3rd, and following failed API streams invalidated the valid credentials that were acquired after the 1st failure and re-authentication. Therefore, the subsequent re-authentications that tried to add the same credentials failed, because those credentials were known to be invalid.
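The fix for this first factor can be sketched as follows; `TinyVault` is a toy stand-in with assumed names, not kopf's actual `Vault` API. The failed credentials object itself is recorded as broken, and the current entry is evicted only if it still is that exact object:

```python
import asyncio
from typing import Any, Dict, List

class TinyVault:
    """A toy stand-in for the real vault (hypothetical names)."""

    def __init__(self) -> None:
        self._condition = asyncio.Condition()
        self._current: Dict[str, Any] = {}        # login handler name -> credentials
        self._invalid: Dict[str, List[Any]] = {}  # per key: objects known to be bad

    async def invalidate(self, key: str, credentials: Any) -> None:
        async with self._condition:
            # Remember WHICH object failed, not merely that the key failed:
            self._invalid.setdefault(key, []).append(credentials)
            # Evict the current entry only if it is still that same object,
            # so a late 2nd/3rd failing stream cannot evict the fresh
            # credentials acquired by the re-auth after the 1st failure.
            if self._current.get(key) is credentials:
                del self._current[key]

    async def populate(self, key: str, credentials: Any) -> None:
        async with self._condition:
            # Fresh objects under a previously failed key are accepted;
            # only the exact objects that failed before are rejected.
            if credentials not in self._invalid.get(key, []):
                self._current[key] = credentials
                self._condition.notify_all()   # wake any waiters for readiness
```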
The second contributing factor was the inconsistency between the vault's credentials set (`._current`) and its boolean readiness (`._state`), which were protected by different locks. Specifically, the case was this:

1. One task checks the vault's readiness via `Vault._ready._condition`, gets it ("on"), and proceeds further.
2. Meanwhile, another task invalidates the credentials under `Vault._lock`, leaves the lock, and initiates the re-auth again, leaving the "current" set empty.
3. The first task then arrives at `Vault._lock` and acquires it, only to find the current set of credentials empty, so the `select()` call fails.

Most likely, this was overlooked before, in Python 3.10-3.11 or so: those versions were relatively slow, the sequence of operations managed to be fast enough, and the event loop control was not leaving the coroutines on `await` fast enough. Now everything is fast, so the control leaves the coroutines more often.

Now, all these operations are protected by the same condition object, and there should be no unprotected inconsistencies between the current credentials and the vault's boolean readiness state.
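Continuing the hypothetical `TinyVault` sketch from above, the fix for this second factor amounts to deriving readiness from the credentials set itself and checking it under the very same condition object (again, an assumed sketch, not the real implementation):

```python
from typing import Any, Tuple

class ConsistentVault(TinyVault):  # extends the TinyVault sketch above
    async def select(self) -> Tuple[str, Any]:
        async with self._condition:
            # The readiness check (non-emptiness) and the retrieval are atomic
            # with respect to populate()/invalidate(), which mutate the set
            # under the same condition: the set can no longer be emptied
            # between the check and the moment the credentials are yielded.
            await self._condition.wait_for(lambda: bool(self._current))
            return next(iter(self._current.items()))
```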
TODO: Requires tests to simulate the situation — if the hypothesis is correct.
Presumably fixes #980 #1158