-
Notifications
You must be signed in to change notification settings - Fork 287
🐛(backend) race condition create doc #633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8275ab2
to
0aa61a0
Compare
with connection.cursor() as cursor: | ||
cursor.execute( | ||
f'LOCK TABLE "{models.Document._meta.db_table}" ' # noqa: SLF001 | ||
"IN SHARE ROW EXCLUSIVE MODE;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add a comment explaining this mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be done when creating children as well? See children method on the viewset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed with lunika, I don't think the lock should be managed by the viewset, is should be at the model/manager level.
And as you mentioned, the same could happen to children/siblings, it should be covered also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did try to add it in the model, see here, but it brings instability in our tests with the factory, a fix brings another instability, and so on.
This naive way is much more stable.
Most of my flakiness is gone as well:
with connection.cursor() as cursor: | ||
cursor.execute( | ||
f'LOCK TABLE "{models.Document._meta.db_table}" ' # noqa: SLF001 | ||
"IN SHARE ROW EXCLUSIVE MODE;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be done when creating children as well? See children method on the viewset.
assert response1.status_code == 201 | ||
assert response2.status_code == 201 | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add test for the children use case as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we are working on it but it's much more difficult to reproduce
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
24dfb83
to
e8b5006
Compare
bfe4f58
to
c2e82b2
Compare
|
||
document = factories.DocumentFactory() | ||
|
||
factories.UserDocumentAccessFactory(user=user, document=document, role="owner") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are not testing concurrency on children creation. But on parent creation. Children will be created on different parent so they have a different path, so you will never have a unique constraint error in you test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, look at here: https://github.com/suitenumerique/docs/pull/633/files#diff-719c99bace85294f27e3909b1067f53da483436c448af38d93a7856fb5ae36faR290
If you remove my patch on the children viewset, this test gives an error.
c2e82b2
to
f96550c
Compare
When 2 docs are created almost at the same time, the second one will fail because the first one. We get a unicity error on the path key already used ("impress_document_path_key"). To fix this issue, we will lock the table the time to create the document, the next query will wait for the lock to be released.
f96550c
to
4307b4f
Compare
Purpose
When 2 docs are created almost at the same time, the second one will fail because the first one.
We get a unicity error on the path key already used ("impress_document_path_key").
This error creates e2e flakiness, we have some cases as well in production.
Proposal
To fix this issue, we will lock the table the time to create the document, the next query will wait for the lock to be released.
Other alternatives
A retry system ?