8000 🐛(backend) race condition create doc by AntoLC · Pull Request #633 · suitenumerique/docs · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

🐛(backend) race condition create doc #633

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 22, 2025
Merged

🐛(backend) race condition create doc #633

merged 1 commit into from
Apr 22, 2025

Conversation

AntoLC
Copy link
Collaborator
@AntoLC AntoLC commented Feb 12, 2025

Purpose

When 2 docs are created almost at the same time, the second one will fail because the first one.
We get a unicity error on the path key already used ("impress_document_path_key").
This error creates e2e flakiness, we have some cases as well in production.

Proposal

To fix this issue, we will lock the table the time to create the document, the next query will wait for the lock to be released.

Other alternatives

A retry system ?

@AntoLC AntoLC added bug Something isn't working backend labels Feb 12, 2025
@AntoLC AntoLC self-assigned this Feb 12, 2025
@AntoLC AntoLC requested review from sampaccoud, qbey and lunika February 12, 2025 09:02
@AntoLC AntoLC force-pushed the fix/create-doc-race branch 3 times, most recently from 8275ab2 to 0aa61a0 Compare February 12, 2025 09:13
with connection.cursor() as cursor:
cursor.execute(
f'LOCK TABLE "{models.Document._meta.db_table}" ' # noqa: SLF001
"IN SHARE ROW EXCLUSIVE MODE;"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a comment explaining this mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be done when creating children as well? See children method on the viewset.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with lunika, I don't think the lock should be managed by the viewset, is should be at the model/manager level.
And as you mentioned, the same could happen to children/siblings, it should be covered also.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did try to add it in the model, see here, but it brings instability in our tests with the factory, a fix brings another instability, and so on.

This naive way is much more stable.
Most of my flakiness is gone as well:
image

with connection.cursor() as cursor:
cursor.execute(
f'LOCK TABLE "{models.Document._meta.db_table}" ' # noqa: SLF001
"IN SHARE ROW EXCLUSIVE MODE;"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be done when creating children as well? See children method on the viewset.

assert response1.status_code == 201
assert response2.status_code == 201


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test for the children use case as well?

Copy link
Member
@lunika lunika Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we are working on it but it's much more difficult to reproduce

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@lunika lunika force-pushed the fix/create-doc-race branch 2 times, most recently from 24dfb83 to e8b5006 Compare February 17, 2025 10:32
@AntoLC AntoLC force-pushed the fix/create-doc-race branch 2 times, most recently from bfe4f58 to c2e82b2 Compare February 26, 2025 14:20
@AntoLC AntoLC requested review from sampaccoud and lunika February 26, 2025 14:21

document = factories.DocumentFactory()

factories.UserDocumentAccessFactory(user=user, document=document, role="owner")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not testing concurrency on children creation. But on parent creation. Children will be created on different parent so they have a different path, so you will never have a unique constraint error in you test.

Copy link
Collaborator Author
@AntoLC AntoLC Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, look at here: https://github.com/suitenumerique/docs/pull/633/files#diff-719c99bace85294f27e3909b1067f53da483436c448af38d93a7856fb5ae36faR290

If you remove my patch on the children viewset, this test gives an error.

When 2 docs are created almost at the same time,
the second one will fail because the first one.
We get a unicity error on the path key already
used ("impress_document_path_key").
To fix this issue, we will lock the table the
time to create the document, the next query will
wait for the lock to be released.
@AntoLC AntoLC force-pushed the fix/create-doc-race branch from f96550c to 4307b4f Compare April 22, 2025 09:43
@AntoLC AntoLC merged commit 4307b4f into main Apr 22, 2025
29 of 32 checks passed
@AntoLC AntoLC deleted the fix/create-doc-race branch April 22, 2025 12:51
This was referenced May 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0