Optimize DB requests in task list, quality conflicts list and cloudstorages #8275
Important: Review skipped. Auto incremental reviews are disabled on this repository.
Walkthrough
Recent updates enhance the reliability and performance of the CVAT application by improving data serialization and query optimization. Key changes include filtering empty task subsets in serializers, refining dimension retrieval logic, and expanding related fields in database queries for the
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- cvat/apps/engine/serializers.py (1 hunks)
- cvat/apps/engine/views.py (1 hunks)
- cvat/apps/iam/permissions.py (1 hunks)
Additional comments not posted (3)
cvat/apps/iam/permissions.py (1)
51-57: LGTM! But verify the function usage in the codebase.
The code changes are approved. However, ensure that all function calls to get_organization handle the new logic correctly.
Verification successful
The function usage is verified and no issues are found. The get_organization function is used correctly in the codebase, handling the new logic appropriately.
- get_iam_context(request, obj) uses get_organization to retrieve the organization and then gets membership based on it.
- Middleware setup uses get_organization to set request.iam_context.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all function calls to `get_organization` handle the new logic correctly.
# Test: Search for the function usage. Expect: Only occurrences of the new logic.
rg --type python -A 5 $'get_organization'
Length of output: 71
Script:
#!/bin/bash
# Description: Verify all function calls to `get_organization` handle the new logic correctly.
# Find all Python files and search for the `get_organization` function usage within them.
fd -e py -x rg 'get_organization' {}
Length of output: 498
Script:
#!/bin/bash
# Description: Verify all function calls to `get_organization` handle the new logic correctly.
# Find all Python files and search for the `get_organization` function usage within them, including more context lines.
fd -e py -x rg 'get_organization' -A 10 -B 5 {}
Length of output: 3618
cvat/apps/engine/serializers.py (1)
1363-1370: LGTM!
The changes improve the robustness of the to_representation method by ensuring only valid, non-empty task subsets are included and a valid dimension is returned.
cvat/apps/engine/views.py (1)
1708-1719: Optimize database query performance by expanding select_related.
The added fields in the select_related method (segment__task__source_storage, segment__task__target_storage, segment__task__organization, segment__task__project__organization, segment__task__owner, segment__task__project__owner) will help optimize database queries by pre-fetching related data. This change should improve performance when accessing these fields in the viewset.
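To make the effect of expanding select_related concrete, here is a minimal sketch of what happens at the SQL level. It uses the standard-library sqlite3 module instead of the Django ORM, and the tables (task, owner), the QueryCounter wrapper, and both function names are hypothetical, chosen only for illustration. Without a join, each row triggers one extra lookup for its related object; with a join (what select_related issues), a single statement returns everything.

```python
import sqlite3


class QueryCounter:
    """Thin wrapper that counts every statement sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def run(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def make_demo_db():
    # Hypothetical two-table schema; not the CVAT schema.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE task (id INTEGER PRIMARY KEY, owner_id INTEGER);
        INSERT INTO owner VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO task VALUES (10, 1), (11, 2), (12, 1);
        """
    )
    return conn


def tasks_with_owner_lazy(db):
    # One query for the list, then one more query per row (the N+1 pattern).
    rows = db.run("SELECT id, owner_id FROM task ORDER BY id")
    return [
        (task_id, db.run("SELECT name FROM owner WHERE id = ?", (owner_id,))[0][0])
        for task_id, owner_id in rows
    ]


def tasks_with_owner_joined(db):
    # A single statement with a JOIN, analogous to select_related("owner").
    return db.run(
        "SELECT task.id, owner.name FROM task "
        "JOIN owner ON owner.id = task.owner_id ORDER BY task.id"
    )


if __name__ == "__main__":
    lazy_db = QueryCounter(make_demo_db())
    joined_db = QueryCounter(make_demo_db())
    print(tasks_with_owner_lazy(lazy_db), lazy_db.count)
    print(tasks_with_owner_joined(joined_db), joined_db.count)
```

Both functions return the same rows; only the number of statements differs, and the gap grows with the page size.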
/api/jobs now takes longer (on the queries metric). Each /api/jobs/id/preview request also now takes significantly longer (on the queries metric).
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #8275 +/- ##
===========================================
- Coverage 73.94% 73.91% -0.03%
===========================================
Files 435 435
Lines 45595 45661 +66
Branches 3924 3924
===========================================
+ Hits 33716 33752 +36
- Misses 11879 11909 +30
I've increased the page_size to get more representative output:
- Baseline
- select_related (source_storage, target_storage, organization, owner), 12 joins
- select_related (source_storage, target_storage) + prefetch_related (organization, owner), 10 joins
And the same, but for 12 jobs (as the UI presents).
I think it makes sense to choose one of the 2 optimized variants. The version with 10 joins is a good tradeoff for now: it provides a good balance between the two others.
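The trade-off between the variants is joins versus extra queries: select_related widens one statement with more joins, while prefetch_related issues one additional query per relation and matches the results in Python. A minimal sketch of the prefetch-style lookup follows, again using stdlib sqlite3 rather than the Django ORM. The schema and the function name are hypothetical.

```python
import sqlite3


def make_demo_db():
    # Hypothetical schema for illustration; job 4 has no owner.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE job (id INTEGER PRIMARY KEY, owner_id INTEGER);
        INSERT INTO owner VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO job VALUES (1, 1), (2, 2), (3, 1), (4, NULL);
        """
    )
    return conn


def jobs_with_owner_prefetch_style(conn):
    # Query 1: the page of jobs, with no join on the owner table.
    jobs = conn.execute("SELECT id, owner_id FROM job ORDER BY id").fetchall()

    # Query 2: all distinct owners referenced by that page, in one IN lookup.
    owner_ids = sorted({owner_id for _, owner_id in jobs if owner_id is not None})
    owners = {}
    if owner_ids:
        placeholders = ",".join("?" for _ in owner_ids)
        owners = dict(
            conn.execute(
                f"SELECT id, name FROM owner WHERE id IN ({placeholders})",
                owner_ids,
            ).fetchall()
        )

    # The match between jobs and owners happens in Python, not in SQL.
    return [(job_id, owners.get(owner_id)) for job_id, owner_id in jobs]


if __name__ == "__main__":
    print(jobs_with_owner_prefetch_style(make_demo_db()))
```

The statement count stays at two regardless of how many jobs are on the page, which is why moving a relation from a join to a prefetch can be a reasonable middle ground.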
job_counts = {
    task["id"]: task
    for task in models.Task.objects
    .filter(id__in=page_task_ids)
I just thought of something WRT this and similar optimizations - how well does this behave when page_size is all and there are a lot of tasks?
page_size=all must be allowed only in development deployments. It can crash the server because of OOM.
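The per-page lookup discussed in this thread can be sketched as follows, assuming the truncated query above goes on to aggregate a count per task. This is a stdlib sqlite3 illustration, not the PR's code; the schema and the job_counts_for_page function are hypothetical. Note that the IN list has one entry per task on the page, so it grows with the page size, which is the concern raised about page_size=all.

```python
import sqlite3


def make_demo_db():
    # Hypothetical schema: task 1 has two jobs, task 2 has one, task 3 has none.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE task (id INTEGER PRIMARY KEY);
        CREATE TABLE job (id INTEGER PRIMARY KEY, task_id INTEGER);
        INSERT INTO task VALUES (1), (2), (3);
        INSERT INTO job VALUES (1, 1), (2, 1), (3, 2);
        """
    )
    return conn


def job_counts_for_page(conn, page_task_ids):
    """Return {task_id: job_count} for the tasks on the current page only."""
    page_task_ids = list(page_task_ids)
    if not page_task_ids:
        return {}

    # One grouped query for the whole page instead of one COUNT per task.
    placeholders = ",".join("?" for _ in page_task_ids)
    rows = conn.execute(
        f"SELECT task_id, COUNT(*) FROM job "
        f"WHERE task_id IN ({placeholders}) GROUP BY task_id",
        page_task_ids,
    ).fetchall()
    counts = dict(rows)

    # Tasks without jobs do not appear in the grouped result; default them to 0.
    return {task_id: counts.get(task_id, 0) for task_id in page_task_ids}


if __name__ == "__main__":
    print(job_counts_for_page(make_demo_db(), [1, 3]))
```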
if is_field_cached(db_task, "segment_set"):
    # Refresh segments to report actual dates if they were fetched previously
    # Doing so without a check leads to an error if the related object is not prefetched
FWIW, I think that's a bug, so I filed a Django bug report: https://code.djangoproject.com/ticket/36372.
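The guard pattern in the snippet above (refresh a relation only if it was already loaded) can be sketched in plain Python. This is not CVAT's is_field_cached: is_prefetched and refresh_if_prefetched are hypothetical helpers, and the sketch assumes only that prefetched relations live in the instance's private _prefetched_objects_cache dict, which is where Django's prefetch_related stores them. Stand-in objects replace real model instances so the example runs without Django.

```python
from types import SimpleNamespace


def is_prefetched(instance, accessor_name):
    """Hypothetical check: was this relation loaded by a prefetch?"""
    cache = getattr(instance, "_prefetched_objects_cache", None) or {}
    return accessor_name in cache


def refresh_if_prefetched(instance, accessor_name, reload):
    """Replace the cached relation with fresh data, but only if it was cached.

    `reload` is a caller-supplied callable that fetches the current rows.
    Returns True if a refresh happened, False if nothing was cached.
    """
    if not is_prefetched(instance, accessor_name):
        # Nothing was fetched earlier, so there is no stale data to replace.
        return False
    instance._prefetched_objects_cache[accessor_name] = reload()
    return True


if __name__ == "__main__":
    # Stand-in for a task whose segments were prefetched earlier.
    task = SimpleNamespace(_prefetched_objects_cache={"segment_set": ["stale"]})
    print(refresh_if_prefetched(task, "segment_set", lambda: ["fresh"]))

    # Stand-in for a task loaded without any prefetch.
    bare_task = SimpleNamespace()
    print(refresh_if_prefetched(bare_task, "segment_set", lambda: ["fresh"]))
```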
Motivation and context
Notes:
- /api/quality/conflicts/ - will be further changed in Project quality #9116, but improved for now
- /api/requests/ - long processing for getting a list of requests
- Try to optimize count() requests for list endpoints. Because of the filter.distinct used, we get slow requests. Potentially can be optimized inside pagination.
So far the chosen approach is to provide an optimized implementation in a dedicated list_serializer_class.to_representation().
How has this been tested?