Reduce allocations during aggregations by lnkuiper · Pull Request #16849 · duckdb/duckdb · GitHub

Reduce allocations during aggregations #16849

Merged: 24 commits, Mar 31, 2025

Conversation

@lnkuiper (Contributor) commented Mar 26, 2025

This PR started as a minor performance improvement to the string hashing function (using a fixed-size Load instead of a variable-size memcpy) but grew into a broader effort to reduce allocations during aggregation, which I profiled using heaptrack.
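
To illustrate the original string-hashing tweak, here is a minimal sketch (not the code from this PR) of how a fixed-size load can stand in for a variable-size memcpy. It assumes a hypothetical zero-padded 16-byte buffer, PaddedString, so that reading 8 bytes is always in bounds. LoadVariable, LoadFixed, and MakePadded are illustrative names, not DuckDB functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical string holder: 16 bytes of storage, zero-initialized,
// so bytes past `len` are always 0 (this padding is an assumption).
struct PaddedString {
    char data[16] = {};
    std::size_t len = 0;
};

// Copies at most 16 bytes of `s` into a zero-padded buffer.
inline PaddedString MakePadded(const char *s) {
    PaddedString result;
    result.len = std::min(std::strlen(s), sizeof(result.data));
    std::memcpy(result.data, s, result.len);
    return result;
}

// Variable-size copy: the byte count is only known at runtime,
// so the compiler generally cannot reduce this to a single load.
inline std::uint64_t LoadVariable(const char *ptr, std::size_t len) {
    assert(len <= sizeof(std::uint64_t));
    std::uint64_t value = 0;
    std::memcpy(&value, ptr, len);
    return value;
}

// Fixed-size load: always reads exactly 8 bytes. The caller must
// guarantee that 8 bytes are readable at `ptr`.
inline std::uint64_t LoadFixed(const char *ptr) {
    std::uint64_t value;
    std::memcpy(&value, ptr, sizeof(value));
    return value;
}
```

With the padding guaranteed, both loads yield the same 8-byte word for strings of up to 8 bytes, but the fixed-size version has a compile-time-constant copy size, which compilers can typically lower to a single load instruction.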

In the current main, when running SELECT count(DISTINCT UserID) FROM hits; (ClickBench), DuckDB does >700K allocations. This includes startup, which does a bunch of allocations of its own, e.g., in SingleFileBlockManager::LoadFreeList() and InMemoryLogStorage::InMemoryLogStorage. Note that this is on c6a.metal (192 vCPUs).

With the changes in this PR, the total allocation count is down to just over 300K. Since this still includes startup, the number of allocations done during the aggregation itself has gone down significantly.

Note that this does not reduce RSS (it's about the same), but just the number of allocations, which should reduce contention as memory is a shared resource.

EDIT: I checked how many allocations there are if we don't do a grouped aggregation, i.e. select sum(userid) from hits;, and it's ~141K. So initially, the query did ~708K allocations, of which 708K - 141K = ~567K were done by the grouped aggregation. Now, we're doing ~316K allocations, of which 316K - 141K = ~175K are done by the grouped aggregation, which is a >3x reduction.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft March 26, 2025 15:32
@lnkuiper lnkuiper marked this pull request as ready for review March 28, 2025 11:53
@lnkuiper lnkuiper requested a review from Mytherin March 31, 2025 08:02
@Mytherin Mytherin merged commit b53879b into duckdb:main Mar 31, 2025
52 checks passed
@Mytherin (Collaborator) commented

Thanks!

@lnkuiper lnkuiper deleted the string_hash_tweak branch April 14, 2025 09:10
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 16, 2025
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025