Some more Parquet writer performance improvements by lnkuiper · Pull Request #16287 · duckdb/duckdb · GitHub

Some more Parquet writer performance improvements #16287


Merged Feb 18, 2025 (4 commits)


lnkuiper
Contributor

Follow-up of #16243. Scraping the bottom of the barrel here, as the previous PR got many of the biggest performance gains already.

This PR adds some more fast paths for when there are no NULLs, and implements a branchless hash function for string_t's that are inlined. This required some extra care to make sure that the hash function returns the same value whether the string is inlined or not.
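The consistency requirement described above can be illustrated with a small sketch. This is not the code from this PR: `ShortString`, `MakeShort`, `Mix`, `HashInlined`, `HashBytes`, and `HashColumn` are hypothetical names, and the 12-byte inline capacity, the zero padding of unused inline bytes, and the mixing constants are assumptions made for illustration only. The idea shown is that an inlined string can be hashed with fixed-width loads and no per-byte loop, and that a generic byte-based hash agrees with it as long as short inputs are zero-padded the same way. A fast path for a column without NULLs is included as well.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed inline capacity for this sketch.
constexpr uint32_t kInlineLength = 12;

// Hypothetical stand-in for a short inlined string; NOT DuckDB's string_t.
struct ShortString {
    uint32_t length;
    char inlined[kInlineLength]; // unused bytes are always zero
};

// Build a zero-padded inlined string (caller guarantees len <= kInlineLength).
inline ShortString MakeShort(const char *data, uint32_t len) {
    ShortString s;
    s.length = len;
    std::memset(s.inlined, 0, kInlineLength);
    std::memcpy(s.inlined, data, len);
    return s;
}

// Simple bijective 64-bit mixer (illustrative constants).
inline uint64_t Mix(uint64_t x) {
    x ^= x >> 32;
    x *= 0xd6e8feb86659fd93ULL;
    x ^= x >> 32;
    x *= 0xd6e8feb86659fd93ULL;
    x ^= x >> 32;
    return x;
}

// Branchless: two fixed-width loads, no loop or branch on the length.
inline uint64_t HashInlined(const ShortString &s) {
    uint64_t lo;
    uint32_t hi;
    std::memcpy(&lo, s.inlined, 8);
    std::memcpy(&hi, s.inlined + 8, 4);
    return Mix(lo ^ Mix(hi ^ (static_cast<uint64_t>(s.length) << 32)));
}

// Generic path for raw bytes. Short inputs are zero-padded and routed
// through HashInlined, so both paths agree for the same contents.
inline uint64_t HashBytes(const char *data, uint32_t len) {
    if (len <= kInlineLength) {
        return HashInlined(MakeShort(data, len));
    }
    uint64_t h = 14695981039346656037ULL; // FNV-1a for long strings
    for (uint32_t i = 0; i < len; i++) {
        h ^= static_cast<unsigned char>(data[i]);
        h *= 1099511628211ULL;
    }
    return Mix(h ^ len);
}

// Fast path when there are no NULLs: validity == nullptr means "all valid",
// so the inner loop carries no per-row validity check.
inline void HashColumn(const ShortString *values, const bool *validity,
                       size_t count, uint64_t *out) {
    if (!validity) {
        for (size_t i = 0; i < count; i++) {
            out[i] = HashInlined(values[i]);
        }
        return;
    }
    for (size_t i = 0; i < count; i++) {
        out[i] = validity[i] ? HashInlined(values[i]) : 0;
    }
}
```

In this sketch the agreement between the two paths holds by construction, because the generic path pads short inputs before hashing. The length is mixed in so that strings differing only in trailing zero bytes do not collide.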

Overall, the changes reduce the time it takes to write TPC-H SF10 lineitem to Parquet from ~2.6s to ~2.4s with the default PARQUET_VERSION V1, and from ~2.5s to ~2.3s with V2.

@Mytherin Mytherin merged commit 219bafa into duckdb:main Feb 18, 2025
49 checks passed
@Mytherin
Collaborator

Thanks!

Antonov548 added a commit to Antonov548/duckdb-r that referenced this pull request Mar 4, 2025
Some more Parquet writer performance improvements (duckdb/duckdb#16287)
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Mar 5, 2025
Some more Parquet writer performance improvements (duckdb/duckdb#16287)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
Some more Parquet writer performance improvements (duckdb/duckdb#16287)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
Some more Parquet writer performance improvements (duckdb/duckdb#16287)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
Some more Parquet writer performance improvements (duckdb/duckdb#16287)