Fix possible OOB mem access in Parquet decoder #17841
Conversation
One non-blocking suggestion; otherwise looks good to me.
```cpp
  compute_initial_large_strings_offset(
    s, initial_str_offsets[pages[page_idx].chunk_idx], has_repetition);
  auto const chunks_per_rowgroup = initial_str_offsets.size();
  auto const input_col_idx       = pages[page_idx].chunk_idx % chunks_per_rowgroup;
```
whoops! So, this worked by the grace of the memory pool?
Yes, it seems like it. 😅
/merge
Description
Fixes #17838. Related to #17702.
This PR fixes a possible out-of-bounds memory access in the Parquet string decoder when writing the initial offsets for nested large string columns. Existing tests should have been producing segfaults in the decoder kernels but somehow were not. The decoder was producing correct results even without this change, since the initial offsets are written from the first decoded ColumnChunk of each input column.
Checklist