8000 Fix possible OOB mem access in Parquet decoder by mhaseeb123 · Pull Request #17841 · rapidsai/cudf · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix possible OOB mem access in Parquet decoder #17841

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

mhaseeb123
Copy link
Member
@mhaseeb123 mhaseeb123 commented Jan 28, 2025

Description

Fixes #17838. Related to #17702

This PR fixes a possible OOB in parquet string decoder when writing initial offset for nested large string cols. Existing tests should have been throwing segfaults in decoder kernels but somehow weren't. The decoder was producing correct results even without this change as the initial offsets are written from the first decoded ColumnChunk of each input column.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link
copy-pr-bot bot commented Jan 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jan 28, 2025
@mhaseeb123 mhaseeb123 added bug Something isn't working 2 - In Progress Currently a work in progress ! - Hotfix Hotfix is a bug that affects the majority of users for which there is no reasonable workaround strings strings issues (C++ and Python) cuIO cuIO issue non-breaking Non-breaking change and removed libcudf Affects libcudf (C++/CUDA) code. labels Jan 28, 2025
@mhaseeb123 mhaseeb123 marked this pull request as ready for review January 28, 2025 21:15
@mhaseeb123 mhaseeb123 requested a review from a team as a code owner January 28, 2025 21:15
@mhaseeb123 mhaseeb123 added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jan 28, 2025
@mhaseeb123 mhaseeb123 requested a review from davidwendt January 28, 2025 21:19
@mhaseeb123 mhaseeb123 requested a review from vuule January 28, 2025 21:22
@mhaseeb123 mhaseeb123 removed the ! - Hotfix Hotfix is a bug that affects the majority of users for which there is no reasonable workaround label Jan 28, 2025
Copy link
Member
@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One non-blocking suggestion otherwise looks good to me.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jan 28, 2025
@mhaseeb123 mhaseeb123 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. labels Jan 28, 2025
compute_initial_large_strings_offset(
s, initial_str_offsets[pages[page_idx].chunk_idx], has_repetition);
auto const chunks_per_rowgroup = initial_str_offsets.size();
auto const input_col_idx = pages[page_idx].chunk_idx % chunks_per_rowgroup;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops! So, this worked by the grace of the memory pool?

Copy link
Member Author
@mhaseeb123 mhaseeb123 Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it seems like it. 😅

@mhaseeb123
Copy link
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 95c69c3 into rapidsai:branch-25.02 Jan 29, 2025
109 checks passed
@mhaseeb123 mhaseeb123 deleted the fix/pq-decoder-oob-mem-check branch January 29, 2025 20:59
@GregoryKimball GregoryKimball removed this from libcudf Mar 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working cuIO cuIO issue non-breaking Non-breaking change strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] ParquetStringsTest.ChunkedReadNestedLargeStrings compute-sanitizer error
4 participants
0