Fix possible OOB mem access in Parquet decoder #17841
Conversation
One non-blocking suggestion; otherwise looks good to me.
```cpp
  compute_initial_large_strings_offset(
    s, initial_str_offsets[pages[page_idx].chunk_idx], has_repetition);
  auto const chunks_per_rowgroup = initial_str_offsets.size();
  auto const input_col_idx       = pages[page_idx].chunk_idx % chunks_per_rowgroup;
```
whoops! So, this worked by the grace of the memory pool?
Yes, it seems like it. 😅
/merge
Description
Fixes #17838. Related to #17702.
This PR fixes a possible out-of-bounds memory access in the Parquet string decoder when writing the initial offsets for nested large string columns. Existing tests should have been producing segfaults in the decoder kernels but somehow were not. The decoder was producing correct results even without this change, since the initial offsets are written from the first decoded ColumnChunk of each input column.
Checklist