Null-handling for Transforms by lamarrr · Pull Request #18845 · rapidsai/cudf · GitHub

Null-handling for Transforms #18845


Merged 27 commits on May 29, 2025

lamarrr (Contributor) commented May 15, 2025

Description

This pull request implements null-handling for Transforms. An output row is null if any of the corresponding input rows is null. The output bitmask is computed in a single pass from the null masks of the input columns, before the transform function is run on the valid output rows.
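
As a rough illustration of the idea above, the following host-side C++ sketch ANDs packed validity masks word by word and then applies a function only to rows that remain valid. It is not the libcudf implementation: the names `bitmask_word`, `and_masks`, `is_valid`, and `transform_valid` are hypothetical, and the 32-rows-per-word layout with a set bit meaning "valid" is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Assumption: one word covers 32 rows, and bit i set means row i is valid.
using bitmask_word = std::uint32_t;

// AND all input masks word by word in one pass over the words.
// Every mask is expected to hold exactly num_words words.
std::vector<bitmask_word> and_masks(std::vector<std::vector<bitmask_word>> const& masks,
                                    std::size_t num_words)
{
  std::vector<bitmask_word> out(num_words, ~bitmask_word{0});
  for (std::size_t w = 0; w < num_words; ++w) {
    for (auto const& m : masks) {
      assert(m.size() == num_words);
      out[w] &= m[w];
    }
  }
  return out;
}

// Read the validity bit for a single row.
bool is_valid(std::vector<bitmask_word> const& mask, std::size_t row)
{
  return ((mask[row / 32] >> (row % 32)) & 1u) != 0;
}

// Apply op only where the combined mask marks the row valid.
// Null rows keep a placeholder value of 0, which readers must ignore.
std::vector<int> transform_valid(std::vector<int> const& a,
                                 std::vector<int> const& b,
                                 std::vector<bitmask_word> const& mask,
                                 std::function<int(int, int)> const& op)
{
  assert(a.size() == b.size());
  std::vector<int> out(a.size(), 0);
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (is_valid(mask, i)) { out[i] = op(a[i], b[i]); }
  }
  return out;
}
```

Computing the mask first means the per-row function never has to inspect nulls itself; it only sees rows where every input is valid.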

It also handles the scalar input edge cases:

  • Scalar is not null: the scalar is treated as valid for every row of the base column. Its bitmask has size 1, so indexing it by base-column row would cause an out-of-bounds error; we therefore exclude it from the bitmask_and operation.

  • Scalar is null: the scalar is treated as null for every row of the base column, which makes all the output rows null. Its bitmask is also of size 1, so instead of using it we output a null mask of the base column's size and mark all the elements as null.
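
The two scalar cases can be sketched in host-side C++ as below. This is only an illustration under stated assumptions, not the code in this PR: `input_desc`, `mask_word`, and `scalar_aware_mask` are hypothetical names, and masks are modeled as 32-bit words with a set bit meaning "valid".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using mask_word = std::uint32_t;

// Hypothetical description of one transform input.
struct input_desc {
  bool is_scalar;                      // true if the input holds a single row
  bool scalar_is_valid;                // only meaningful when is_scalar is true
  std::vector<mask_word> const* mask;  // column mask, or nullptr if the column has no nulls
};

std::vector<mask_word> scalar_aware_mask(std::vector<input_desc> const& inputs,
                                         std::size_t num_rows)
{
  std::size_t const num_words = (num_rows + 31) / 32;

  // Null scalar: every output row is null, so emit an all-zero mask
  // sized for the base column rather than the scalar's one-row mask.
  for (auto const& in : inputs) {
    if (in.is_scalar && !in.scalar_is_valid) {
      return std::vector<mask_word>(num_words, mask_word{0});
    }
  }

  // Otherwise AND only the column masks. Valid scalars are skipped because
  // their one-row mask cannot be indexed by rows of the base column.
  // Bits past num_rows in the last word are left as-is and must be ignored.
  std::vector<mask_word> out(num_words, ~mask_word{0});
  for (auto const& in : inputs) {
    if (in.is_scalar || in.mask == nullptr) { continue; }
    assert(in.mask->size() == num_words);
    for (std::size_t w = 0; w < num_words; ++w) { out[w] &= (*in.mask)[w]; }
  }
  return out;
}
```

Checking for a null scalar before touching any column mask keeps the common path (all scalars valid) a plain AND over column masks only.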

Follows up on #18023, #18820

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot bot commented May 15, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf (Affects libcudf (C++/CUDA) code) label May 15, 2025
@lamarrr lamarrr added the feature request (New feature or request) and breaking (Breaking change) labels May 19, 2025
@github-actions github-actions bot added the Java (Affects Java cuDF API) label May 19, 2025
lamarrr (Contributor, Author) commented May 19, 2025

/ok to test d8b0433

lamarrr (Contributor, Author) commented May 19, 2025

/ok to test 8dded0a

lamarrr (Contributor, Author) commented May 19, 2025

/ok to test 08ecef2

@lamarrr lamarrr marked this pull request as ready for review May 19, 2025 23:34
@lamarrr lamarrr requested review from a team as code owners May 19, 2025 23:34
@lamarrr lamarrr changed the base branch from branch-25.06 to branch-25.08 May 19, 2025 23:42
@lamarrr lamarrr requested a review from kingcrimsontianyu May 29, 2025 17:08
lamarrr and others added 2 commits May 29, 2025 20:54
Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com>
lamarrr (Contributor, Author) commented May 29, 2025

/merge

@rapids-bot rapids-bot bot merged commit 4474202 into rapidsai:branch-25.08 May 29, 2025
91 checks passed
TomAugspurger pushed a commit to TomAugspurger/pygdf that referenced this pull request May 30, 2025

Authors:
  - Basit Ayantunde (https://github.com/lamarrr)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Muhammad Haseeb (https://github.com/mhaseeb123)
  - Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#18845
Labels: breaking (Breaking change), feature request (New feature or request), Java (Affects Java cuDF API), libcudf (Affects libcudf (C++/CUDA) code)