8000 Implement `lists::stable_sort_lists` for stable sorting of elements within each row of lists column by ttnghia · Pull Request #9425 · rapidsai/cudf · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Implement lists::stable_sort_lists for stable sorting of elements within each row of lists column #9425

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 18, 2021

Conversation

ttnghia
Copy link
Contributor
@ttnghia ttnghia commented Oct 13, 2021

This PR adds lists::stable_sort_lists that can sort elements within rows of lists column using stable sort. This is necessary for implementing lists::drop_list_duplicates that operates on keys-values columns input when we want to remove the values corresponding to duplicate keys with KEEP_FIRST or KEEP_LAST option.

In order to implement lists::stable_sort_lists, stable sort versions for the segmented_sorted_order and segmented_sort_by_key have also been implemented, which can maintain the order of equally-compared elements within segments.

This PR blocks #9345.

@ttnghia ttnghia added feature request New feature or request 3 - Ready for Review Ready for review by team libcudf blocker libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels Oct 13, 2021
@ttnghia ttnghia self-assigned this Oct 13, 2021
@ttnghia ttnghia requested a review from a team as a code owner October 13, 2021 03:13
@rapidsai rapidsai deleted a comment from codecov bot Oct 13, 2021
@ttnghia ttnghia changed the title Implement stable_segmented_sorted_order and stable_segmented_sort_by_key Implement lists::stable_sort_lists for stable sorting of elements within each row of lists column Oct 13, 2021
@rapidsai rapidsai deleted a comment from codecov bot Oct 13, 2021
@rapidsai rapidsai deleted a comment from codecov bot Oct 13, 2021
@rapidsai rapidsai deleted a comment from codecov bot Oct 14, 2021
@mythrocks
Copy link
Contributor

From looking at the tests, it looks like we only support stable-sorting of single-level lists. (i.e. nested types aren't supported.) Is that correct?
If yes, could we please make mention of that in the doxygen, and the commit message (i.e. description of this PR)?

@rapidsai rapidsai deleted a comment from codecov bot Oct 16, 2021
@rapidsai rapidsai deleted a comment from codecov bot Oct 16, 2021
@ttnghia ttnghia requested a review from mythrocks October 18, 2021 16:30
Copy link
Contributor
@codereport codereport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just one small comment

@rapidsai rapidsai deleted a comment from codecov bot Oct 18, 2021
@rapidsai rapidsai deleted a comment from codecov bot Oct 18, 2021
@rapidsai rapidsai deleted a comment from codecov bot Oct 18, 2021
Copy link
Contributor
@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enum, the renamed common method, and the reordering of the parameters make this an easier read. Thanks, @ttnghia. LGTM.

@ttnghia
Copy link
Contributor Author
ttnghia commented Oct 18, 2021

Rerun tests.

@codecov
Copy link
codecov bot commented Oct 18, 2021

Codecov Report

Merging #9425 (29538f5) into branch-21.12 (ab4bfaa) will increase coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-21.12    #9425      +/-   ##
================================================
+ Coverage         10.79%   10.83%   +0.04%     
================================================
  Files               116      117       +1     
  Lines             18869    19442     +573     
================================================
+ Hits               2036     2106      +70     
- Misses            16833    17336     +503     
Impacted Files Coverage Δ
python/cudf/cudf/__init__.py 0.00% <ø> (ø)
python/cudf/cudf/_lib/__init__.py 0.00% <ø> (ø)
python/cudf/cudf/io/csv.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/hdf.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/orc.py 0.00% <0.00%> (ø)
python/cudf/cudf/_version.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/abc.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/dlpack.py 0.00% <0.00%> (ø)
... and 62 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74763ba...29538f5. Read the comment docs.

@ttnghia
Copy link
Contributor Author
ttnghia commented Oct 18, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 823958b into rapidsai:branch-21.12 Oct 18, 2021
@ttnghia ttnghia deleted the stable_segmented_sort branch October 22, 2021 18:21
rapids-bot bot pushed a commit that referenced this pull request Nov 11, 2021
…ns (#9345)

This PR changes the interface of `lists::drop_list_duplicates` such that it may accept a second (optional) input `values` lists column, and returns a pairs of lists columns containing the results of copying the input column without duplicate entries.

If the optional `values` column is given, the users are responsible to have the keys-values columns having the same number of entries in each row. Otherwise, the results will be undefined.

When copying the key entries, the corresponding value entries are also copied at the same time. A parameter `duplicate_keep_option` reused from stream compaction is used to specify which duplicate keys will be copying.

This closes #9124, and blocked by #9425.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Jake Hemstad (https://github.com/jrhemstad)
  - https://github.com/nvdbaranec

URL: #9345
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0