8000 Move `drop_duplicates`, `drop_na`, `_gather`, `take` to IndexFrame and create their `_base_index` counterparts by isVoid · Pull Request #9807 · rapidsai/cudf · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Move drop_duplicates, drop_na, _gather, take to IndexFrame and create their _base_index counterparts #9807

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 55 commits into from
Jan 19, 2022

Conversation

isVoid
Copy link
Contributor
@isVoid isVoid commented Dec 1, 2021

This PR is a follow up of #9558 (Part 1 of 3)

One remaining problem from #9558 is that Frame is index agnostic, however the above functions needs to perform index-aware operations when building the list of columns to pass to libcudf. For example, to remove duplicates of BaseIndex, it should only construct the list with all its columns. But in a dataframe, it would need to pass in all data columns plus the index columns, while specifying the indices of the data columns to consider duplicates. This complicates for _gather which supports keep_index argument. This PR moves aforementioned functions to IndexedFrames, and create its counterparts in _base_index.

A couple noteworthy changes:

  • Merge object added with two new arguments l(r)hs_is_index
  • DataFrame/Series.take keep_index argument is removed. For internal usage it's more advised to use _gather. (And thus this PR is labeled breaking)

isVoid and others added 30 commits October 21, 2021 15:49
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
@codecov
Copy link
codecov bot commented Dec 1, 2021

Codecov Report

Merging #9807 (a9b9c4c) into branch-22.02 (967a333) will decrease coverage by 0.08%.
The diff coverage is n/a.

❗ Current head a9b9c4c differs from pull request most recent head 1f41061. Consider uploading reports for the commit 1f41061 to get more accurate results
Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.02    #9807      +/-   ##
================================================
- Coverage         10.49%   10.40%   -0.09%     
================================================
  Files               119      119              
  Lines             20305    20556     +251     
================================================
+ Hits               2130     2139       +9     
- Misses            18175    18417     +242     
Impacted Files Coverage Δ
python/custreamz/custreamz/kafka.py 29.16% <0.00%> (-0.63%) ⬇️
python/dask_cudf/dask_cudf/sorting.py 92.66% <0.00%> (-0.25%) ⬇️
python/dask_cudf/dask_cudf/core.py 70.85% <0.00%> (-0.17%) ⬇️
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/parquet.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/dtypes.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/scalar.py 0.00% <0.00%> (ø)
... and 31 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d0c85e1...1f41061. Read the comment docs.

Copy link
Contributor
@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check the code in IndexedFrame too carefully

isVoid and others added 2 commits December 6, 2021 08:18
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
@isVoid isVoid requested a review from vyasr December 6, 2021 21:17
Copy link
Contributor
@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small changes left.

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
Copy link
Contributor
@brandon-b-miller brandon-b-miller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connected offline with @isVoid and chatted about this a bit. This LGTM.

@isVoid isVoid added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Dec 18, 2021
@isVoid
Copy link
Contributor Author
isVoid commented Jan 14, 2022

@vyasr @brandon-b-miller 2 changes since your approval:

@isVoid
Copy link
Contributor Author
isVoid commented Jan 18, 2022

Gained approval from both reviewers offline. Rerun tests.

@galipremsagar
Copy link
Contributor

rerun tests

1 similar comment
@isVoid
Copy link
Contributor Author
isVoid commented Jan 18, 2022

rerun tests

@galipremsagar
Copy link
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 8fd7dd2 into rapidsai:branch-22.02 Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change improvement Improvement / enhancement to an existing function Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0