[BUG] Series group shift is not being performed on the correct columns · Issue #9969 · rapidsai/cudf · GitHub
Closed
@galipremsagar

Description

Describe the bug
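When performing operations like shift on a SeriesGroupBy object, the operation is not applied to the selected column. In the example below, df.groupby("_id_")['power_usage'].shift(0) returns the values of the first column, Unnamed: 0 (int64), instead of power_usage (float64).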

Steps/Code to reproduce bug
tmp.csv

>>> import cudf
>>> df = cudf.read_csv("tmp.csv")
>>> df
     Unnamed: 0  Unnamed: 0.1  days_from_start  hours_from_start  categorical_id  power_usage  hour  day_of_week  _id_
0           120          5664             1332          2.001283               1     3.313606     1            1     0
1           409          5665             1332          2.001942               1     3.160683     2            1     0
2          1004          5666             1332          2.002601               1     3.160683     3            1     0
3          1356          5667             1332          2.003260               1     3.313606     4            1     0
4          1767          5668             1332          2.003920               1     3.237144     5            1     0
..          ...           ...              ...               ...             ...          ...   ...          ...   ...
331      122219          5995             1345          2.219437               1     3.237144    20            7     0
332      122699          5996             1345          2.220096               1     0.025749    21            7     0
333      122910          5997             1345          2.220755               1     1.325600    22            7     0
334      123417          5998             1345          2.221415               1     3.237144    23            7     0
335      123737          5999             1345          2.222074               1     3.237144    24            7     0

[336 rows x 9 columns]
>>> df.groupby("_id_")['power_usage'].shift(0)
/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/cudf/core/frame.py:2986: FutureWarning: keep_index is deprecated and will be removed in the future.
  warnings.warn(
0         120
1         409
2        1004
3        1356
4        1767
        ...  
331    122219
332    122699
333    122910
334    123417
335    123737
Name: Unnamed: 0, Length: 336, dtype: int64

>>> import pandas as pd
>>> pdf = pd.read_csv("tmp.csv")
>>> pdf.groupby("_id_")['power_usage'].shift(0)
0      3.313606
1      3.160683
2      3.160683
3      3.313606
4      3.237144
         ...   
331    3.237144
332    0.025749
333    1.325600
334    3.237144
335    3.237144
Name: power_usage, Length: 336, dtype: float64

Expected behavior
shift should operate on the selected column (power_usage) and return the same result as pandas.
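For reference, the expected semantics can be sketched in plain Python. A grouped shift moves only the values of the selected column within each group and ignores every other column, so shift(0) must return the power_usage values unchanged. The helper grouped_shift and its dict-of-lists input are hypothetical and are not part of the cuDF or pandas API. pandas fills vacated positions with NaN and upcasts integer columns to float, while this sketch uses None for simplicity.

```python
from typing import Any, Dict, Hashable, List


def grouped_shift(
    columns: Dict[str, List[Any]],
    by: str,
    selected: str,
    periods: int = 1,
    fill_value: Any = None,
) -> List[Any]:
    """Hypothetical reference: shift columns[selected] by `periods` within each group of columns[by]."""
    keys = columns[by]
    values = columns[selected]  # only the selected column is read; other columns are ignored
    # Row positions of each group, in original order.
    positions: Dict[Hashable, List[int]] = {}
    for row, key in enumerate(keys):
        positions.setdefault(key, []).append(row)
    result: List[Any] = [fill_value] * len(values)
    for rows in positions.values():
        for i, row in enumerate(rows):
            src = i - periods  # positive periods pull values from earlier rows of the same group
            if 0 <= src < len(rows):
                result[row] = values[rows[src]]
    return result


# First five rows from the report (subset of columns), all in group _id_ == 0.
data = {
    "Unnamed: 0": [120, 409, 1004, 1356, 1767],
    "power_usage": [3.313606, 3.160683, 3.160683, 3.313606, 3.237144],
    "_id_": [0, 0, 0, 0, 0],
}
# shift(0) is an identity, so the result must be the power_usage values, not Unnamed: 0.
print(grouped_shift(data, "_id_", "power_usage", periods=0))
# Interleaved groups: values never cross a group boundary.
print(grouped_shift({"g": ["a", "b", "a", "b"], "x": [1, 2, 3, 4]}, "g", "x", periods=1))  # [None, None, 1, 2]
```

Because shift(0) is an identity, comparing its result against the selected column is a simple regression check for this bug. It catches any code path that binds the operation to the wrong column.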


Metadata
Labels: Python (Affects Python cuDF API), bug (Something isn't working)