Index Query Hangs · Issue #11815 · dask/dask · GitHub
Index Query Hangs #11815
Open
@mscanlon-exos

Description

Describe the issue:
After setting the index to the timestamp column, some loc-based queries work, but a query that compares the index against a date string causes the operation to hang unless we call optimize() first.

Minimal Complete Verifiable Example:

import dask.dataframe as dd
import pandas as pd
import random

def test_df() -> dd.DataFrame:
    dfs = []
    start_date = '2024-01-01'
    end_date = '2024-01-31'
    for num_rows in [2, 5, 10]:
        df = pd.DataFrame(
            {
                'timestamp': pd.to_datetime(
                    pd.date_range(start_date, end_date, periods=num_rows),
                ),
                'value1': random.choices(range(-20, 20), k=num_rows),
                'value2': random.choices(range(-1000, 1000), k=num_rows),
            },
        )
        dfs.append(
            dd.from_pandas(df, npartitions=1),
        )
    return dd.concat(dfs)

df = test_df()
df = df.set_index('timestamp', npartitions=df.npartitions)

# Uncommenting the next line avoids the hang:
# df = df.optimize()
df.loc[df.index > '2024-01-15'].compute()  # hangs unless optimize() was called first

Anything else we need to know?:

Environment:

  • Dask version: 2025.2.0
  • Python version: 3.10
  • Operating System: macOS and Linux (both tested)
  • Install method (conda, pip, source): pip

Metadata

Assignees

No one assigned

    Labels

    bug: Something is broken
    dataframe
    needs attention: It's been a while since this was pushed on. Needs attention from the owner or a maintainer.
