-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Remove legacy DataFrame implementation #11606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…mentation # Conflicts: # dask/dataframe/__init__.py
…e-legacy-implementation # Conflicts: # dask/dataframe/io/tests/test_orc.py
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ± 0 15 suites ±0 3h 59m 0s ⏱️ + 1m 31s Results for commit 4bd27f3. ± Comparison against base commit efd159b. This pull request removes 377 and adds 7 tests. Note that renamed tests count towards both.
This pull request skips 405 tests.
♻️ This comment has been updated with latest results. |
…mentation # Conflicts: # dask/dataframe/io/io.py # dask/dataframe/tests/test_dataframe.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @phofl. LGM as long as CI is green. One nit, the rest are reminders to revert changes to the linked Github repos.
@@ -11,6 +11,9 @@ test_import () { | |||
# dask[distributed] depends on the latest version of distributed | |||
python -m pip install git+https://github.com/dask/distributed | |||
fi | |||
if [[ $1 =~ "pandas dask-expr" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Revert these changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That one needs to stay, otherwise we get the latest release which will fail
use_dask_expr = dask.config.get("dataframe.query-planning") | ||
if use_dask_expr is False: | ||
raise NotImplementedError("The legacy implementation is no longer supported") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ValueError
might be more applicable here.
Wow, this is a very big PR! Hopefully it wont take too long for us to find/fix all the broken edges in rapids :) Thanks for working on this @phofl - I know it's challenging to do a migration like this. |
Pin to Dask 2024.12.1, since dask/dask#11606 broke RAPIDS. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Richard (Rick) Zamora (https://github.com/rjzamora) - https://github.com/jakirkham URL: #77
@rjzamora any chance you can provide a list of things you import from dask.dataframe.foo so that we can think if providing a proper downstream packages API makes sense? I suspect that you are also importing stuff from dask_expr, which will go away soon and thus require another update on your end? Putting all of this in some folder that we know that we shouldn't move in the future would make things easier |
Yes, I can definitely put this together. Thanks for the suggestion!
Yes, we definitely import (and patch) a lot of stuff from |
Just following up... I compiled a rough list of relevant imports within Dask cuDF and Dask-CUDA: Imports from Dask DataFrame
I don't think any of these One problem caused by this PR is that we have been using Imports from Dask Expressions
The |
So dispatch and backends is exposed anyway, so this should be fine. For everything else, I'd rather not guarantee that the path doesn't change. I added an api.py file in dask.dataframe that we can use to expose the stuff that other libraries need (underscore functions should probably be renamed to something without underscores and get docstrings if not already there). Can you move the stuff over and put up a PR when you are adapting stuff over in dask-cudf? |
A deprecation cycle or advance warning would have been nice. dask-awkward and dask-histogram are currently broken. |
The legacy implementation was deprecated in 2024.11.0 |
pre-commit run --all-files
TODOs: