Add CI job with `pyarrow` strings turned on by jrbourbeau · Pull Request #10017 · dask/dask



Merged · 10 commits · Mar 3, 2023


jrbourbeau (Member):

As part of #9946, this PR adds a new CI job that has the `dataframe.convert_string` config option turned on. Given there are still lots of known failures when `dataframe.convert_string` is enabled, I've marked this new CI job as an allowed failure (all other CI jobs are unchanged). That means that, for this new job only, we'll still get a green check mark even if the test suite fails. After all the test failures in this job have been fixed, we should remove the `continue-on-error: true` line for this job.
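One way to express a per-entry allowed failure in GitHub Actions is a job-level `continue-on-error` driven by a matrix value. The sketch below is hypothetical and not taken from this PR: the `experimental` and `convert_string` matrix keys and the overall job layout are assumptions. It relies on Dask reading `DASK_`-prefixed environment variables into its config, with `__` separating nesting levels, so `DASK_DATAFRAME__CONVERT_STRING=True` should be equivalent to setting `dataframe.convert_string` to `True`.

```yaml
# Hypothetical sketch, not this PR's workflow. The "experimental" and
# "convert_string" matrix keys are assumptions made for illustration.
jobs:
  test:
    runs-on: ${{ matrix.os }}
    # Entries with experimental: true may fail without failing the workflow run.
    continue-on-error: ${{ matrix.experimental }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - os: "ubuntu-latest"
            environment: "3.11"
            experimental: false
            convert_string: "False"
          # Pyarrow strings turned on (allowed to fail for now)
          - os: "ubuntu-latest"
            environment: "3.10"
            experimental: true
            convert_string: "True"
    env:
      # Dask maps DASK_A__B onto the nested config key a.b,
      # so this sets dataframe.convert_string for the job.
      DASK_DATAFRAME__CONVERT_STRING: ${{ matrix.convert_string }}
    steps:
      - uses: actions/checkout@v3
      # Environment setup for matrix.environment omitted for brevity.
      - name: Run tests
        run: pytest dask
```

The `convert_string` values are quoted as `"True"`/`"False"` so that Dask should parse them as Python booleans rather than as the strings `"true"`/`"false"`. Once the failures are fixed, setting `experimental: false` on the pyarrow entry (or dropping `continue-on-error` entirely) makes failures there block the check again.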

cc @j-bennet

Comment on lines +35 to +44
```yaml
# Minimum dependencies
- os: "ubuntu-latest"
  environment: "mindeps-array"
- os: "ubuntu-latest"
  environment: "mindeps-dataframe"
- os: "ubuntu-latest"
  environment: "mindeps-distributed"
- os: "ubuntu-latest"
  environment: "mindeps-non-optional"
# Pyarrow strings turned on
```
jrbourbeau (Member Author):

While adding this new job, I realized we can easily move our mindeps builds over to the main testing workflow (they currently live in a separate `additional.yml` workflow). This isn't strictly needed for adding the pyarrow job, but I thought I'd include it here anyway.

jrbourbeau (Member Author):

Note that there are lots of test failures in the pyarrow CI job logs.

```yaml
  environment: "mindeps-non-optional"
# Pyarrow strings turned on
- os: "ubuntu-latest"
  environment: "3.10"
```
j-bennet (Contributor):

Why not the latest (3.11)?

jrbourbeau (Member Author):

The 3.11 environment has a few dependencies commented out (like numba and pyspark) because they don't support Python 3.11 yet. I went with 3.10 here to make sure we get the broadest test coverage.

j-bennet (Contributor) left a comment:

While working on #10000, I found that a lot of tests that break with pyarrow strings and pandas 1.5 are fixed in pandas 2.0, I think in particular because 2.0 added a min reduction for `ArrowStringArray`. Since work on extension dtypes is still ongoing in pandas, it might make more sense to use nightly pandas builds for this workflow. Otherwise we'll keep looking at the same old 1.5.3 issues.

jrbourbeau (Member Author):

Sure, that makes sense. The pyarrow job should now be installing all the nightly upstream packages.
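A minimal, hypothetical sketch of a workflow step that installs a nightly pandas build for one matrix entry only. The `upstream` matrix key is an assumption, and the index URL is the Scientific Python nightly-wheels channel that the pandas installation docs point to for nightlies; this is not necessarily the mechanism or channel this PR uses, and it covers pandas alone rather than all of the upstream packages.

```yaml
# Hypothetical step; the "upstream" matrix key is an assumption.
- name: Install nightly pandas
  # Only runs for matrix entries that set upstream: true.
  if: ${{ matrix.upstream }}
  run: |
    python -m pip install --pre --upgrade \
      --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple \
      pandas
```

`--pre` lets pip select the `.dev` versions published on that index, and `--upgrade` replaces the pandas release already installed in the environment.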

j-bennet (Contributor) left a comment:

Nice! 🚀

jrbourbeau merged commit ca90a82 into dask:main on Mar 3, 2023
jrbourbeau deleted the pyarrow-ci-job branch on March 3, 2023, 22:20