Add CI job with pyarrow strings turned on #10017
# Minimum dependencies
- os: "ubuntu-latest"
  environment: "mindeps-array"
- os: "ubuntu-latest"
  environment: "mindeps-dataframe"
- os: "ubuntu-latest"
  environment: "mindeps-distributed"
- os: "ubuntu-latest"
  environment: "mindeps-non-optional"
# Pyarrow strings turned on
While adding this new job, I realized we can easily move our `mindeps` builds over to the main testing workflow (currently they're in a separate `additional.yml` workflow). This isn't strictly needed for adding the pyarrow job, but I thought I'd include it here anyway.
Note in the pyarrow CI job logs there are lots of test failures
  environment: "mindeps-non-optional"
# Pyarrow strings turned on
- os: "ubuntu-latest"
  environment: "3.10"
Why not the latest (3.11)?
3.11 has a few additional dependencies commented out (like `numba` and `pyspark`) because they don't have Python 3.11 support yet. I went with 3.10 here to make sure we have the broadest test coverage.
While working on #10000, I found that a lot of tests that break with pyarrow strings and pandas 1.5 are fixed in pandas 2.0. I think this is mainly because the `min` reduction was added for `ArrowStringArray` in 2.0. Since the work on extension dtypes is still ongoing in `pandas`, it might make more sense to use nightly pandas builds for this workflow. Otherwise we'll keep looking at the same old 1.5.3 issues.
Sure, that makes sense. We should be installing all the nightly upstream packages in the
Nice! 🚀
As part of #9946, this PR adds a new CI job that has the `dataframe.convert_string` config option turned on. Given there are still lots of known failures when `dataframe.convert_string` is enabled, I've marked this new CI job as an allowed failure (all other CI jobs are unchanged). That means that, for this new job only, we'll still get a green check mark even if the test suite fails. After all the test failures in this job have been fixed, we should remove the `continue-on-error: true` line for this job.

cc @j-bennet
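
For readers unfamiliar with allowed failures in GitHub Actions, here is a minimal sketch of how one matrix entry could be marked as allowed to fail while turning on `dataframe.convert_string` through Dask's environment-variable config. It is not the workflow file from this PR. The `extra` key, its `pyarrow-strings` value, the `3.9` entry, the way the option is exported, and the `pytest dask` command are all assumptions made for illustration.

```yaml
# Hypothetical sketch, not the workflow file from this PR.
name: Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ${{ matrix.os }}
    # Only the entry tagged with the assumed `extra` key may fail
    # without failing the workflow run; other entries evaluate to false.
    continue-on-error: ${{ matrix.extra == 'pyarrow-strings' }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - os: "ubuntu-latest"
            environment: "3.9"
          # Pyarrow strings turned on
          - os: "ubuntu-latest"
            environment: "3.10"
            extra: "pyarrow-strings"  # assumed key and value
    steps:
      - uses: actions/checkout@v4
      # Environment setup steps are omitted from this sketch.
      - name: Run tests
        shell: bash
        run: |
          if [[ "${{ matrix.extra }}" == "pyarrow-strings" ]]; then
            # Dask reads DASK_* environment variables as config values;
            # a double underscore marks nesting (dataframe.convert_string).
            export DASK_DATAFRAME__CONVERT_STRING="True"
          fi
          pytest dask  # placeholder test command
```

Using an expression here, rather than a literal `continue-on-error: true`, keeps every other matrix entry strict. Once the known failures are fixed, removing the `continue-on-error` line makes the pyarrow job blocking again.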