Ensure divisions are plain scalars #11767
Merged
This ensures that the `.divisions` of a Dask DataFrame are plain Python scalars, rather than NumPy scalars.

This does involve a `tolist` on a potentially large-ish (but small enough to fit in memory) ndarray / series. Here's a little pytest-benchmark script to make sure we don't have a huge slowdown.

I ran that on `main` and this branch with `--benchmark-compare` and here's the output. I think the way to read this is comparing the `NOW` row (this PR) to `0001_main` (`main`) for each benchmark.

There is a small performance hit for the `array_large` workloads (about `23us -> 35us`). I think the `datetime_large` speedups might be from `pandas.Index.__getitem__[scalar]` being relatively slow compared to Python lists, so repeated getitems add up, but I haven't verified that. Either way, these numbers are small enough in absolute magnitude to not matter.

Note: yesterday while looking at this I thought I hit another failure, related to indexing. But I'm not able to find / reproduce that today.
`DataFrame.divisions` can be created in several ways depending on the code path being exercised, so I wouldn't be surprised if I missed some.

Closes #11765
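
For anyone skimming, here is a minimal sketch of the NumPy-scalar versus plain-scalar distinction described above. It uses only NumPy and is not the code from this PR: `to_plain_divisions` and `boundaries` are hypothetical names made up for illustration, and the real change may convert divisions differently depending on the code path.

```python
import numpy as np


def to_plain_divisions(values):
    """Hypothetical helper (not part of Dask): return the boundaries
    as a tuple of builtin Python scalars via ndarray.tolist()."""
    return tuple(np.asarray(values).tolist())


# Assumed example boundaries; real divisions come from the data's index.
boundaries = np.array([0, 10, 20, 30])

# Iterating over an ndarray yields NumPy scalars (subclasses of np.generic).
numpy_divisions = tuple(boundaries)

# tolist() converts every element to the matching builtin type (int here).
plain_divisions = to_plain_divisions(boundaries)

print([type(d).__name__ for d in numpy_divisions])
print([type(d).__name__ for d in plain_divisions])
```

The two tuples compare equal element by element, so the difference only shows up in code that checks types or serializes the values.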