Implement unify_chunks and Rechunk by phofl · Pull Request #11692 · dask/dask · GitHub

Implement unify_chunks and Rechunk #11692

Merged: phofl merged 25 commits into dask:main from array-expr3 on Feb 19, 2025

Collaborator
@phofl phofl commented Jan 23, 2025
  • Closes #xxxx
  • Tests added / passed
  • Passes pre-commit run --all-files

sits on top of #11689

Contributor
github-actions bot commented Jan 23, 2025

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     15 files  +     15       15 suites  +15   4h 13m 9s ⏱️ + 4h 13m 9s
 17 837 tests + 17 837   16 568 ✅ + 16 568   1 269 💤 + 1 269  0 ❌ ±0 
216 726 runs  +216 726  185 459 ✅ +185 459  31 267 💤 +31 267  0 ❌ ±0 

Results for commit c1c37af. ± Comparison against base commit 430a951.

♻️ This comment has been updated with latest results.

phofl added 2 commits January 23, 2025 16:17
# Conflicts:
#	dask/array/__init__.py
#	dask/array/_array_expr/__init__.py
#	dask/array/_array_expr/_blockwise.py
#	dask/array/_array_expr/_collection.py
#	dask/array/_array_expr/_expr.py
#	dask/array/_array_expr/tests/test_collection.py
)
from dask.utils import apply, deepmap, derived_from

if da._array_expr_enabled():
Collaborator Author

This allows us to keep the actual implementations the same and just switch the imports around. Saves a lot of code duplication.

Comment on lines +236 to +259
    for _ in range(depth - 1):
        x = PartialReduce(
            x,
            func,
            split_every,
            True,
            dtype=dtype,
            name=(name or funcname(combine or aggregate)) + "-partial",
            reduced_meta=reduced_meta,
        )
    func = partial(aggregate, axis=axis, keepdims=keepdims)
    if concatenate:
        func = compose(func, partial(_concatenate2, axes=sorted(axis)))
    return new_collection(
        PartialReduce(
            x,
            func,
            split_every,
            keepdims=keepdims,
            dtype=dtype,
            name=(name or funcname(aggregate)) + "-aggregate",
            reduced_meta=reduced_meta,
        )
    )
Member

This makes me wonder if it shouldn't be a single TreeReduce expression. Is there (from an expression POV) any value in using the PartialReduce?

Member

I guess this is what you're calling out with the top level comment about ACA. Just want to double check

Collaborator Author

Yes, correct. This should be one but that makes migration harder. The whole reduction should probably be a single expression.
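For context, the shape of the loop under discussion (several "partial" rounds followed by one "aggregate" round) can be sketched on a plain list of per-chunk results. This is a toy one-dimensional illustration under stated assumptions, not dask's `PartialReduce` or a proposed `TreeReduce`: `tree_reduce` is a hypothetical helper, and `combine` and `aggregate` are assumed to take a list of neighbouring partial results.

```python
import math


def tree_reduce(chunks, combine, aggregate, split_every=2):
    """Reduce a list of per-chunk partial results in a tree.

    Follows the shape of the loop in the diff: ``depth - 1`` partial rounds
    using ``combine``, then one final round using ``aggregate``.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    if split_every < 2:
        raise ValueError("split_every must be at least 2")

    # Count the rounds needed so that the final round sees one group only.
    depth = 1
    n = len(chunks)
    while n > split_every:
        n = math.ceil(n / split_every)
        depth += 1

    def one_round(parts, func):
        # Group neighbouring partial results and reduce each group.
        return [
            func(parts[i : i + split_every])
            for i in range(0, len(parts), split_every)
        ]

    for _ in range(depth - 1):
        chunks = one_round(chunks, combine)
    (result,) = one_round(chunks, aggregate)
    return result
```

With five chunks and `split_every=2`, the sketch runs two partial rounds (5 to 3 to 2 results) and then a final aggregate round. Collapsing all rounds into one expression, as suggested in this thread, would hide that intermediate structure behind a single node.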

Member

I believe this entire module is just moved code, correct?

Collaborator Author

yeah, sorry for not highlighting this better


def _compute_rechunk(old_name, old_chunks, chunks, level, name):
    """Compute the rechunk of *x* to the given *chunks*."""
    # TODO: redo this logic
Member

do you mean to redo the entire function or just the commented out code?

Collaborator Author

that's not new but copied from the legacy implementation
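As background on what a rechunk has to compute, here is a minimal one-dimensional sketch of mapping new chunks onto slices of old chunks. It is an assumption-laden illustration, not the legacy `_compute_rechunk` logic: `rechunk_plan` is a hypothetical helper and only handles a single axis.

```python
def rechunk_plan(old_chunks, new_chunks):
    """For each new chunk, list the (old_index, start, stop) pieces it needs.

    ``start`` and ``stop`` are offsets inside the old chunk ``old_index``.
    """
    if sum(old_chunks) != sum(new_chunks):
        raise ValueError("old and new chunks must cover the same length")

    plan = []
    old_idx, old_off = 0, 0  # current position inside the old chunking
    for size in new_chunks:
        pieces = []
        remaining = size
        while remaining > 0:
            available = old_chunks[old_idx] - old_off
            if available == 0:
                # Current old chunk is used up; move on to the next one.
                old_idx += 1
                old_off = 0
                continue
            take = min(available, remaining)
            pieces.append((old_idx, old_off, old_off + take))
            old_off += take
            remaining -= take
        plan.append(pieces)
    return plan
```

For example, going from chunks `(4, 4)` to `(2, 6)` yields one piece for the first new chunk and two pieces, taken from both old chunks, for the second.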



def unify_chunks_expr(*args):
    # TODO(expr): This should probably be a dedicated expression
Member

any reason why you chose not to introduce the expression right away?

Collaborator Author

I am not completely sure how this would interact with blockwise unfortunately. I'd rather do this when we have a passing test suite
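For readers unfamiliar with what unifying chunks involves, the core idea along a single dimension can be sketched as finding the coarsest chunking whose boundaries include every boundary of every input. This is an illustration only: `common_chunks` is a hypothetical helper, and the sketch makes no claim about how `unify_chunks_expr` is implemented or how a dedicated expression would interact with blockwise.

```python
from itertools import accumulate


def common_chunks(*chunkings):
    """Return the coarsest chunking that refines every input chunking.

    Each input is a tuple of chunk sizes along one dimension; all inputs
    must describe the same total length.
    """
    totals = {sum(c) for c in chunkings}
    if len(totals) != 1:
        raise ValueError(f"chunkings cover different lengths: {sorted(totals)}")

    # Every chunk boundary of every input must be a boundary of the result.
    boundaries = sorted({b for c in chunkings for b in accumulate(c)})
    starts = [0] + boundaries[:-1]
    return tuple(stop - start for start, stop in zip(starts, boundaries))
```

For two arrays of length 10 chunked as `(5, 2, 3)` and `(2, 3, 5)`, `common_chunks` returns `(2, 3, 2, 3)`, so each array would then be rechunked to that layout before a blockwise operation lines the blocks up.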

@phofl phofl merged commit 979e577 into dask:main Feb 19, 2025
25 of 27 checks passed
@phofl phofl deleted the array-expr3 branch February 19, 2025 12:10