[Stream] Assign resource affinity based on usage (not just assigned execution affinity) with multiple potential affinities. #20855
Labels
compiler/dialects
Relating to the IREE compiler dialects (flow, hal, vm)
Comments
benvanik added a commit that referenced this issue May 22, 2025
benvanik added a commit that referenced this issue May 22, 2025
…0879) This is an experimental approach for representing a set of device affinities that are able to be resolved at runtime. Currently this is limited to allocation-related ops but may be extended in the future to allow for runtime-provided device rankings based on benchmarked performance.

The new `#hal.device.optimal<...>` affinity attribute represents a set of affinities that an operation is able to execute with. The expectation is that the exact affinity can be resolved during compilation if sufficient information is available (TBD) or at runtime based on a user-provided policy (`iree_hal_module_device_policy_t`). As with other treatments of device topology in the runtime this policy is left to the hosting framework or application to decide how they want to configure their overall topology.

The common IREE command line tooling adds a `--device_lead_allocator=N` flag to let the user denote which device ordinal (of `--device=` flags) is to be chosen when a `#hal.device.optimal` request makes it all the way to runtime. For simple cases of CPU+accelerator this allows for hack-free ways to indicate the lead allocator that is responsible for any buffer allocation that is used on multiple devices.

The `#hal.device.optimal<...>` attr lowers to a `hal.allocator.select` op/runtime call that takes a list of devices/queue masks along with the memory types and buffer usage of the request and returns the selected device/queue mask. Size is not included such that the selection can be memoized across an entire program at initialization time. A pass is added that runs at the same time as device query memoization to hoist the selection into globals. Future changes can enhance the pass to fold the selection when sufficient information has been provided.

Future work on #20855 will result in affinity assignment on resource allocations lowering into this new `#hal.device.optimal<...>` attr and deallocations switching to either use the recently added `origin` attr or the same optimal attr (given that it should be consistent and provides more information). Future work on #20856 will ensure that allocations that may be used on multiple devices (regardless of whether runtime-selected) get the appropriate bits set but that may require #20854.

Progress on #20851.
Progress on #20855 (analysis will lower into this attr).
Progress on #20856 (will be used to set bits).
Fixes #20857.
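To make the selection contract described above concrete, here is a rough Python model of a size-independent, memoizable selection with a "lead allocator" policy. This is only a sketch and not IREE code: `Candidate`, `make_selector`, and the fall-back-to-first-candidate behavior are hypothetical names and assumptions, and the real `hal.allocator.select` op and `iree_hal_module_device_policy_t` policy may behave differently.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for one (device, queue mask) pair."""
    device_ordinal: int  # index of the device among the provided devices
    queue_mask: int      # bitmask of queues usable on that device


def make_selector(lead_ordinal: int) -> Callable[..., Candidate]:
    """Builds a selection function that prefers the given lead device.

    The request key deliberately omits the allocation size so that one
    result can be cached and reused for every allocation with the same
    candidates, memory types, and buffer usage.
    """

    @lru_cache(maxsize=None)
    def select(candidates: Tuple[Candidate, ...],
               memory_types: int,
               buffer_usage: int) -> Candidate:
        if not candidates:
            raise ValueError("at least one candidate is required")
        # Prefer the lead allocator when it is among the candidates.
        for candidate in candidates:
            if candidate.device_ordinal == lead_ordinal:
                return candidate
        # Assumption for this sketch only: fall back to the first candidate.
        return candidates[0]

    return select
```

Calling `make_selector(1)` and then invoking the returned function twice with the same tuple of candidates and flags computes the answer once and serves the second call from the cache, which mirrors the idea of hoisting the selection into a global at initialization time.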
Near the end of the `iree-stream-cmd-transformation-pipeline` we have the total set of allocations and their usage in the program. A new `RefineResourceAffinitiesPass` that performs an affinity analysis per allocation to find all usage should reschedule the allocations to be performed on the optimal device (#20857). Since the selection logic is deterministic at runtime, all deallocations for such allocations should also be changed. This could happen before ARC (where there may be some deallocations) or after (where we have them all).

The affinity attr `joinOR` method can be used to join the affinities and produce the final set, which may be the same device but with multiple queues, or different devices turned into a `#hal.device.optimal<...>` affinity.
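The two outcomes of joining affinities can be illustrated with a small Python model. This is a sketch under stated assumptions rather than the compiler's implementation: `DeviceAffinity`, `OptimalAffinity`, and `join_or` are hypothetical names, and the actual `joinOR` method on the affinity attribute may differ in detail.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class DeviceAffinity:
    """Hypothetical model of a single-device affinity with a queue mask."""
    device: str
    queue_mask: int


@dataclass(frozen=True)
class OptimalAffinity:
    """Hypothetical model of a set of candidate device affinities."""
    affinities: Tuple[DeviceAffinity, ...]


Affinity = Union[DeviceAffinity, OptimalAffinity]


def _flatten(affinity: Affinity) -> Tuple[DeviceAffinity, ...]:
    if isinstance(affinity, OptimalAffinity):
        return affinity.affinities
    return (affinity,)


def join_or(lhs: Affinity, rhs: Affinity) -> Affinity:
    """Joins two affinities, OR-ing queue masks per device."""
    masks: Dict[str, int] = {}
    for aff in (*_flatten(lhs), *_flatten(rhs)):
        masks[aff.device] = masks.get(aff.device, 0) | aff.queue_mask
    merged = tuple(DeviceAffinity(d, m) for d, m in sorted(masks.items()))
    # One device: stay a plain device affinity with the widened queue mask.
    # Several devices: become a set that must be resolved later.
    return merged[0] if len(merged) == 1 else OptimalAffinity(merged)


def refine(usage_affinities: Tuple[Affinity, ...]) -> Affinity:
    """Folds the affinities of every use of one allocation into one result."""
    return reduce(join_or, usage_affinities)
```

For an allocation used on queues 0 and 1 of the same device, `refine` yields a single `DeviceAffinity` whose mask covers both queues; for an allocation used on two different devices it yields an `OptimalAffinity` listing both, corresponding to the case that would lower to `#hal.device.optimal<...>`.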