nonlocal device driver by dmahurin · Pull Request #20620 · iree-org/iree · GitHub

nonlocal device driver #20620

Open: dmahurin wants to merge 49 commits into main

Conversation

dmahurin

This pull request creates a new set of drivers: nonlocal-sync and nonlocal-task.

These device drivers execute code compiled with the llvm-cpu backend, but run it on a nonlocal processor.
The example communicates with the nonlocal processor using simple marshaling over TCP, but the channel could be something else.

The nonlocal implementation includes a buffer and an allocator derived from the CUDA driver, changed to use an "NL API" for such calls instead.

This allows running on a system whose processors are connected through some channel.
This approach changes only the runtime, not the compiler.
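
To illustrate what "simple marshaling" over a byte channel can look like, here is a minimal sketch of encoding and decoding a fixed-size request header in network byte order. The command codes, the `nl_header_t` layout, and the function names are hypothetical and chosen only for illustration; they are not taken from the NL API in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command codes; the PR's actual NL API calls may differ. */
enum nl_cmd { NL_CMD_ALLOC = 1, NL_CMD_WRITE = 2, NL_CMD_READ = 3, NL_CMD_CALL = 4 };

/* Hypothetical request header: which operation, where, and how many bytes. */
typedef struct {
  uint32_t cmd;    /* one of enum nl_cmd */
  uint64_t addr;   /* address on the nonlocal processor */
  uint32_t length; /* payload length in bytes */
} nl_header_t;

/* Wire layout: cmd (4 bytes), addr (8 bytes), length (4 bytes), big-endian. */
#define NL_HEADER_WIRE_SIZE 16

static void nl_put_u32(uint8_t *p, uint32_t v) {
  p[0] = (uint8_t)(v >> 24);
  p[1] = (uint8_t)(v >> 16);
  p[2] = (uint8_t)(v >> 8);
  p[3] = (uint8_t)v;
}

static uint32_t nl_get_u32(const uint8_t *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Writes the header field by field, so host struct padding and host
 * endianness never reach the wire. Returns the number of bytes written. */
size_t nl_header_encode(const nl_header_t *h, uint8_t out[NL_HEADER_WIRE_SIZE]) {
  assert(h != NULL && out != NULL);
  nl_put_u32(out, h->cmd);
  nl_put_u32(out + 4, (uint32_t)(h->addr >> 32));
  nl_put_u32(out + 8, (uint32_t)h->addr);
  nl_put_u32(out + 12, h->length);
  return NL_HEADER_WIRE_SIZE;
}

/* Reads the header back from the same wire layout. */
void nl_header_decode(const uint8_t in[NL_HEADER_WIRE_SIZE], nl_header_t *h) {
  assert(h != NULL && in != NULL);
  h->cmd = nl_get_u32(in);
  h->addr = ((uint64_t)nl_get_u32(in + 4) << 32) | nl_get_u32(in + 8);
  h->length = nl_get_u32(in + 12);
}
```

The encoded bytes could then be written to a TCP socket or any other channel, followed by the payload. Encoding field by field rather than sending the struct directly keeps both ends independent of each other's padding and byte order.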

I would be interested in comments on whether this approach is reasonable, whether there are alternative approaches to running on non-local processors, and whether other approaches that follow a GPU model more closely exist.

dmahurin added 30 commits April 4, 2025 09:31
(commit date Wed Oct 16 13:02:19)

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
removes read_only_host_register and supports_concurrent_managed_access checks.

The heap allocator was not used, and is replaced in device.c

This allows llvm-cpu compiler backends to work with nonlocal-sync.

commit date: Wed Oct 16 13:02:19 2024 +0200

…ffer

commit date: Wed Oct 16 13:02:19 2024 +0200

…ost pointers

…em map not to be used in iree/runtime/src/iree/hal/buffer.c

remove unused register, unregister, prefetch, get_device_pointer functions.

dmahurin added 19 commits April 4, 2025 09:31
host mapping is still allowed, though the implementation is pass-through and not mapped

Add debug, including binding debug and called function debug.

Only have debug prints (DEBUG_PRINTF) when DEBUG is set.

Create NL_DEBUG definition to control debug output.

0 - no extra debug
1 - memory copy and operation calls
2 - binding data
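
As a rough sketch of how a level-gated debug macro along these lines might look, assuming a compile-time `NL_DEBUG` value with the three levels listed above. The macro `NL_DEBUG_PRINTF_AT`, the level names, and the helper `nl_debug_enabled_at` are hypothetical; the actual `DEBUG_PRINTF` signature in the commit is not shown in this thread.

```c
#include <stdio.h>

/* Assumed default when the build does not define NL_DEBUG. */
#ifndef NL_DEBUG
#define NL_DEBUG 0
#endif

/* Hypothetical names for the levels listed in the commit message. */
#define NL_DEBUG_LEVEL_NONE 0     /* no extra debug */
#define NL_DEBUG_LEVEL_CALLS 1    /* memory copy and operation calls */
#define NL_DEBUG_LEVEL_BINDINGS 2 /* binding data */

/* Returns 1 when a message tagged `level` should print under the
 * configured level. Level 0 messages never print. */
static int nl_debug_enabled_at(int configured, int level) {
  return level > 0 && configured >= level;
}

/* Prints to stderr only when NL_DEBUG is at least `level`. With NL_DEBUG
 * set to 0 the condition is a compile-time constant, so an optimizing
 * compiler can drop the call entirely. */
#define NL_DEBUG_PRINTF_AT(level, ...)           \
  do {                                           \
    if (nl_debug_enabled_at(NL_DEBUG, (level))) { \
      fprintf(stderr, __VA_ARGS__);              \
    }                                            \
  } while (0)
```

A call such as `NL_DEBUG_PRINTF_AT(NL_DEBUG_LEVEL_CALLS, "copy %zu bytes\n", n);` would then print at levels 1 and 2, while binding dumps tagged `NL_DEBUG_LEVEL_BINDINGS` would print only at level 2.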

…19 2024 +0200 )

Add nonlocal-task to plugin build

…an 2 11:54:22 2025 -0800

Changes from:
- hal/local => hal/nonlocal
- hal/drivers/cuda => hal/nonlocal
- hal/drivers/local-task => drivers/nonlocal-task
- hal/drivers/local-sync => drivers/nonlocal-sync

@dmahurin dmahurin requested a review from benvanik as a code owner April 23, 2025 21:55
@nirvedhmeshram
Contributor

Hi, I am not sure we will want to add a new backend directly like this (although I am no authority on that). You might want to consider starting a plugin project for this backend. Here is an example: https://github.com/nod-ai/iree-amd-aie

@dmahurin
Author

> Hi, I am not sure we will want to add a new backend directly like this (although I am no authority on that). You might want to consider starting a plugin project for this backend. Here is an example: https://github.com/nod-ai/iree-amd-aie

Hi @nirvedhmeshram,

Along with this PR into IREE, I also packaged the same changes so that these nonlocal drivers compile as a plugin (keeping the same source tree structure).

https://github.com/dmahurin/iree-nonlocal-plugins/

I was hoping there was interest in merging into the main tree, but keeping external could be ok as well.

I am mainly looking for feedback on this approach to running on non-local processors (running the llvm-cpu ELFs as is, over a channel), or on whether there is another way that could also work.

Collaborator
@benvanik benvanik left a comment

Hello! This is a large change and is the kind of thing we really encourage communication on before undertaking (see the note at https://iree.dev/developers/general/contributing/). It's really cool that you've been able to get this all working, but it's not clear to me that we want this in-tree. The intent behind a remote interface (vs. local) is that it would hide the target HAL implementation behind the transport, such that there would be no nonlocal-sync/nonlocal-task, just a remote-tcp/shm/etc. that on the other side could be local-sync, local-task, CUDA, Vulkan, etc. in a hosting process/sandbox.

I encourage you to carry this in your own repo by adding it in as an external HAL driver via IREE_EXTERNAL_HAL_DRIVERS. If there's interest in landing it upstream here we'd want to do some design reviews and those are usually best done after something has been implemented, experimented with, and analyzed. There's a lot of fun design around transport interfaces, high-performance IO, etc that would be important for such an undertaking to ensure it is viable - otherwise, the best way to use IREE is to run it locally and marshal the less chatty/expensive invocation calls.
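
For readers following along, the idea of hiding the backend behind a transport can be sketched as a small struct of function pointers, with an in-memory loopback standing in for TCP or shared memory. All names here (`remote_transport_t`, `loopback_transport_t`, `remote_roundtrip`) are hypothetical and are not part of IREE or of this PR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport interface: callers see only send/recv, never the
 * channel type or the HAL implementation on the other side. */
typedef struct remote_transport_t remote_transport_t;
struct remote_transport_t {
  int (*send)(remote_transport_t *self, const void *data, size_t len);
  int (*recv)(remote_transport_t *self, void *data, size_t cap, size_t *out_len);
};

/* In-memory loopback used here in place of a TCP or shm transport. */
typedef struct {
  remote_transport_t base; /* must be first so the cast below is valid */
  uint8_t buf[256];
  size_t len;
} loopback_transport_t;

static int loopback_send(remote_transport_t *self, const void *data, size_t len) {
  loopback_transport_t *t = (loopback_transport_t *)self;
  if (len > sizeof(t->buf)) return -1; /* message too large */
  memcpy(t->buf, data, len);
  t->len = len;
  return 0;
}

static int loopback_recv(remote_transport_t *self, void *data, size_t cap,
                         size_t *out_len) {
  loopback_transport_t *t = (loopback_transport_t *)self;
  if (t->len > cap) return -1; /* caller buffer too small */
  memcpy(data, t->buf, t->len);
  *out_len = t->len;
  t->len = 0;
  return 0;
}

void loopback_transport_init(loopback_transport_t *t) {
  t->base.send = loopback_send;
  t->base.recv = loopback_recv;
  t->len = 0;
}

/* Sends a request and waits for the reply, using only the interface.
 * The same code would work unchanged over any other transport. */
int remote_roundtrip(remote_transport_t *t, const void *req, size_t req_len,
                     void *resp, size_t cap, size_t *out_len) {
  if (t->send(t, req, req_len) != 0) return -1;
  return t->recv(t, resp, cap, out_len);
}
```

With this shape, swapping TCP for shared memory means providing another pair of `send`/`recv` functions, while the caller and whatever executes the work on the far side stay the same.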

@benvanik
Collaborator

Ah, just saw your response :) Yes, your plugin looks like a great place for this for now. As mentioned: big kudos for making this work! Outside of an old proof of concept we had many years ago, this is the first remoting example. Thank you for sharing it!
