nonlocal device driver #20620
(commit date Wed Oct 16 13:02:19) Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
removes read_only_host_register and supports_concurrent_managed_access checks. Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
The heap allocator was not used, and is replaced in device.c Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
This allows llvm-cpu compiler backends to work with nonlocal-sync. Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
commit date: Wed Oct 16 13:02:19 2024 +0200 Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
…ffer Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
commit date: Wed Oct 16 13:02:19 2024 +0200 Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
…ost pointers Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
…em map not to be used in iree/runtime/src/iree/hal/buffer.c Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
remove unused register, unregister, prefetch, get_device_pointer functions. Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
host mapping is still allowed, though the implementation is pass-through and not mapped Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
Add debug output, including binding debug and called-function debug. Debug prints (DEBUG_PRINTF) are emitted only when DEBUG is set. Create an NL_DEBUG definition to control debug output: 0 - no extra debug; 1 - memory copy and operation calls; 2 - binding data. Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
…19 2024 +0200 ) Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
Add nonlocal-task to plugin build Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
…an 2 11:54:22 2025 -0800
Changes from:
- hal/local => hal/nonlocal
- hal/drivers/cuda => hal/nonlocal
- hal/drivers/local-task => drivers/nonlocal-task
- hal/drivers/local-sync => drivers/nonlocal-sync
Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
Hi, I am not sure we will want to add a new backend directly like this (although I am no authority on that). You might want to consider starting a plugin project for this backend. Here is an example: https://github.com/nod-ai/iree-amd-aie
Hi @nirvedhmeshram, along with this PR into IREE, I also made the same changes available as a plugin that compiles these nonlocal drivers (keeping the same source tree structure): https://github.com/dmahurin/iree-nonlocal-plugins/ I was hoping there was interest in merging into the main tree, but keeping it external could be OK as well. I am mainly looking for feedback on this approach of running on non-local processors (running the llvm-cpu ELFs as-is, over a channel). Or is there another way that could also work?
Hello! This is a large change and is the kind of thing we really encourage communications on before undertaking (see the note at https://iree.dev/developers/general/contributing/). It's really cool that you've been able to get this all working but it's not clear to me that we want this in-tree. The intent behind a remote interface (vs local) is that it'd hide the target HAL implementation behind the transport such that there would be no non-local-sync/non-local-task, just a remote-tcp/shm/etc that on the other side could be local-sync, local-task, CUDA, Vulkan, etc in a hosting process/sandbox.
I encourage you to carry this in your own repo by adding it in as an external HAL driver via IREE_EXTERNAL_HAL_DRIVERS. If there's interest in landing it upstream here we'd want to do some design reviews and those are usually best done after something has been implemented, experimented with, and analyzed. There's a lot of fun design around transport interfaces, high-performance IO, etc that would be important for such an undertaking to ensure it is viable - otherwise, the best way to use IREE is to run it locally and marshal the less chatty/expensive invocation calls.
Ah, just saw your response :) Yes, your plugin looks like a great place for this for now. As mentioned: big kudos for making this work! Outside of an old proof of concept we had many years ago, this is the first remoting example, and thank you for sharing it!
This pull request creates a new set of drivers: nonlocal-sync and nonlocal-task.
These device drivers execute MLIR compiled with the llvm-cpu backend, but run it on a nonlocal processor.
The example communication with the nonlocal processor uses simple marshaling over TCP, but the channel could be something else.
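As a rough illustration of what "simple marshaling" over a byte channel such as TCP can look like, here is a minimal sketch. It is not the wire format used in this PR: the header layout (opcode plus payload length, big-endian), the `NL_OP_*` opcodes, and the `nl_encode`/`nl_decode` helpers are all hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format (NOT the PR's actual protocol):
 *   [u32 opcode][u32 payload_len][payload bytes], integers big-endian. */
enum { NL_HEADER_SIZE = 8 };

/* Hypothetical opcodes, for illustration only. */
enum nl_opcode { NL_OP_ALLOC = 1, NL_OP_WRITE = 2, NL_OP_CALL = 3 };

/* Writes a 32-bit value in big-endian (network) byte order. */
static void put_u32_be(uint8_t *dst, uint32_t v) {
  dst[0] = (uint8_t)(v >> 24);
  dst[1] = (uint8_t)(v >> 16);
  dst[2] = (uint8_t)(v >> 8);
  dst[3] = (uint8_t)v;
}

/* Reads a 32-bit big-endian value. */
static uint32_t get_u32_be(const uint8_t *src) {
  return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
         ((uint32_t)src[2] << 8) | (uint32_t)src[3];
}

/* Encodes one message into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small. */
static size_t nl_encode(uint8_t *out, size_t out_cap, uint32_t opcode,
                        const void *payload, uint32_t payload_len) {
  assert(out != NULL);
  assert(payload != NULL || payload_len == 0);
  if (out_cap < NL_HEADER_SIZE || out_cap - NL_HEADER_SIZE < payload_len) {
    return 0;
  }
  put_u32_be(out, opcode);
  put_u32_be(out + 4, payload_len);
  if (payload_len > 0) memcpy(out + NL_HEADER_SIZE, payload, payload_len);
  return NL_HEADER_SIZE + (size_t)payload_len;
}

/* Decodes one message from `in`.
 * Returns 1 on success (with `*payload` pointing into `in`),
 * or 0 if the buffer does not yet hold a complete message. */
static int nl_decode(const uint8_t *in, size_t in_len, uint32_t *opcode,
                     const uint8_t **payload, uint32_t *payload_len) {
  if (in_len < NL_HEADER_SIZE) return 0;
  uint32_t len = get_u32_be(in + 4);
  if (in_len - NL_HEADER_SIZE < len) return 0;
  *opcode = get_u32_be(in);
  *payload = in + NL_HEADER_SIZE;
  *payload_len = len;
  return 1;
}
```

The encoded bytes could then be written to a connected socket, with the receiving side reading until `nl_decode` reports a complete message. Because the framing is independent of the transport, the same helpers would apply to a different channel, which matches the point above that the channel need not be TCP.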
The nonlocal implementation includes a buffer and allocator derived from the CUDA driver, changed to use an "NL API" for such calls.
This allows running on a system with processors connected through some channel.
This approach changes only the runtime, not the compiler.
I would be interested in comments as to whether this approach is reasonable, whether there are alternative approaches to running on non-local processors, and whether other approaches that follow a GPU model more closely exist.