nonlocal device driver by dmahurin · Pull Request #20620 · iree-org/iree · GitHub

nonlocal device driver #20620

Open

dmahurin wants to merge 49 commits into iree-org:main from dmahurin:nonlocal

dmahurin commented

This pull request creates a new set of drivers: nonlocal-sync and nonlocal-task.

These device drivers execute MLIR compiled by the llvm-cpu backend, but run it on a nonlocal processor.
The example communication with the nonlocal processor uses simple marshaling over TCP, but the channel could be something else.
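The description does not spell out the wire format. As a rough illustration of what "simple marshaling" over a byte stream can look like, the sketch below frames each request as a fixed 8-byte header (a hypothetical opcode and payload length, both encoded little-endian) followed by the payload bytes. The opcodes, `nl_msg_header_t`, and the helper names are hypothetical and are not taken from the PR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes for requests sent to the nonlocal processor. */
enum { NL_OP_MALLOC = 1, NL_OP_FREE = 2, NL_OP_MEMCPY_HTOD = 3, NL_OP_DISPATCH = 4 };

/* Hypothetical header that precedes every payload on the wire. */
typedef struct {
  uint32_t opcode;
  uint32_t payload_length;
} nl_msg_header_t;

#define NL_MSG_HEADER_SIZE 8

/* Fixed little-endian encoding so both ends agree regardless of host order. */
static void put_u32_le(uint8_t* p, uint32_t v) {
  p[0] = (uint8_t)v;
  p[1] = (uint8_t)(v >> 8);
  p[2] = (uint8_t)(v >> 16);
  p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_u32_le(const uint8_t* p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
         ((uint32_t)p[3] << 24);
}

/* Writes header + payload into `out`; payload bytes are copied as-is.
   Returns the number of bytes written, or 0 if `out` is too small. */
static size_t nl_msg_encode(uint32_t opcode, const void* payload,
                            uint32_t payload_length, uint8_t* out,
                            size_t out_capacity) {
  size_t total = NL_MSG_HEADER_SIZE + (size_t)payload_length;
  if (out_capacity < total) return 0;
  put_u32_le(out, opcode);
  put_u32_le(out + 4, payload_length);
  if (payload_length) memcpy(out + NL_MSG_HEADER_SIZE, payload, payload_length);
  return total;
}

/* Parses the header at the start of `in`.
   Returns 0 on success, -1 if fewer than NL_MSG_HEADER_SIZE bytes are given. */
static int nl_msg_decode_header(const uint8_t* in, size_t in_length,
                                nl_msg_header_t* out) {
  if (in_length < NL_MSG_HEADER_SIZE) return -1;
  out->opcode = get_u32_le(in);
  out->payload_length = get_u32_le(in + 4);
  return 0;
}
```

A receiver using this framing would read the 8 header bytes first, then exactly `payload_length` more bytes before acting on the opcode; swapping TCP for another channel leaves the framing unchanged.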

The nonlocal implementation includes a buffer and an allocator derived from the CUDA driver, modified to use an "NL API" for these calls instead.
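The NL API functions themselves are not listed here. One way to picture the idea, assuming entry points in the style of the CUDA driver API's `cuMemAlloc`/`cuMemcpyHtoD`/`cuMemcpyDtoH`/`cuMemFree`, is a table of function pointers that the buffer and allocator call. A local malloc-backed implementation, similar in spirit to the commit "malloc implementation of basic cuda-like functions", could then be swapped for one that forwards each call over the channel. All names below (`nl_api_t`, `nl_deviceptr_t`, the `malloc_*` functions) are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque device address, in the style of CUDA's CUdeviceptr. */
typedef uint64_t nl_deviceptr_t;

/* Hypothetical NL API: the calls the buffer/allocator would route through. */
typedef struct {
  int (*mem_alloc)(nl_deviceptr_t* out_ptr, size_t size);
  int (*mem_free)(nl_deviceptr_t ptr);
  int (*memcpy_htod)(nl_deviceptr_t dst, const void* src, size_t size);
  int (*memcpy_dtoh)(void* dst, nl_deviceptr_t src, size_t size);
} nl_api_t;

/* Local stand-in: "device" memory is plain host heap memory. A remote
   implementation would instead marshal each call to the nonlocal processor. */
static int malloc_mem_alloc(nl_deviceptr_t* out_ptr, size_t size) {
  void* p = malloc(size);
  if (!p) return -1;
  *out_ptr = (nl_deviceptr_t)(uintptr_t)p;
  return 0;
}

static int malloc_mem_free(nl_deviceptr_t ptr) {
  free((void*)(uintptr_t)ptr);
  return 0;
}

static int malloc_memcpy_htod(nl_deviceptr_t dst, const void* src, size_t size) {
  memcpy((void*)(uintptr_t)dst, src, size);
  return 0;
}

static int malloc_memcpy_dtoh(void* dst, nl_deviceptr_t src, size_t size) {
  memcpy(dst, (const void*)(uintptr_t)src, size);
  return 0;
}

/* The table the buffer and allocator code would be written against. */
static const nl_api_t nl_malloc_api = {
    malloc_mem_alloc,
    malloc_mem_free,
    malloc_memcpy_htod,
    malloc_memcpy_dtoh,
};
```

Keeping the buffer and allocator code written against a table like this would mean the choice between the malloc stand-in and a TCP-backed implementation is made in one place.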

This allows running on a system with processors connected through some channel.
This approach changes only the runtime, not the compiler.

I would be interested in comments on whether this approach is reasonable, whether there are alternative approaches to running on non-local processors, and whether other approaches that follow a GPU model more closely exist.

dmahurin added 30 commits

April 4, 2025 09:31


          driver copied from iree local-sync

271af67

(commit date Wed Oct 16 13:02:19)

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Change local-sync to nonlocal-sync

c97882d

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          copy cuda allocator/buffer from iree ( Wed Oct 16 13:02:19 2024 +0200 )

c3f93a6

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          malloc implementation of basic cuda-like functions

70abb32

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Change cuda to nl in allocator and buffer

4c0f94a

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Use custom allocator and buffer. Replace heap device allocator.

b7cd962

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          remove allocator mapping and unused flag conditions

f904ea2

removes read_only_host_register and supports_concurrent_managed_access checks.

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          remove creation of heap allocator in driver_module.c.

c2bc661

The heap allocator was not used, and is replaced in device.c

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          changes from allocator_heap

9c9ef48

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          compile nonlocal-sync as plugin

af0f9fc

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Allow compatibility for compile backends targeting "local" devices.

16218f6

This allows llvm-cpu compiler backends to work with nonlocal-sync.

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          only use embedded-elf loader

24aff12

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          copy inline command buffer from iree hal local

f88006e

commit date: Wed Oct 16 13:02:19 2024 +0200

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          iree_hal_inline_command_buffer => iree_hal_nonlocal_inline_command_buffer

845668f

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          local_executable and embedded_elf_loader from iree runtime hal local

90e9108

commit date: Wed Oct 16 13:02:19 2024 +0200

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          replace iree_hal_local prefix with iree_hal_nonlocal

768617f

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          use nl executable loader.

7b18ecd

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          add elf loading, initialization and execution into nl_api

2e53bab

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Pass dispatch attributes using nl api.

58dc73e

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Replace command buffer memory map copy with memcpy using device and host pointers

7827e49

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          remove device buffer map during dispatch

55f70a1

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          only support allocations of device-visible memory (not host).

d19a4b2

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          replace iree_hal_buffer_map_write (which maps memory) with memcpy

8c7dba1

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          set IREE_HAL_BUFFER_COMPATIBILITY_LOW_PERFORMANCE, which will cause mem map not to be used in iree/runtime/src/iree/hal/buffer.c

be15d04

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          remove unused host-registered and async type buffers

dcbd4b9

remove unused register, unregister, prefetch, get_device_pointer functions.

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          client/server: add copy and dispatch functions

8f8eb40

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          use nl memory copy functions

d1f7a90

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          elf module loading server

f8cc49a

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          change command_data read to recv. Use larger server read buffer.

c62fece

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
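On a stream socket, a single `recv` call may return fewer bytes than requested, so a server that reads fixed-size command data with `recv` typically loops until the whole message has arrived. A minimal POSIX sketch, assuming a blocking stream socket; `recv_all` is a hypothetical helper, not code from the PR:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Reads exactly `length` bytes from a blocking stream socket.
   Returns 0 on success, -1 on error or if the peer closed early. */
static int recv_all(int fd, void* buffer, size_t length) {
  char* p = (char*)buffer;
  size_t received = 0;
  while (received < length) {
    ssize_t n = recv(fd, p + received, length - received, 0);
    if (n < 0) {
      if (errno == EINTR) continue; /* interrupted by a signal: retry */
      return -1;
    }
    if (n == 0) return -1; /* orderly shutdown before the full message */
    received += (size_t)n;
  }
  return 0;
}
```

A larger server read buffer, as in the commit above, reduces the number of calls but does not by itself remove the need for such a loop.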


          from iree runtime/src/iree/hal/local/elf/elf_module_test_main.c

e3c35c4

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>

dmahurin added 19 commits

April 4, 2025 09:31


          correct binding_count bug, that also exists in example.

0c3d288

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          elf module client as nl_api implementation

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          elf client: update to use nl api

528387b

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Changes to adapt nonlocal plugin for direct compile.

51ddab9

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          command data test client

43f5c32

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          use device memory for elf client test

32f1089

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          just use NULL environment for executable load.

c53ba96

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          disallow mapping of device memory.

cd62d92

host mapping is still allowed, though the implementation is pass-through and not mapped

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Add debug and debug wrapper.

d253954

Add debug, including binding debug and called function debug.

Only have debug prints (DEBUG_PRINTF) when DEBUG is set.

Create NL_DEBUG definition to control debug output.

0 - no extra debug
1 - memory copy and operation calls
2 - binding data

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>
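The commit message describes leveled debug output but not the macro itself. A minimal sketch of one way to gate prints on a compile-time `NL_DEBUG` level (0: no extra debug, 1: memory copy and operation calls, 2: binding data); the `NL_DEBUG_PRINTF` name and the level constants are assumptions, not the PR's code:

```c
#include <stdio.h>

/* Compile-time debug level; override with -DNL_DEBUG=1 or -DNL_DEBUG=2. */
#ifndef NL_DEBUG
#define NL_DEBUG 0
#endif

/* Hypothetical level names matching the levels in the commit message. */
#define NL_DEBUG_CALLS 1    /* memory copy and operation calls */
#define NL_DEBUG_BINDINGS 2 /* binding data */

/* Hypothetical leveled print: emitted only when NL_DEBUG >= level. The
   condition is a constant expression, so disabled prints compile away. */
#define NL_DEBUG_PRINTF(level, ...) \
  do {                              \
    if (NL_DEBUG >= (level)) {      \
      fprintf(stderr, __VA_ARGS__); \
    }                               \
  } while (0)
```

For example, `NL_DEBUG_PRINTF(NL_DEBUG_CALLS, "memcpy %zu bytes\n", size);` prints only in builds with `NL_DEBUG` set to 1 or higher.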


          copy iree/runtime/src/iree/hal/drivers/local_task ( Wed Oct 16 13:02:19 2024 +0200 )

fa2ffdd

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          iree_hal_task => iree_hal_nl_task

9d75b02

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Adjust for iree=>nl changes

1bd7802

Add nonlocal-task to plugin build

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          move device allocator creation from driver registration to device.c

c9e3682

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Change nonlocal-task to use nl loader.

92f68cd

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Changes to adapt nonlocal plugin for direct compile.

a85eb16

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Update to avoid using memory map for memory reference, copy, and write.

5baa3d4

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          adjust to iree upstream changes (to Mon Dec 9 17:21:45 2024 -0800)

63b4dba

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          Changes due to iree changes. Up to iree commit: iree changes to Thu Jan 2 11:54:22 2025 -0800

9a55b77

Changes from:
- hal/local => hal/nonlocal
- hal/drivers/cuda => hal/nonlocal
- hal/drivers/local-task => drivers/nonlocal-task
- hal/drivers/local-sync => drivers/nonlocal-sync

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>


          README with build and run instructions

1f1300b

Signed-off-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>

dmahurin requested a review from benvanik as a code owner

April 23, 2025 21:55

Contributor

nirvedhmeshram commented

Hi, I am not sure we will want to add a new backend directly like this (although I am no authority on that). You might want to consider starting a plugin project for this backend. Here is an example: https://github.com/nod-ai/iree-amd-aie

dmahurin force-pushed the nonlocal branch from c2207db to 1f1300b

April 24, 2025 16:32

Author

dmahurin commented

Hi @nirvedhmeshram,

Along with this PR into IREE, I also made the same changes available as a plugin that builds these nonlocal drivers out of tree (keeping the same source tree structure).

https://github.com/dmahurin/iree-nonlocal-plugins/

I was hoping there would be interest in merging this into the main tree, but keeping it external could be OK as well.

I am mainly looking for feedback on this approach of running on non-local processors (running the llvm-cpu ELFs as-is over a channel), or on whether there is another way that could also work.

benvanik requested changes

Collaborator

benvanik left a comment

Hello! This is a large change and is the kind of thing we really encourage communications on before undertaking (see the note at https://iree.dev/developers/general/contributing/). It's really cool that you've been able to get this all working but it's not clear to me that we want this in-tree. The intent behind a remote interface (vs local) is that it'd hide the target HAL implementation behind the transport such that there would be no non-local-sync/non-local-task, just a remote-tcp/shm/etc that on the other side could be local-sync, local-task, CUDA, Vulkan, etc in a hosting process/sandbox.

I encourage you to carry this in your own repo by adding it in as an external HAL driver via IREE_EXTERNAL_HAL_DRIVERS. If there's interest in landing it upstream here we'd want to do some design reviews and those are usually best done after something has been implemented, experimented with, and analyzed. There's a lot of fun design around transport interfaces, high-performance IO, etc that would be important for such an undertaking to ensure it is viable - otherwise, the best way to use IREE is to run it locally and marshal the less chatty/expensive invocation calls.
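To make the suggested layering concrete: the HAL-facing side would talk only to a transport interface, and the process on the other end decides which real HAL driver (local-sync, local-task, CUDA, Vulkan, etc.) services the requests. The sketch below shows just that transport seam, with a hypothetical `nl_transport_t` vtable and an in-memory loopback implementation that could stand in for TCP or shared memory in tests; none of these names exist in IREE.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical transport interface: callers see only send/recv of bytes,
   never which channel (TCP, shared memory, ...) carries them. */
typedef struct nl_transport_t nl_transport_t;
struct nl_transport_t {
  int (*send)(nl_transport_t* self, const void* data, size_t length);
  int (*recv)(nl_transport_t* self, void* data, size_t length);
};

/* Loopback transport: bytes sent are queued in a fixed buffer and returned
   in order by recv. Space is not reclaimed after reads, which is acceptable
   for short unit tests of marshaling code. */
typedef struct {
  nl_transport_t base; /* first member, so nl_transport_t* casts back */
  unsigned char buffer[256];
  size_t head; /* next byte to read */
  size_t tail; /* next byte to write */
} nl_loopback_transport_t;

static int loopback_send(nl_transport_t* self, const void* data, size_t length) {
  nl_loopback_transport_t* t = (nl_loopback_transport_t*)self;
  if (length > sizeof(t->buffer) - t->tail) return -1; /* out of space */
  memcpy(t->buffer + t->tail, data, length);
  t->tail += length;
  return 0;
}

static int loopback_recv(nl_transport_t* self, void* data, size_t length) {
  nl_loopback_transport_t* t = (nl_loopback_transport_t*)self;
  if (length > t->tail - t->head) return -1; /* not enough queued bytes */
  memcpy(data, t->buffer + t->head, length);
  t->head += length;
  return 0;
}

static void nl_loopback_transport_init(nl_loopback_transport_t* t) {
  t->base.send = loopback_send;
  t->base.recv = loopback_recv;
  t->head = 0;
  t->tail = 0;
}
```

Code written against an interface like `nl_transport_t` would not change when the loopback is replaced by a socket or shared-memory implementation, which is the kind of separation the comment describes between the transport and the target HAL.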

Collaborator

benvanik commented

Ah, just saw your response :) Yes, your plugin looks like a great place for this for now. As mentioned: big kudos for making this work! Outside of an old proof of concept we had many years ago, this is the first remoting example, and thank you for sharing it!
