[TE] feat: support send/recv API for tensor transfer #472

stmatengss · 2025-06-11T09:49:12Z

No description provided.

Copilot

Pull Request Overview

This PR introduces support for tensor transfers through new send/recv APIs and adds integration tests to validate sender/receiver functionality. Key changes include:

An update to the shell script that runs the transfer engine tests.
A comprehensive integration test (test_transfer_engine_exp.py) for sender/receiver communication.
New API implementations in transfer_engine_exp.py to enable tensor transfers via a TransferEngine.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
scripts/run_tests.sh	Added pip installation and test invocation for transfer engine tests
mooncake-wheel/tests/test_transfer_engine_exp.py	Added an integration test covering multiple sender/receiver usage patterns
mooncake-wheel/mooncake/transfer_engine_exp.py	Introduced new classes and convenience functions to enable tensor transfers

Comments suppressed due to low confidence (1)

mooncake-wheel/tests/test_transfer_engine_exp.py:166

The test currently allows a pass without data reception, which may mask integration issues; consider adding explicit assertions for data availability when the network and TransferEngine are configured properly.

print("Note: This is expected if TransferEngine is not available or network is not configured")
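One possible way to act on this comment is to separate the "environment not available" case from the "no data received" case, so the first is reported as a skip and the second as a failure. The sketch below is illustrative only: engine_available and receive_tensor are hypothetical stand-ins, not functions from this PR.

```python
import unittest


def engine_available() -> bool:
    """Hypothetical probe: True when the engine and network are usable."""
    return False  # stand-in; a real probe would try to initialize the engine


def receive_tensor():
    """Hypothetical receiver call: returns received data, or None on failure."""
    return None


class ReceiverTest(unittest.TestCase):
    def test_receive(self):
        if not engine_available():
            # Report a skip instead of a silent pass when the environment is missing.
            self.skipTest("TransferEngine or network not configured")
        data = receive_tensor()
        # Once the environment is ready, missing data is a real failure.
        self.assertIsNotNone(data, "receiver returned no data")
```

With this shape, a CI run on a machine without the engine shows the test as skipped, while a configured machine that receives nothing fails loudly.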

Copilot · 2025-06-12T02:34:11Z

mooncake-wheel/mooncake/transfer_engine_exp.py

+    def __del__(self):
+        """Cleanup"""


Using __del__ for resource cleanup can be unreliable because garbage collection timing is unpredictable; consider implementing a context manager to ensure timely and explicit cleanup of resources.

Suggested change

-    def __del__(self):
-        """Cleanup"""
+    def __enter__(self):
+        """Enter the runtime context related to this object."""
+        return self
+    def __exit__(self, exc_type, exc_value, traceback):
+        """Cleanup resources when exiting the context."""
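For context, a fuller version of the context-manager pattern might look like the sketch below. ManagedResource and its closed flag are hypothetical and stand in for whatever class in transfer_engine_exp.py owns the engine or sockets. The sketch assumes an explicit, idempotent close() method that both __exit__ and __del__ delegate to.

```python
class ManagedResource:
    """Hypothetical stand-in for a class that owns an engine or socket."""

    def __init__(self):
        self.closed = False

    def close(self):
        """Release resources; safe to call more than once."""
        if not self.closed:
            # A real class would close sockets and free buffers here.
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress exceptions raised inside the block

    def __del__(self):
        # Best-effort fallback only; not guaranteed to run promptly.
        self.close()


with ManagedResource() as res:
    assert not res.closed
# close() has run by the time the with block exits, even if it raised.
```

Keeping __del__ as a fallback is a design choice rather than a requirement; callers who use the with statement never depend on it.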

Copilot · 2025-06-12T02:34:11Z

scripts/run_tests.sh

@@ -10,6 +10,10 @@ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

 echo "Running transfer_engine tests..."
 cd mooncake-wheel/tests
+
+pip install torch numpy zmq


[nitpick] Installing dependencies during test execution can slow down the test run; consider managing these dependencies via a requirements file or pre-configured environment.
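If the dependencies move to a pre-configured environment, the test run could still verify that they are present and fail fast when they are not. The following is a minimal sketch of such a check, not part of this PR; the module list is taken from the pip install line in the diff, and missing_modules and require are hypothetical helper names.

```python
import importlib.util

# Modules the test script currently installs via pip.
REQUIRED_MODULES = ("torch", "numpy", "zmq")


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require(names):
    """Exit with a clear message if any required module is unavailable."""
    missing = missing_modules(names)
    if missing:
        raise SystemExit("Missing test dependencies: " + ", ".join(missing))
```

Calling require(REQUIRED_MODULES) at the start of the test run would replace the install step with a quick check that costs no network access.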

xiaguan · 2025-06-12T06:12:21Z

Code looks good to me! Just need to get the CI passing before we can merge

stmatengss · 2025-06-12T06:25:22Z

TODO:

Reduce serialization overhead.
Replace zmq with a higher-performance notification mechanism.
Support one-to-many send, like broadcast.

[TE] feat: support send/recv API for tensor transfer

21d5a76

stmatengss requested a review from xiaguan June 11, 2025 09:49

stmatengss added 3 commits June 11, 2025 18:50

install torch for testing

2159fb4

install numpy for testing

6393690

install zmq for testing

c1e8d64

stmatengss requested a review from Copilot June 12, 2025 02:33

Copilot AI reviewed Jun 12, 2025


add example and fix bugs

299bb2b
