8000 [TE] feat: support send/recv API for tensor transfer by stmatengss · Pull Request #472 · kvcache-ai/Mooncake · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[TE] feat: support send/recv API for tensor transfer #472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

stmatengss
Copy link
Collaborator

No description provided.

@stmatengss stmatengss requested a review from xiaguan June 11, 2025 09:49
@stmatengss stmatengss requested a review from Copilot June 12, 2025 02:33
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces support for tensor transfers through new send/recv APIs and adds integration tests to validate sender/receiver functionality. Key changes include:

  • A new shell script update to run transfer engine tests.
  • A comprehensive integration test (test_transfer_engine_exp.py) for sender/receiver communication.
  • New API implementations in transfer_engine_exp.py to enable tensor transfers via a TransferEngine.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
scripts/run_tests.sh Added pip installation and test invocation for transfer engine tests
mooncake-wheel/tests/test_transfer_engine_exp.py Added an integration test covering multiple sender/receiver usage patterns
mooncake-wheel/mooncake/transfer_engine_exp.py Introduced new classes and convenience functions to enable tensor transfers
Comments suppressed due to low confidence (1)

mooncake-wheel/tests/test_transfer_engine_exp.py:166

  • The test currently allows a pass without data reception, which may mask integration issues; consider adding explicit assertions for data availability when the network and TransferEngine are configured properly.
print("Note: This is expected if TransferEngine is not available or network is not configured")

Comment on lines +140 to +141
def __del__(self):
"""Cleanup"""
Copy link
Preview
Copilot AI Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using del for resource cleanup can be unreliable due to unpredictable garbage collection; consider implementing a context manager to ensure timely and explicit cleanup of resources.

Suggested change
def __del__(self):
"""Cleanup"""
def __enter__(self):
"""Enter the runtime context related to this object."""
return self
def __exit__(self, exc_type, exc_value, traceback):
"""Cleanup resources when exiting the context."""

Copilot uses AI. Check for mistakes.

@@ -10,6 +10,10 @@ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

echo "Running transfer_engine tests..."
cd mooncake-wheel/tests

pip install torch numpy zmq
Copy link
Preview
Copilot AI Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Installing dependencies during test execution can slow down the test run; consider managing these dependencies via a requirements file or pre-configured environment.

Copilot uses AI. Check for mistakes.

@xiaguan
Copy link
Collaborator
xiaguan commented Jun 12, 2025

Code looks good to me! Just need to get the CI passing before we can merge

@stmatengss
Copy link
Collaborator Author

TODO:

  1. Reduce serialization overhead
  2. Should replace zmq with a higher-performance notification mechanism.
  3. one-to-many send, like broadcast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0