-
Notifications
You must be signed in to change notification settings - Fork 287
[TE] feat: support send/recv API for tensor transfer #472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces support for tensor transfers through new send/recv APIs and adds integration tests to validate sender/receiver functionality. Key changes include:
- A new shell script update to run transfer engine tests.
- A comprehensive integration test (test_transfer_engine_exp.py) for sender/receiver communication.
- New API implementations in transfer_engine_exp.py to enable tensor transfers via a TransferEngine.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
scripts/run_tests.sh | Added pip installation and test invocation for transfer engine tests |
mooncake-wheel/tests/test_transfer_engine_exp.py | Added an integration test covering multiple sender/receiver usage patterns |
mooncake-wheel/mooncake/transfer_engine_exp.py | Introduced new classes and convenience functions to enable tensor transfers |
Comments suppressed due to low confidence (1)
mooncake-wheel/tests/test_transfer_engine_exp.py:166
- The test currently allows a pass without data reception, which may mask integration issues; consider adding explicit assertions for data availability when the network and TransferEngine are configured properly.
print("Note: This is expected if TransferEngine is not available or network is not configured")
def __del__(self): | ||
"""Cleanup""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using del for resource cleanup can be unreliable due to unpredictable garbage collection; consider implementing a context manager to ensure timely and explicit cleanup of resources.
def __del__(self): | |
"""Cleanup""" | |
def __enter__(self): | |
"""Enter the runtime context related to this object.""" | |
return self | |
def __exit__(self, exc_type, exc_value, traceback): | |
"""Cleanup resources when exiting the context.""" |
Copilot uses AI. Check for mistakes.
@@ -10,6 +10,10 @@ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib | |||
|
|||
echo "Running transfer_engine tests..." | |||
cd mooncake-wheel/tests | |||
|
|||
pip install torch numpy zmq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Installing dependencies during test execution can slow down the test run; consider managing these dependencies via a requirements file or pre-configured environment.
Copilot uses AI. Check for mistakes.
Code looks good to me! Just need to get the CI passing before we can merge |
TODO:
|
No description provided.