docs: Add multi-node TRTLLM steps to README by rmccorm4 · Pull Request #930 · ai-dynamo/dynamo

Merged: 3 commits, May 2, 2025

@rmccorm4 (Contributor) commented May 2, 2025

Overview:

Add general steps for deploying disaggregated serving across multiple nodes.

To keep the steps general, this skips a couple of workarounds that are only required in very specific cluster environments, such as patching dynamo serve to launch TRTLLM workers with mpirun, and overwriting SLURM_NODELIST so that mpirun does not try to expand to multiple nodes via srun on multi-node allocations.

When these workarounds are more fleshed out and generalized, we can update these doc steps.
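The PR does not show the SLURM_NODELIST workaround itself. As a minimal sketch, assuming the workaround amounts to pinning SLURM_NODELIST to the local host before calling mpirun, a launcher could build the command and environment like this. The helper name build_local_mpirun and the worker command "python3 worker.py" are hypothetical, and using the short hostname is an assumption about how the cluster names its nodes; only mpirun's -np option is a real flag here.

```python
import os
import shlex
import socket
from typing import Dict, List, Mapping, Optional, Tuple


def build_local_mpirun(
    worker_cmd: List[str],
    num_ranks: int,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return an (argv, env) pair intended to keep mpirun on the current node.

    Hypothetical helper: assumes that pinning SLURM_NODELIST to the local
    host is enough to stop mpirun from expanding onto other nodes via srun.
    """
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)

    # Only rewrite the variable when it is set, i.e. inside a SLURM allocation.
    if "SLURM_NODELIST" in env:
        # Assumption: the cluster uses short hostnames in SLURM_NODELIST.
        env["SLURM_NODELIST"] = socket.gethostname().split(".")[0]

    # -np sets the number of MPI processes mpirun starts.
    argv = ["mpirun", "-np", str(num_ranks), *worker_cmd]
    return argv, env


if __name__ == "__main__":
    # "python3 worker.py" is a placeholder, not a real dynamo entry point.
    argv, env = build_local_mpirun(["python3", "worker.py"], num_ranks=4)
    print(shlex.join(argv))
    print("SLURM_NODELIST =", env.get("SLURM_NODELIST", "<unset>"))
    # To actually launch: subprocess.run(argv, env=env, check=True)
```

Keeping this as a pure function that returns argv and env, rather than launching directly, makes the environment tweak easy to inspect and test on a machine without SLURM; the caller would then hand both to subprocess.run.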

copy-pr-bot (bot) commented May 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

@rmccorm4 enabled auto-merge (squash) May 2, 2025 19:09
@rmccorm4 merged commit f0ac8e2 into main May 2, 2025
6 checks passed
@rmccorm4 deleted the rmccormick/docs/multinode_trtllm branch May 2, 2025 23:19