feat(dynamo-run): vllm and sglang subprocess engines #954
New vllm and sglang engines that run in a sub-process. Will hopefully replace the existing embedded python engines.

Why?

- Pure Python, does not require knowing Rust to work on it. Much simpler to maintain.
- No embedded Python interpreter, which avoids linking libpython and avoids the MacOS virtualenv issues.
- Should have better performance as it's "native" vllm / sglang.
- Works with any version of vllm (including v1!) and sglang. Less upgrade struggle.
Sglang snuck in there since last review!
Probably want to see what can be common/shared between the examples instead of updating each the same way moving forward for a 3rd+ worker example (ex: a new argparse arg or something)
Hehe. The way our review process works is that I get max one PR per day, so needs must :-)
It's a bit tricky right now because those Python scripts are written to a temp file and executed as
Just curious - why the intermediate temp file? Why not execute the file directly? Couldn't this break slightly more complicated files that depend on relative imports colocated to it? (There's a WAR for that case via PYTHONPATH, but keeping it simple)
There is no file. We do And yes, absolutely, the engine has to be a single self-contained file. I'm hoping to get the trt-llm example cleaned up enough to fit within this model. The
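For readers following the thread, here is a minimal sketch of one way to run a single self-contained Python script in a sub-process without a file on disk. It is an illustration only: the exact command dynamo-run uses is not shown in this conversation, and `ENGINE_SOURCE` and `run_engine` are hypothetical names, with a trivial script standing in for a real vllm or sglang engine.

```python
import subprocess
import sys

# Hypothetical stand-in for an engine script. A real engine would import
# vllm or sglang; this one only echoes the arguments it received.
ENGINE_SOURCE = """
import sys
print("engine started with args:", sys.argv[1:])
"""


def run_engine(source: str, args: list[str]) -> str:
    """Run `source` in a child Python interpreter without writing it to disk.

    Assumption: the source is passed with `python -c`. This is one possible
    mechanism, not necessarily the one dynamo-run uses.
    """
    result = subprocess.run(
        [sys.executable, "-c", source, *args],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    print(run_engine(ENGINE_SOURCE, ["--model", "example-model"]), end="")
```

Because the child process has no script file of its own, the source cannot rely on modules sitting next to it, which is why a single self-contained file fits this model.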
What a cleanup! A breath of fresh air. vllm and sglang are now the sub-process engines from #954. This means unless you build with `--feature python`, dynamo-run does not link `libpython`.
vllm and sglang are now the sub-process engines from #954. Also updated docs on doing vllm and sglang multi-gpu (tensor parallel) and multi-node (pipeline parallel).