feat(dynamo-run): vllm and sglang subprocess engines #954

grahamking · 2025-05-06T14:49:29Z

New vllm and sglang engines that run in a sub-process. Will hopefully replace the existing embedded python engines.

Why?

- Pure Python, does not require knowing Rust to work on it. Much simpler to maintain.
- No embedded Python interpreter, which avoids linking libpython and avoids the macOS virtualenv issues.
- Should have better performance as it's "native" vllm / sglang.
- Works with any version of vllm (including v1!) and sglang. Less upgrade struggle.

rmccorm4

Sglang snuck in there since last review!

rmccorm4 · 2025-05-06T19:35:50Z

Probably want to see what can be common/shared between the examples instead of updating each the same way moving forward for a 3rd+ worker example (ex: a new argparse arg or something)

grahamking · 2025-05-06T19:40:33Z

> Sglang snuck in there since last review!

Hehe. The way our review process works is that I get max one PR per day, so needs must :-)

grahamking · 2025-05-06T19:42:03Z

> Probably want to see what can be common/shared between the examples instead of updating each the same way moving forward for a 3rd+ worker example (ex: a new argparse arg or something)

It's a bit tricky right now because those Python scripts are written to a temp file and executed as `python3 <tempfile>`. There's no simple way for them to share code.

rmccorm4 · 2025-05-06T20:24:12Z

Just curious - why the intermediate temp file? Why not execute the file directly? Couldn't this break slightly more complicated files that depend on relative imports colocated to it?

(There's a workaround for that case via PYTHONPATH, but keeping it simple.)
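The PYTHONPATH workaround mentioned here could look roughly like the sketch below. It is an illustration only, not code from this PR: `python_command` and its arguments are hypothetical names, and it assumes a `python3` binary is on `PATH`. The idea is that when a script is copied elsewhere before being run, modules that sat next to the original are no longer found through the script's own directory, so the original directory is passed to the child via `PYTHONPATH`.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a `python3 <script>` command whose PYTHONPATH
/// points at `module_dir`, so the script can still import modules that
/// live there even though the script itself was copied somewhere else.
/// Hypothetical helper for illustration; not part of dynamo-run.
fn python_command(script: &Path, module_dir: &Path) -> Command {
    let mut cmd = Command::new("python3");
    // The environment variable is set for the child process only.
    cmd.arg(script).env("PYTHONPATH", module_dir);
    cmd
}
```

A caller would then invoke `.spawn()` or `.status()` on the returned `Command`. Note that this replaces any `PYTHONPATH` the child would otherwise inherit rather than appending to it.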

grahamking · 2025-05-06T20:29:18Z

> Just curious - why the intermediate temp file? Why not execute the file directly? Couldn't this break slightly more complicated files that depend on relative imports colocated to it?
>
> (There's a workaround for that case via PYTHONPATH, but keeping it simple.)

There is no file. We do `pub const PY: &str = include_str!("vllm_inc.py");`, which is a compile-time include. The whole Python program is a const string.

And yes, absolutely, the engine has to be a single self-contained file. I'm hoping to get the trt-llm example cleaned up enough to fit within this model.

`out=vllm` and `out=sglang` are just syntactic sugar for `python sglang_inc.py --params ...`. It means we don't break people's scripts when we retire the old sglang and vllm engines. Serious deployments would copy the Python script, change it, and run it directly.
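For readers unfamiliar with the pattern described in this comment, here is a minimal sketch of embedding a script in the binary, writing it to a temp file, and building the `python3 <tempfile>` command. It is not the code from `subprocess.rs`: the function names, the file name `engine_inc.py`, and the inline script are assumptions made for illustration, and a string literal stands in for `include_str!` so the sketch compiles on its own.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

// In the PR the script is pulled in at compile time with
// `include_str!("vllm_inc.py")`. An inline literal stands in for it here
// so that the sketch compiles without that file being present.
const PY: &str = "print('hello from the engine script')\n";

/// Write the embedded script into `dir` and return the path of the new file.
/// The file name is a made-up example.
fn write_script(dir: &Path) -> io::Result<PathBuf> {
    let path = dir.join("engine_inc.py");
    fs::write(&path, PY)?;
    Ok(path)
}

/// Build the `python3 <tempfile> <args...>` command.
/// Spawning it is left to the caller.
fn engine_command(script: &Path, args: &[&str]) -> Command {
    let mut cmd = Command::new("python3");
    cmd.arg(script).args(args);
    cmd
}
```

Calling `write_script(&std::env::temp_dir())` and passing the result to `engine_command` gives a command that is ready to spawn. The fixed file name keeps the sketch short; real code would want a unique name per run (the `tempfile` crate is one common choice) and would remove the file afterwards. Because the script lands in a fresh location with nothing beside it, it has to be a single self-contained file, as the comment says.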

What a cleanup! A breath of fresh air. vllm and sglang are now the sub-process engines from #954. This means that unless you build with `--features python`, dynamo-run does not link `libpython`.

vllm and sglang are now the sub-process engines from #954. Also updated docs on doing vllm and sglang multi-GPU (tensor parallel) and multi-node (pipeline parallel).

grahamking requested review from nnshah1, tanmayv25, piotrm-nvidia, ptarasiewiczNV, ryanolson, paulhendricks, biswapanda, tmonty12, GuanLuo, rmccorm4, tedzhouhk, alec-flowers, kkranen, a team and oandreeva-nv as code owners May 6, 2025 14:49

pull-request-size bot added the size/L label May 6, 2025

rmccorm4 reviewed May 6, 2025

launch/dynamo-run/src/subprocess.rs (resolved)

launch/dynamo-run/src/lib.rs (outdated, resolved)

launch/dynamo-run/src/subprocess.rs (resolved)

launch/dynamo-run/src/subprocess/vllm_inc.py (outdated, resolved)

grahamking force-pushed the gk-vllm-subprocess branch from 071c093 to 9fb2f04 Compare May 6, 2025 16:04

pull-request-size bot added size/XL and removed size/L labels May 6, 2025

oandreeva-nv reviewed May 6, 2025

launch/dynamo-run/src/subprocess.rs (outdated, resolved)

oandreeva-nv reviewed May 6, 2025

launch/dynamo-run/src/subprocess/vllm_inc.py (resolved)

grahamking force-pushed the gk-vllm-subprocess branch from 9fb2f04 to 0a30b3c Compare May 6, 2025 18:14

grahamking changed the title ~~feat(dynamo-run): vllm subprocess engine~~ feat(dynamo-run): vllm and sglang subprocess engines May 6, 2025

grahamking force-pushed the gk-vllm-subprocess branch from 0a30b3c to 9d70c68 Compare May 6, 2025 18:23

grahamking force-pushed the gk-vllm-subprocess branch from 9d70c68 to 45f54df Compare May 6, 2025 19:10

rmccorm4 approved these changes May 6, 2025

grahamking merged commit 28fd481 into main May 6, 2025
11 of 12 checks passed

grahamking mentioned this pull request May 6, 2025

feat(sglang): aggregated support #937

Merged

1 task

grahamking mentioned this pull request May 6, 2025

chore: Remove embedded Python vllm and sglang engines #966

Merged
