Dynamo is an open-source project released under the Apache 2.0 license. It is distributed primarily as pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: dynamo and NIXL. Dynamo is designed as a next-generation inference server that builds on the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will continue to support Triton and, once feature parity is achieved, provide a seamless migration path to Dynamo for existing users. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines, including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.
Dynamo v0.2.1 features:
- KV Block Manager! (see intro)
- Improved vLLM performance by avoiding re-initialization of sampling params
- SGLang support! (see README.md)
- Multi-Modal E/P/D Disaggregation! (see README.md)
- Leader Worker Set K8s!
- Qwen3, Gemma3 and Llama4 in Dynamo Run!
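The vLLM performance item (together with the fix in #982, "Create default sampling params only once during initialization") comes down to building a default object once rather than on every request. A minimal sketch of that pattern in plain Python; `SamplingParams` and `RequestHandler` here are hypothetical stand-ins, not vLLM's or Dynamo's actual classes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SamplingParams:
    # Hypothetical stand-in for an engine's sampling-parameter object.
    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: int = 16
    stop: tuple = ()


class RequestHandler:
    """Hypothetical handler that builds its default sampling params once."""

    def __init__(self, defaults: dict):
        # Built a single time at initialization instead of per request.
        self.default_sampling_params = SamplingParams(**defaults)

    def params_for(self, overrides: dict) -> SamplingParams:
        # replace() returns a new frozen instance with only the per-request
        # fields changed; the shared defaults are never mutated.
        return replace(self.default_sampling_params, **overrides)
```

Making the dataclass frozen (and using a tuple for `stop`) keeps the shared defaults safe from accidental per-request mutation, so each request only pays for a cheap copy.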
Future plans
Known Issues
- Benchmark guides are still being validated on public cloud instances (GCP / AWS)
What's Changed
🚀 Features & Improvements
- feat: Qwen3, Gemma3 and Llama4 support by @grahamking in #1002
- feat: Remove vllm and sglang from cargo build command by @hhzhang16 in #1003
- feat: deploy planner in operator by @julienmancuso in #921
- refactor: use primary lease + self-contained graceful shutdown triggered by SIGINT/SIGTERM by @tedzhouhk in #1001
- feat: Add AWS EFA support by @aranadive in #999
- feat(sglang): aggregated support by @ishandhanani in #937
- feat: decoupling dynamo serve by @biswapanda in #905
- feat: allow adding auth to etcd by @wxsms in #980
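#1001 makes graceful shutdown self-contained and triggered by SIGINT/SIGTERM. As a rough illustration of that general pattern, not Dynamo's implementation, the asyncio sketch below routes both signals to a shared shutdown event that a hypothetical worker awaits before running its cleanup. It assumes a Unix event loop, since `loop.add_signal_handler` is not available on Windows.

```python
import asyncio
import signal


async def serve(shutdown: asyncio.Event) -> str:
    # Hypothetical worker: real request handling would run concurrently here.
    await shutdown.wait()
    # Cleanup (e.g. revoking a lease, draining in-flight work) would go here.
    return "stopped cleanly"


async def main() -> str:
    shutdown = asyncio.Event()
    loop = asyncio.get_running_loop()
    # Route both SIGINT (Ctrl+C) and SIGTERM (e.g. from an orchestrator)
    # to the same event instead of letting the process exit abruptly.
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, shutdown.set)
    return await serve(shutdown)
```

Run it with `asyncio.run(main())`, then press Ctrl+C or send `kill -TERM <pid>`; either signal takes the same cleanup path.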
🐛 Bug Fixes
- fix: Extract tokenizer from GGUF for Qwen3 and Gemma3 arch by @grahamking in #1011
Other Changes
- docs: add docs for dynamo build by @mohammedabdulwahhab in #714
- docs: fix typo in disagg perf tuning guide by @tedzhouhk in #859
- feat: Adding completions endpoint support to `dynamo run in=http` by @oandreeva-nv in #777
- docs: update editable install to include planner by @nv-anants in #860
- chore: add docs around how runtime reconfiguration works by @ishandhanani in #861
- feat: replace async queue with async iter and double decorator by @biswapanda in #858
- docs: fix typo in planner documentation by @AndyDai-nv in #864
- feat: Add unified x86 / aarch64 (ARM) build for VLLM image by @rmccorm4 in #839
- refactor: move logging config to runtime by @ishandhanani in #863
- feat: support multiple endpoints by @biswapanda in #857
- build: Add Olga as a Rust reviewer by @grahamking in #872
- fix: change the processor number to 5 to reduce the tokenization bottleneck by @richardhuo-nv in #865
- refactor: change trtllm example kv routing use python bindings | deal with trtllm partial blocks | trtllm event change by @ziqif-nv in #866
- fix: change environment variable to support local mount by @nnshah1 in #885
- fix: manylinux tag in ai-dynamo-vllm wheel by @nv-anants in #884
- chore: Split PushRouter from Client by @grahamking in #817
- chore: add fastapi dependency in pyproject.toml by @biswapanda in #888
- docs: update pythonpath for starting planner by @tedzhouhk in #890
- fix(http): Make ModelDeploymentCard optional by @grahamking in #891
- feat: Add request template support for default inference parameters by @abrarshivani in #841
- fix: endless map in nixl.py by @wxsms in #852
- feat: remove dynamoComponentRequest CRD by @julienmancuso in #856
- docs: Fixes to dynamo deploy docs by @mohammedabdulwahhab in #902
- feat: label component CR for planner by @julienmancuso in #901
- feat: allow users to add env vars to dynamo deployment by @hhzhang16 in #862
- chore: unified logging, added informative warnings for KV router example by @PeaBrane in #912
- docs: add an example on how to use `--service-name` flag to spin up a standalone service by @ishandhanani in #915
- fix: trtllm example by @biswapanda in #909
- fix: add dedicated llmapi config for trtllm disagg kv routing example by @ziqif-nv in #916
- chore: reduce code repetition in processor by @PeaBrane in #919
- feat: Support hf:// URLs in dynamo run by @abrarshivani in #917
- feat: Add check for version info in container build script by @abrarshivani in #774
- docs: update examples in document by @biswapanda in #897
- chore(dynamo-llm): Move the pre-processor to ingress side by @grahamking in #903
- fix: default docker username and password are empty by @hhzhang16 in #926
- feat: Add multimodal example with aggregated serving by @krishung5 in #709
- docs: Add multi-node TRTLLM steps to README by @rmccorm4 in #930
- feat: Update to support completion endpoint in TRTLLM by @tanmayv25 in #837
- fix: use primary lease for NixlMetadataStore by @tedzhouhk in #928
- chore: merge in support matrix and nixl commit hash by @saturley-hall in #944
- feat: allow to set http port by @julienmancuso in #931
- feat: automatically reserve port for assigning port number to endpoint and pubsub by @richardhuo-nv in #946
- feat: multi-thread (via asyncio.task) in processor by @tedzhouhk in #904
- fix: remove requirement for istio in doc by @julienmancuso in #950
- feat: dynamo-run <-> python interop by @grahamking in #934
- refactor: refactor dynamo deploy subfolder by @hhzhang16 in #927
- ci: lock cuda at 12.8 by @hhzhang16 in #957
- chore: Two-line copyright check by @grahamking in #958
- chore: Add John as Codeowner by @jthomson04 in #962
- feat(dynamo-run): vllm and sglang subprocess engines by @grahamking in #954
- docs: add drt doc by @tedzhouhk in #951
- feat: Migrate NATS Queue to Rust (#669) by @jthomson04 in #961
- fix: create k8s service for main component only by @julienmancuso in #953
- fix: fix missing num_remote_prefill_groups in vLLM patch by @ptarasiewiczNV in #981
- fix: Create default sampling params only once during initialization by @ptarasiewiczNV in #982
- chore: Remove embedded Python vllm and sglang engines by @grahamking in #966
- fix: increase ulimit nofile for container/run.sh by @ajcasagrande in #969
- docs: add fix for Zsh globbing error with `pip install .[all]` by @Chasing1020 in #945
- build: Cleans the TensorRTLLM + Dynamo container build by @tanmayv25 in #968
- feat: add interface for deployment manager by @biswapanda in #987
- fix: Check nvext for ignore_eos and set min_tokens for benchmark consistency by @rmccorm4 in #988
- fix: Fix vllm/sglang engine model name if using HF repo by @grahamking in #986
- feat: Add multimodal example with disaggregated serving by @krishung5 in #811
- feat: cleanup EtcdKvCache and PrefillQueue before and after launch by @tedzhouhk in #925
- feat: add ingress to graph deployments by @hhzhang16 in #960
- ci: add PR labels and config for github release notes by @nv-anants in #955
- fix: should route based on waiting requests, not active by @PeaBrane in #989
- fix: typo in devcontainer ulimit nofile by @ajcasagrande in #994
- docs: Add slurm env var workaround for MPI spawn errors by @rmccorm4 in #992
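#989 changes routing to balance on waiting (queued) requests rather than active ones. A minimal sketch of that selection rule, assuming a hypothetical `WorkerStats` snapshot per worker; Dynamo's actual router may weigh additional signals (for example KV cache reuse), which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class WorkerStats:
    # Hypothetical per-worker load snapshot.
    worker_id: str
    active_requests: int
    waiting_requests: int


def pick_worker(workers: list[WorkerStats]) -> str:
    """Choose the worker with the fewest waiting (queued) requests.

    Ties are broken by fewer active requests, then by worker_id so the
    choice is deterministic.
    """
    if not workers:
        raise ValueError("no workers available")
    best = min(
        workers,
        key=lambda w: (w.waiting_requests, w.active_requests, w.worker_id),
    )
    return best.worker_id
```

The rationale for keying on waiting requests is that a worker with many active but no queued requests can still accept new work immediately, while a queued request must wait regardless of how busy the worker looks. The tie-breakers here are an illustrative choice.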
New Contributors
- @richardhuo-nv made their first contribution in #865
- @wxsms made their first contribution in #852
- @krishung5 made their first contribution in #709
- @jthomson04 made their first contribution in #962
- @ajcasagrande made their first contribution in #969
- @Chasing1020 made their first contribution in #945
- @aranadive made their first contribution in #999
Full Changelog: v0.2.0...v0.2.1