Releases · ai-dynamo/dynamo

Dynamo Release v0.2.1

@grahamking

Dynamo is an open source project under the Apache 2.0 license. It is distributed primarily as pip wheels with minimal binary size. The ai-dynamo GitHub org hosts two repos: dynamo and NIXL. Dynamo is designed as the next-generation inference server, building on the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path for existing users to Dynamo once feature parity is achieved. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines, including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.

Dynamo v0.2.1 features:

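KV Block Manager! (see intro)
Improved vLLM performance by creating default sampling params once during initialization instead of re-initializing them
SGLang support! (see README.md)
Multi-modal E/P/D (encode/prefill/decode) disaggregation! (see README.md)
LeaderWorkerSet (LWS) support for K8s!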
Qwen3, Gemma3 and Llama4 in Dynamo Run!

Future plans

See the Dynamo Roadmap.

Known Issues

Benchmark guides are still being validated on public cloud instances (GCP / AWS)

What's Changed

🚀 Features & Improvements

feat: Qwen3, Gemma3 and Llama4 support by @grahamking in #1002
feat: Remove vllm and sglang from cargo build command by @hhzhang16 in #1003
feat: deploy planner in operator by @julienmancuso in #921
refactor: use primary lease + self-contained graceful shutdown triggered by SIGINT/SIGTERM by @tedzhouhk in #1001
feat: Add AWS EFA support by @aranadive in #999
feat(sglang): aggregated support by @ishandhanani in #937
feat: decoupling dynamo serve by @biswapanda in #905
feat: allow adding auth to etcd by @wxsms in #980

🐛 Bug Fixes

fix: Extract tokenizer from GGUF for Qwen3 and Gemma3 arch by @grahamking in #1011

Other Changes

docs: add docs for dynamo build by @mohammedabdulwahhab in #714
docs: fix typo in disagg perf tuning guide by @tedzhouhk in #859
feat: Adding completions endpoint support to dynamo run in=http by @oandreeva-nv in #777
docs: update editable install to include planner by @nv-anants in #860
chore: add docs around how runtime reconfiguration works by @ishandhanani in #861
feat: replace async queue with async iter and double decorator by @biswapanda in #858
docs: fix typo in planner documentation by @AndyDai-nv in #864
feat: Add unified x86 / aarch64 (ARM) build for VLLM image by @rmccorm4 in #839
refactor: move logging config to runtime by @ishandhanani in #863
feat: support multiple endpoints by @biswapanda in #857
build: Add Olga as a Rust reviewer by @grahamking in #872
fix: change the processor number to 5 to reduce the tokenization bottleneck by @richardhuo-nv in #865
refactor: change trtllm example kv routing use python bindings | deal with trtllm partial blocks | trtllm event change by @ziqif-nv in #866
fix: change environment variable to support local mount by @nnshah1 in #885
fix: manylinux tag in ai-dynamo-vllm wheel by @nv-anants in #884
chore: Split PushRouter from Client by @grahamking in #817
chore: add fastapi dependency in pyproject.toml by @biswapanda in #888
docs: update pythonpath for starting planner by @tedzhouhk in #890
fix(http): Make ModelDeploymentCard optional by @grahamking in #891
feat: Add request template support for default inference parameters by @abrarshivani in #841
fix: endless map in nixl.py by @wxsms in #852
feat: remove dynamoComponentRequest CRD by @julienmancuso in #856
docs: Fixes to dynamo deploy docs by @mohammedabdulwahhab in #902
feat: label component CR for planner by @julienmancuso in #901
feat: allow users to add env vars to dynamo deployment by @hhzhang16 in #862
chore: unified logging, added informative warnings for KV router example by @PeaBrane in #912
docs: add an example on how to use --service-name flag to spin up a standalone service by @ishandhanani in #915
fix: trtllm example by @biswapanda in #909
fix: add dedicated llmapi config for trtllm disagg kv routing example by @ziqif-nv in #916
chore: reduce code repetition in processor by @PeaBrane in #919
feat: Support hf:// URLs in dynamo run by @abrarshivani in #917
feat: Add check for version info in container build script by @abrarshivani in #774
docs: update examples in document by @biswapanda in #897
chore(dynamo-llm): Move the pre-processor to ingress side by @grahamking in #903
fix: default docker username and password are empty by @hhzhang16 in #926
feat: Add multimodal example with aggregated serving by @krishung5 in #709
docs: Add multi-node TRTLLM steps to README by @rmccorm4 in #930
feat: Update to support completion endpoint in TRTLLM by @tanmayv25 in #837
fix: use primary lease for NixlMetadataStore by @tedzhouhk in #928
chore: merge in support matrix and nixl commit hash by @saturley-hall in #944
feat: allow to set http port by @julienmancuso in #931
feat: automatically reserve port for assigning port number to endpoint and pubsub by @richardhuo-nv in #946
feat: multi-thread (via asyncio.task) in processor by @tedzhouhk in #904
fix: remove requirement for istio in doc by @julienmancuso in #950
feat: dynamo-run <-> python interop by @grahamking in #934
refactor: refactor dynamo deploy subfolder by @hhzhang16 in #927
ci: lock cuda at 12.8 by @hhzhang16 in #957
chore: Two-line copyright check by @grahamking in #958
chore: Add John as Codeowner by @jthomson04 in #962
feat(dynamo-run): vllm and sglang subprocess engines by @grahamking in #954
docs: add drt doc by @tedzhouhk in #951
feat: Migrate NATS Queue to Rust (#669) by @jthomson04 in #961
fix: create k8s service for main component only by @julienmancuso in #953
fix: fix missing num_remote_prefill_groups in vLLM patch by @ptarasiewiczNV in #981
fix: Create default sampling params only once during initialization by @ptarasiewiczNV in #982
chore: Remove embedded Python vllm and sglang engines by @grahamking in #966
fix: increase ulimit nofile for container/run.sh by @ajcasagrande in #969
docs: add fix for Zsh globbing error with pip install .[all] by @Chasing1020 in #945
build: Cleans the TensorRTLLM + Dynamo container build by @tanmayv25 in #968
feat: add interface for deployment manager by @biswapanda in #987
fix: Check nvext for ignore_eos and set min_tokens for benchmark consistency by @rmccorm4 in #988
fix: Fix vllm/sglang engine model name if using HF repo by @grahamking in #986
feat: Add multimodal example with disaggregated serving by @krishung5 in #811
feat: clea...

Dynamo Release v0.2.0

@tedzhouhk


Dynamo v0.2.0 features:

GB200 support with ARM builds (Note: currently requires a container build)
Planner - new experimental support for spinning workers up and down based on load
Improved K8s deployment workflow
- Installation wizard to enable easy configuration of Dynamo on your Kubernetes cluster
- CLI to manage your operator-based deployments
- Consolidate Custom Resources for Dynamo Deployments
- Documentation improvements (including Minikube guide to installing Dynamo Platform)

Future plans

See the Dynamo Roadmap.

Known Issues

Benchmark guides are still being validated on public cloud instances (GCP / AWS)
Benchmarks on internal clusters show a 15% degradation relative to the results in the summary graphs for multi-node 70B; this is being investigated.
TensorRT-LLM examples do not currently work in this release; fixes are in progress on main.

What's Changed

fix: fix max_local_prefill_length not being printed out in disagg router log by @tedzhouhk in #628
docs: Add instructions to install git lfs by @tanmayv25 in #627
fix: add DYNAMO_HOME env var to vLLM docker image by @nv-anants in #629
fix: Account for Metrics.decode() changes by @rmccorm4 in #619
fix: Update test_report by @pvijayakrish in #641
fix: serviceArgs in config was not getting set for workers by @mohammedabdulwahhab in #640
fix: adding conversion to string for notif id comparison by @nnshah1 in #638
docs: Add documentation for UCX KV cache transfer in TRTLLM by @tanmayv25 in #639
build: Define UCX env var to use NVLink when available by @tanmayv25 in #631
feat: ETCD prefix watcher + python binding + runtime reconfiguration for router and disagg router by @tedzhouhk in #581
fix: dynamo build should work with link syntax by @mohammedabdulwahhab in #646
fix: change trtllm kv_router default block_size to 32 by @ziqif-nv in #642
fix: signal handlers to clean up zombie vllm processes by @ishandhanani in #545
feat: add .devcontainer based off images in container/ by @alec-flowers in #497
fix: devcontainer mounts and vllm c api by @alec-flowers in #663
fix: deploy command should support passing config by @mohammedabdulwahhab in #626
feat(dynamo-run): improve available engines list in --help by @XueSongTap in #664
feat: add dynamoDeployment CR finalizer by @julienmancuso in #623
fix: set correct parent_hash for each kv block when publish kv events by @ziqif-nv in #671
docs: Use the same term for dynamo base image across code snippets and text by @hutm in #670
docs: move deploy docs to docs/guides by @hhzhang16 in #674
fix: frontend and http server signal handling by @alec-flowers in #677
fix: check for resource in pipeline helm chart by @julienmancuso in #687
fix: ensure VLLM_LOGGING_LEVEL=xyz follows DYN_LOG=xyz by @ishandhanani in #692
feat: replace dynamo server with dynamo cloud by @hhzhang16 in #696
feat: base Dynamo docker image improvements and fixes by @hhzhang16 in #658
fix: fix pipeline helm chart by @julienmancuso in #698
docs: Benchmarking guide updates by @kthui in #678
feat: bump vLLM version to v0.8.4 by @ptarasiewiczNV in #690
chore: Replace TRD->Dynamo in llmctl help output by @rmccorm4 in #710
fix: allow for an empty dynamo config file by @hhzhang16 in #712
fix: cli version by @ishandhanani in #716
docs: Remove outdated python-wheels directory reference by @rmccorm4 in #719
fix: direct clients vs dependencies by @ishandhanani in #704
feat: adding dynamo-tokens crate by @ryanolson in #718
fix: bump GAP to r25.03 by @tedzhouhk in #724
feat: make ingress configurable in operator by @julienmancuso in #717
feat: configure logger with detail info by @tlipoca9 in #654
feat: Add disagg skeleton example by @kylehh in #683
fix: dynamo deploy helm chart cleanup by @mohammedabdulwahhab in #727
docs: add dedicated minikube guide by @mohammedabdulwahhab in #735
feat(dynamo-engine-vllm): vllm 0.8.X support by @grahamking in #728
feat: gracefully shutdown endpoint by revoking etcd lease + python binding by @tedzhouhk in #730
fix: Add missing deps for '--framework none' build by @rmccorm4 in #738
chore: Remove TRT-LLM C++ engine in favor of Python one by @grahamking in #747
docs: Support matrix post release. by @pvijayakrish in #736
docs: add aggregated deployment guide for multi-node sized model by @GuanLuo in #713
feat: make the model name to be the same as the HF repo name for dynamo-run by @AndyDai-nv in #749
feat: add additional packages to log filters by @abrarshivani in #752
chore(dynamo-run): Fix echo_core for EOS tokens by @grahamking in #759
feat: add custom lease to worker components by @ishandhanani in #748
chore: Add roadmap to main README.md by @harryskim in #763
feat: MLA disaggregation support to vLLM patch by @ptarasiewiczNV in #745
fix: Fix cancellation flow in python component graph by @pankajroark in #765
fix: give the user ownership permissions of /opt/dynamo/venv by @hhzhang16 in #767
docs: deployment docs improvements by @hhzhang16 in #753
feat: add option to configure separate docker registry for pipelines docker images by @julienmancuso in #744
chore: Update bug report to use dynamo env for collecting environment information by @nv-tusharma in #558
docs: R1 disaggregation guide by @GuanLuo in #720
feat: allow to CRUD dynamo pipelines by @julienmancuso in #761
docs: Custom Backend/Worker Guide by @rmccorm4 in #608
chore: fix arg name in example by @CormickKneey in #770
build: add rust binaries in manylinux image by @nv-anants in #783
feat: remove bento/yatai references by @julienmancuso in #782
docs: add note to use release branch examples by @nv-anants in #793
feat: Add log verbosity level flag to dynamo-run cli by @abrarshivani in #780
feat: rename operator CRDs by @julienmancuso in #795
feat: Add linux aarch64 support to dynamo-run build by @rmccorm4 in #802
fix: Update TRTLLM version and fix disagg workflow by @tanmayv25 in #804
chore: Increase sleep tim...

Dynamo Release v0.1.1

@kthui


Dynamo v0.1.1 features:

Benchmarking guides for Single and Multi-Node Disaggregation on H100 (vLLM)
TensorRT-LLM support for KV Aware Routing
TensorRT-LLM support for Disaggregation
ManyLinux and Ubuntu 22.04 Support for wheels and crates
Unified logging for Python and Rust

Future plans

Instructions for reproducing benchmark guides on GCP and AWS
KV Cache Manager as a standalone repository under the ai-dynamo organization. This release will provide functionality for storing and evicting KV cache across multiple memory tiers, including GPU, system memory, local SSD, and object storage.
Searchable user guides and documentation
Multi-node instances for large models
Initial Planner version supporting dynamic scaling of P / D workers. We will include an early version of the Dynamo Planner, another core component. This initial release will feature heuristic-based dynamic allocation of GPU workers between prefill and decode tasks, as well as model and fleet configuration adjustments based on user traffic patterns. Our vision is to evolve the Planner into a reinforcement learning platform, which will allow users to define objectives and then tune and optimize performance policies automatically based on system feedback.
vLLM 1.0 support with NIXL and KV Cache Events

Known Issues

Benchmark guides are still being validated on public cloud instances (GCP / AWS)
Benchmarks on internal clusters show a 15% degradation relative to the results in the summary graphs for multi-node 70B; this is being investigated.

What's Changed

docs: Benchmarking guide updates (#678) by @kthui in #699
docs: Update support matrix by @pvijayakrish in #691
fix: change trtllm kv_router default block_size to 32 (#642) by @tanmayv25 in #694
fix: set correct parent_hash for each kv block when publish kv events by @tanmayv25 in #693
fix: Remove kv connector from agg config by @ptarasiewiczNV in #655
fix: Account for Metrics.decode() changes (#619) by @rmccorm4 in #619
fix: update to match latest nixl notifications as bytes by @nnshah1 in #645
docs: Update support matrix by @pvijayakrish in #633
docs: Add instructions to install git lfs (#627) by @tanmayv25 in #627
fix: add DYNAMO_HOME env var to vLLM docker image (#629) by @nv-anants in #629
feat: TRT-LLM disaggregated serving using UCX (#562) by @tanmayv25 in #562
docs: Update support matrix by @pvijayakrish in #604
docs: Guide for multi-node benchmarking (#561) by @kthui in #561
fix: remove api-store from container by @mohammedabdulwahhab in #617
docs: Guides for single node benchmarking (#509) by @kthui in #509
fix: set worker env before worker process spawn by @ishandhanani in #614
docs: Move trtllm dynamo run doc from example to dynamo run guide (#578) by @tanmayv25 in #578
chore: update ai-dynamo-vllm wheel version (#598) by @nv-anants in #598
fix: bump bento to 1.4.8 (#579) by @mohammedabdulwahhab in #579
fix: update yum install in wheel-builder image (#605) by @nv-anants in #605
docs: update dynamo serve trtllm agg example yaml files (#600) by @ziqif-nv in #600
chore: use latest nixl for docker builds by @nv-anants in #596
chore: update versions to 0.1.1 by @nv-anants in #552
docs: Updated dynamo run instructions by @cdgamarose-nv in #555
feat: Add manylinux support for Dynamo by @pvijayakrish in #536
docs: Clarify the --max-local-prefill-length help description by @kthui in #554
feat: Add dynamo env CLI option to provide information about user environment by @nv-tusharma in #533
docs: add disagg tuning guide by @tedzhouhk in #413
fix: let dynamo run pass --help to dynamo-run by @ziqif-nv in #547
chore: Update TRTLLM version. Fix router. by @tanmayv25 in #527
fix: unify and enable dynamo logging by @ishandhanani in #520
feat(dynamo-run): Basic routing choice by @grahamking in #524
fix: clean unused bento pieces from serve.py and serving.py by @ishandhanani in #532
docs: update close-deployment in dynamo_serve.md by @tlipoca9 in #535
feat: update operator README by @julienmancuso in #544
fix: mypy error by @ishandhanani in #543
feat: cleanup operator code by @julienmancuso in #529
chore: Fixed file headers. Added attributions. by @dmitry-tokarev-nv in #530
fix: Remove api-server code by @mohammedabdulwahhab in #526
docs: hello world and vllm process docs by @ishandhanani in #525
feat: KV recorder for dumping router events into a jsonl by @PeaBrane in #505
chore: cleaner required workers check (don't spam print) by @PeaBrane in #521
docs: dynamo-run clarify engine list by @grahamking in #522
chore: Upgrade Rust to 1.86 by @grahamking in #518
chore: Add devops in more CODEOWNERS by @grahamking in #512
feat: Python decorator dynamo_worker takes optional static parameter without etcd by @grahamking in #494
fix: broken link to dynamo run by @lkm2835 in #517
docs: add 405b disaggregated serving documentation by @ishandhanani in #496
refactor: migrate engines to standalone crates by @ryanolson in #453
feat: Add TensorRT-LLM example for dynamo serve/run by @tanmayv25 in #456
docs: Remove invalid link by @grahamking in #506
docs: add instruction to copy dynamo-run in container setup by @hanweisen in #508
chore: Add libclang-dev to CI for llamacpp by @grahamking in #507
chore: rename duration to timeout by @tlipoca9 in #503
fix: adding missing file by @ryanolson in #501
feat: allow replicas to be set in DynamoDeployment CR by @julienmancuso in #486
chore: Disable blank issue creation for default issues template by @nv-tusharma in #492
chore: Remove <> from title + add labels for default issues template. by @nv-tusharma in #491
feat: Sets the code of conduct for the repository by @saturley-hall in #454
fix: Consolidate dynamo start and dynamo serve commands by @mohammedabdulwahhab in #405
feat: improve serve commands and expose DYNAMO_HOME env var by @jon-chuang in #436
feat: kv aware router executable by @ryanolson in #399
feat: deploy and use buildkit to build dynamo images by @julienmancuso in #450
feat(serve): Enhance multi-node deployment and worker configuration by @ishandhanani in #457
chore: Add default issue template for bug & feature requests by @nv-tusharma in #471
feat: unified logging by @ryanolson in #472
feat: add devcontainer to dynamo for Ubuntu 24.04 use by @h...

Dynamo Release v0.1.0

Dynamo v0.1.0 will be released following Jensen Huang’s GTC keynote and will be hosted on github.com/ai-dynamo. It is an open source project under the Apache 2.0 license, and public continuous integration will be available from the start to enable industry-wide collaboration. The primary distribution will be through pip wheels with minimal binary size. The ai-dynamo GitHub org will host two repos: dynamo and NIXL.

Initial Dynamo release features:

Disaggregated serving across a configurable number of prefill (X) and decode (Y) nodes
KV aware routing
KV cache manager to offload KV cache to system memory
NIXL support for RDMA (InfiniBand, Ethernet) and TCP
Support for K8s deployment

As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines at launch, including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support. Dynamo supports the vLLM engine with all the capabilities mentioned above and plans to reach feature parity with the other inference engines as soon as possible.

Future plans
The next release of Dynamo plans to open-source the KV cache manager as a standalone repository under the ai-dynamo organization. This release will provide functionality for storing and evicting KV cache across multiple memory tiers, including GPU, system memory, local SSD, and object storage.
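The idea of storing and evicting KV cache across memory tiers can be pictured with a minimal sketch. It assumes a per-tier LRU policy in which a block that overflows one tier is demoted to the next, and a cache hit promotes the block back to GPU memory. The class name `TieredKVCache`, the tier capacities, and the demote/promote policy are hypothetical illustrations, not the KV Cache Manager's actual API or algorithm.

```python
from collections import OrderedDict

# Tier order taken from the release notes; capacities and the movement
# policy between tiers below are hypothetical assumptions.
TIERS = ["gpu", "system_memory", "local_ssd", "object_storage"]


class TieredKVCache:
    """Hypothetical sketch: per-tier LRU with demotion to the next tier on overflow."""

    def __init__(self, capacities):
        # capacities[i] is the number of KV blocks tier i can hold (assumed).
        self.capacities = capacities
        self.tiers = [OrderedDict() for _ in capacities]

    def put(self, block_hash, block, tier=0):
        """Store a block in `tier`; if the tier overflows, demote its LRU block."""
        self.tiers[tier][block_hash] = block
        self.tiers[tier].move_to_end(block_hash)  # mark as most recently used
        if len(self.tiers[tier]) > self.capacities[tier]:
            victim_hash, victim = self.tiers[tier].popitem(last=False)
            if tier + 1 < len(self.tiers):
                self.put(victim_hash, victim, tier + 1)
            # Otherwise the block falls off the last tier and is dropped.

    def get(self, block_hash):
        """Return a block and promote it to tier 0, or None on a miss."""
        for tier in self.tiers:
            if block_hash in tier:
                block = tier.pop(block_hash)
                self.put(block_hash, block, 0)
                return block
        return None

    def where(self, block_hash):
        """Name of the tier currently holding the block, or None."""
        for name, tier in zip(TIERS, self.tiers):
            if block_hash in tier:
                return name
        return None


cache = TieredKVCache([2, 2, 2, 2])
for i in range(5):
    cache.put(f"b{i}", f"kv{i}")
```

With two blocks per tier, writing five blocks pushes `b0` down to local SSD. Reading it back with `cache.get("b0")` promotes it to GPU memory and shifts older blocks down one tier. A real tiered cache would also need to account for transfer cost between tiers, which this sketch ignores.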

In that release, we will include an early version of the Dynamo Planner, another core component. This initial release will feature heuristic-based dynamic allocation of GPU workers between prefill and decode tasks, as well as model and fleet configuration adjustments based on user traffic patterns. Our vision is to evolve the Planner into a reinforcement learning platform, which will allow users to define objectives and then tune and optimize performance policies automatically based on system feedback.
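As an illustration of heuristic-based allocation between prefill and decode, the sketch below splits a fixed GPU budget in proportion to estimated demand for each phase. The load signals, the per-worker capacity figures, and the names `LoadSnapshot` and `split_workers` are hypothetical assumptions. The release notes do not say which metrics or rules the Planner uses.

```python
from dataclasses import dataclass


@dataclass
class LoadSnapshot:
    # Hypothetical load signals; the Planner's real inputs are not specified.
    prefill_tokens_per_s: float  # prompt tokens arriving per second
    decode_tokens_per_s: float   # output tokens generated per second


def split_workers(total_gpus, load, prefill_capacity, decode_capacity):
    """Hypothetical heuristic: split GPUs between prefill and decode by demand.

    prefill_capacity / decode_capacity are the tokens/s one worker can sustain
    for that phase (assumed to be measured offline). Each phase keeps at least
    one worker.
    """
    if total_gpus < 2:
        raise ValueError("need at least one prefill and one decode worker")
    prefill_need = load.prefill_tokens_per_s / prefill_capacity
    decode_need = load.decode_tokens_per_s / decode_capacity
    total_need = prefill_need + decode_need
    if total_need == 0:
        prefill = total_gpus // 2  # no load signal: split evenly
    else:
        prefill = round(total_gpus * prefill_need / total_need)
    prefill = min(max(prefill, 1), total_gpus - 1)  # keep both phases alive
    return prefill, total_gpus - prefill


print(split_workers(8, LoadSnapshot(20000, 3000), 10000, 500))
```

Here prefill needs about 2 workers and decode about 6, so the 8 GPUs split as `(2, 6)`. A proportional rule like this is easy to reason about but reacts only to current load; the reinforcement learning direction described above would instead tune the policy from system feedback.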
Dynamo is designed as the next-generation inference server, building on the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path for existing users to Dynamo once feature parity is achieved.
