Releases: ai-dynamo/dynamo
Dynamo Release v0.2.1
Dynamo is an open source project with Apache 2 license. The primary distribution is done via pip wheels with minimal binary size. The ai-dynamo github org hosts 2 repos: dynamo and NIXL. Dynamo is designed as the ideal next generation inference server, building upon the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path for existing users to Dynamo once feature parity is achieved. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.
Dynamo v0.2.1 features:
- KV Block Manager! intro
- Improved vLLM Performance by avoiding re-initializing sampling params
- SGLang support! README.md
- Multi-Modal E/P/D Disaggregation! README.md
- Leader Worker Set K8s!
- Qwen3, Gemma3 and Llama4 in Dynamo Run!
Future plans
Known Issues
- Benchmark guides are still being validated on public cloud instances (GCP / AWS)
What's Changed
🚀 Features & Improvements
- feat: Qwen3, Gemma3 and Llama4 support by @grahamking in #1002
- feat: Remove vllm and sglang from cargo build command by @hhzhang16 in #1003
- feat: deploy planner in operator by @julienmancuso in #921
- refactor: use primary lease + self-contained graceful shutdown trigged by SIGINT/SIGTERM by @tedzhouhk in #1001
- feat: Add AWS EFA support by @aranadive in #999
- feat(sglang): aggregated support by @ishandhanani in #937
- feat: decoupling dynamo serve by @biswapanda in #905
- feat: allow adding auth to etcd by @wxsms in #980
🐛 Bug Fixes
- fix: Extract tokenizer from GGUF for Qwen3 and Gemma3 arch by @grahamking in #1011
Other Changes
- docs: add docs for dynamo build by @mohammedabdulwahhab in #714
- docs: fix typo in disagg perf tuning guide by @tedzhouhk in #859
- feat: Adding completions endpoint support to
dynamo run in=http
by @oandreeva-nv in #777 - docs: update editable install to include planner by @nv-anants in #860
- chore: add docs around how runtime reconfiguration works by @ishandhanani in #861
- feat: replace async queue with async iter and double decorator by @biswapanda in #858
- docs: fix typo in planner documentation by @AndyDai-nv in #864
- feat: Add unified x86 / aarch64 (ARM) build for VLLM image by @rmccorm4 in #839
- refactor: move logging config to runtime by @ishandhanani in #863
- feat: support multiple endpoints by @biswapanda in #857
- build: Add Olga as a Rust reviewer by @grahamking in #872
- fix: change the processor number to 5 to reduce the tokenization bottleneck by @richardhuo-nv in #865
- refactor: change trtllm example kv routing use python bindings | deal with trtllm partial blocks | trtllm event change by @ziqif-nv in #866
- fix: change environment variable to support local mount by @nnshah1 in #885
- fix: manylinux tag in ai-dynamo-vllm wheel by @nv-anants in #884
- chore: Split PushRouter from Client by @grahamking in #817
- chore: add fastapi depenedncy in pyproject.toml by @biswapanda in #888
- docs: update pythonpath for starting planner by @tedzhouhk in #890
- fix(http): Make ModelDeploymentCard optional by @grahamking in #891
- feat: Add request template support for default inference parameters by @abrarshivani in #841
- fix: endless map in nixl.py by @wxsms in #852
- feat: remove dynamoComponentRequest CRD by @julienmancuso in #856
- docs: Fixes to dynamo deploy docs by @mohammedabdulwahhab in #902
- feat: label component CR for planner by @julienmancuso in #901
- feat: allow users to add env vars to dynamo deployment by @hhzhang16 in #862
- chore: unified logging, added informative warnings for KV router example by @PeaBrane in #912
- docs: add an example on how to use
--service-name
flag to spin up a standalone service by @ishandhanani in #915 - fix: trtllm example by @biswapanda in #909
- fix: add dedicated llmapi config for trtllm disagg kv routing example by @ziqif-nv in #916
- chore: reduce code repetition in processor by @PeaBrane in #919
- feat: Support hf:// URLs in dynamo run by @abrarshivani in #917
- feat: Add check for version info in container build script by @abrarshivani in #774
- docs: update examples in document by @biswapanda in #897
- chore(dynamo-llm): Move the pre-processor to ingress side by @grahamking in #903
- fix: default docker username and password are empty by @hhzhang16 in #926
- feat: Add multimodal example with aggregated serving by @krishung5 in #709
- docs: Add multi-node TRTLLM steps to README by @rmccorm4 in #930
- feat: Update to support completion endpoint in TRTLLM by @tanmayv25 in #837
- fix: use primary lease for NixlMetadataStore by @tedzhouhk in #928
- chore: merge in support matrix and nixl commit hash by @saturley-hall in #944
- feat: allow to set http port by @julienmancuso in #931
- feat: automatically reserve port for assigning port number to endpoint and pubsub by @richardhuo-nv in #946
- feat: multi-thread (via asyncio.task) in processor by @tedzhouhk in #904
- fix: remove requirement for istio in doc by @julienmancuso in #950
- feat: dynamo-run <-> python interop by @grahamking in #934
- refactor: refactor dynamo deploy subfolder by @hhzhang16 in #927
- ci: lock cuda at 12.8 by @hhzhang16 in #957
- chore: Two-line copyright check by @grahamking in #958
- chore: Add John as Codeowner by @jthomson04 in #962
- feat(dynamo-run): vllm and sglang subprocess engines by @grahamking in #954
- docs: add drt doc by @tedzhouhk in #951
- feat: Migrate NATS Queue to Rust (#669) by @jthomson04 in #961
- fix: create k8s service for main component only by @julienmancuso in #953
- fix: fix missing num_remote_prefill_groups in vLLM patch by @ptarasiewiczNV in #981
- fix: Create default sampling params only once during initialization by @ptarasiewiczNV in #982
- chore: Remove embedded Python vllm and sglang engines by @grahamking in #966
- fix: increase ulimit nofile for container/run.sh by @ajcasagrande in #969
- docs: add fix for Zsh globbing error with
pip install .[all]
by @Chasing1020 in #945 - build: Cleans the TensorRTLLM + Dynamo container build by @tanmayv25 in #968
- feat: add interface for deployment manager by @biswapanda in #987
- fix: Check nvext for ignore_eos and set min_tokens for benchmark consistency by @rmccorm4 in #988
- fix: Fix vllm/sglang engine model name if using HF repo by @grahamking in #986
- feat: Add multimodal example with disaggregated serving by @krishung5 in #811
- feat: clea...
Dynamo Release v0.2.0
Dynamo is an open source project with Apache 2 license. The primary distribution is done via pip wheels with minimal binary size. The ai-dynamo github org hosts 2 repos: dynamo and NIXL. Dynamo is designed as the ideal next generation inference server, building upon the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path for existing users to Dynamo once feature parity is achieved. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.
Dynamo v0.2.0 features:
- GB200 support with ARM builds (Note: currently requires a container build)
- Planner - new experimental support for spinning workers up and down based on load
- Improved K8s deployment workflow
- Installation wizard to enable easy configuration of Dynamo on your Kubernetes cluster
- CLI to manage your operator-based deployments
- Consolidate Custom Resources for Dynamo Deployments
- Documentation improvements (including Minikube guide to installing Dynamo Platform)
Future plans
Known Issues
- Benchmark guides are still being validated on public cloud instances (GCP / AWS)
- Benchmarks on internal clusters show a 15% degradation from results displayed in summary graphs for multi-node 70B and are being investigated.
- TensorRT-LLM examples are not working currently in this release - but are being fixed in main.
What's Changed
- fix: fix max_local_prefill_length not being printed out in disagg router log by @tedzhouhk in #628
- docs: Add instructions to install git lfs by @tanmayv25 in #627
- fix: add DYNAMO_HOME env var to vLLM docker image by @nv-anants in #629
- fix: Account for Metrics.decode() changes by @rmccorm4 in #619
- fix: Update test_report by @pvijayakrish in #641
- fix: serviceArgs in config was not getting set for workers by @mohammedabdulwahhab in #640
- fix: adding conversion to string for notif id comparison by @nnshah1 in #638
- docs: Add documentation for UCX KV cache transfer in TRTLLM by @tanmayv25 in #639
- build: Define UCX env var to use NVLink when available by @tanmayv25 in #631
- feat: ETCD prefix watcher + python binding + runtime reconfiguration for router and disagg router by @tedzhouhk in #581
- fix: dynamo build should work with link syntax by @mohammedabdulwahhab in #646
- fix: change trtllm kv_router default block_size to 32 by @ziqif-nv in #642
- fix: signal handlers to clean up zombie vllm processes by @ishandhanani in #545
- feat: add .devcontainer based off images in container/ by @alec-flowers in #497
- fix: devcontainer mounts and vllm c api by @alec-flowers in #663
- fix: deploy command should support passing config by @mohammedabdulwahhab in #626
- feat(dynamo-run): improve available engines list in --help by @XueSongTap in #664
- feat: add dynamoDeployment CR finalizer by @julienmancuso in #623
- fix: set correct parent_hash for each kv block when publish kv events by @ziqif-nv in #671
- docs: Use the same term for dynamo base image across code snippets and text by @hutm in #670
- docs: move deploy docs to docs/guides by @hhzhang16 in #674
- fix: frontend and http server signal handling by @alec-flowers in #677
- fix: check for resource in pipeline helm chart by @julienmancuso in #687
- fix: ensure
VLLM_LOGGING_LEVEL=xyz
followsDYN_LOG=xyz
by @ishandhanani in #692 - feat: replace dynamo server with dynamo cloud by @hhzhang16 in #696
- feat: base Dynamo docker image improvements and fixes by @hhzhang16 in #658
- fix: fix pipeline helm chart by @julienmancuso in #698
- docs: Benchmarking guide updates by @kthui in #678
- feat: bump vLLM version to v0.8.4 by @ptarasiewiczNV in #690
- chore: Replace TRD->Dynamo in llmctl help output by @rmccorm4 in #710
- fix: allow for an empty dynamo config file by @hhzhang16 in #712
- fix: cli version by @ishandhanani in #716
- docs: Remove outdated python-wheels directory reference by @rmccorm4 in #719
- fix: direct clients vs dependancies by @ishandhanani in #704
- feat: adding dynamo-tokens crate by @ryanolson in #718
- fix: bump GAP to r25.03 by @tedzhouhk in #724
- feat: make ingress configurable in operator by @julienmancuso in #717
- feat: configure logger with detail info by @tlipoca9 in #654
- feat: Add disagg skeleton example by @kylehh in #683
- fix: dynamo deploy helm chart cleanup by @mohammedabdulwahhab in #727
- docs: add dedicated minikube guide by @mohammedabdulwahhab in #735
- feat(dynamo-engine-vllm): vllm 0.8.X support by @grahamking in #728
- feat: gracefully shutdown endpoint by revoking etcd lease + python binding by @tedzhouhk in #730
- fix: Add missing deps for '--framework none' build by @rmccorm4 in #738
- chore: Remove TRT-LLM C++ engine in favor of Python one by @grahamking in #747
- docs: Support matrix post release. by @pvijayakrish in #736
- docs: add aggregated deployment guide for multi-node sized model by @GuanLuo in #713
- feat: make the model name to be the same as the HF repo name for dynamo-run by @AndyDai-nv in #749
- feat: add additional packages to log filters by @abrarshivani in #752
- chore(dynamo-run): Fix echo_core for EOS tokens by @grahamking in #759
- feat: add custom lease to worker components by @ishandhanani in #748
- chore: Add roadmap to main README.md by @harryskim in #763
- feat: MLA disaggregation support to vLLM patch by @ptarasiewiczNV in #745
- fix: Fix cancellation flow in python component graph by @pankajroark in #765
- fix: give the user ownership permissions of /opt/dynamo/venv by @hhzhang16 in #767
- docs: deployment docs improvements by @hhzhang16 in #753
- feat: add option to configure separate docker registry for pipelines docker images by @julienmancuso in #744
- chore: Update bug report to use dynamo env for collecting environment information by @nv-tusharma in #558
- docs: R1 disaggregation guide by @GuanLuo in #720
- feat: allow to CRUD dynamo pipelines by @julienmancuso in #761
- docs: Custom Backend/Worker Guide by @rmccorm4 in #608
- chore: fix arg name in example by @CormickKneey in #770
- build: add rust binaries in manylinux image by @nv-anants in #783
- feat: remove bento/yatai references by @julienmancuso in #782
- docs: add note to use release branch examples by @nv-anants in #793
- feat: Add log verbosity level flag to dynamo-run cli by @abrarshivani in #780
- feat: rename operator CRDs by @julienmancuso in #795
- feat: Add linux aarch64 support to dynamo-run build by @rmccorm4 in #802
- fix: Update TRTLLM version and fix disagg workflow by @tanmayv25 in #804
- chore: Increase sleep tim...
Dynamo Release v0.1.1
Dynamo is an open source project with Apache 2 license. The primary distribution is done via pip wheels with minimal binary size. The ai-dynamo github org hosts 2 repos: dynamo and NIXL. Dynamo is designed as the ideal next generation inference server, building upon the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path for existing users to Dynamo once feature parity is achieved. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.
Dynamo v0.1.1 features:
- Benchmarking guides for Single and Multi-Node Disaggregation on H100 (vLLM)
- TensorRT-LLM support for KV Aware Routing
- TensorRT-LLM support for Disaggregation
- ManyLinux and Ubuntu 22.04 Support for wheels and crates
- Unified logging for Python and Rust
Future plans
- Instructions for reproducing benchmark guides on GCP and AWS
- KV Cache Manager as a standalone repository under the ai-dynamo organization. This release will provide functionality for storing and evicting KV cache across multiple memory tiers, including GPU, system memory, local SSD, and object storage.
- Searchable user guides and documentation
- Multi-node instances for large models
- Initial Planner version supporting dynamic scaling of P / D workers. We will include an early version of the Dynamo Planner, another core component. This initial release will feature heuristic-based dynamic allocation of GPU workers between prefill and decode tasks, as well as model and fleet configuration adjustments based on user traffic patterns. Our vision is to evolve the Planner into a reinforcement learning platform, which will allow users to define objectives and then tune and optimize performance policies automatically based on system feedback.
- vLLM 1.0 support with NIXL and KV Cache Events
Known Issues
- Benchmark guides are still being validated on public cloud instances (GCP / AWS)
- Benchmarks on internal clusters show a 15% degradation from results displayed in summary graphs for multi-node 70B and are being investigated.
What's Changed
- docs: Benchmarking guide updates (#678) by @kthui in #699
- docs: Update support matrix by @pvijayakrish in #691
- fix: change trtllm kv_router default block_size to 32 (#642) by @tanmayv25 in #694
- fix: set correct parent_hash for each kv block when publish kv events by @tanmayv25 in #693
- fix: Remove kv connector from agg config by @ptarasiewiczNV in #655
- fix: Account for Metrics.decode() changes (#619) by @rmccorm4 in #619
- fix: update to match latest nixl notifications as bytes @nnshah1 in #645
- docs: Update support matrix by @pvijayakrish in #633
- docs: Add instructions to install git lfs (#627) by @tanmayv25 in #627
- fix: add DYNAMO_HOME env var to vLLM docker image (#629) by @nv-anants in #629
- feat: TRT-LLM disaggregated serving using UCX (#562) by @tanmayv25 in #562
- docs: Update support matrix by @pvijayakrish in #604
- docs: Guide for multi-node benchmarking (#561) by @kthui in #561
- fix: remove api-store from container by @mohammedabdulwahhab in #617
- docs: Guides for single node benchmarking (#509) by @kthui in #509
- fix: set worker env before worker process spawn by @ishandhanani in #614
- docs: Move trtllm dynamo run doc from example to dynamo run guide (#578) by @tanmayv25 in #578
- chore: update ai-dynamo-vllm wheel version (#598) by @nv-anants in #598
- fix: bump bento to 1.4.8 (#579) by @mohammedabdulwahhab in #579
- fix: update yum install in wheel-builder image (#605) by @nv-anants in #605
- docs: update dynamo serve trtllm agg example yaml files (#600) by @ziqif-nv in #600
- chore: use latest nixl for docker builds by @nv-anants in #596
- chore: update versions to 0.1.1 by @nv-anants in #552
- docs: Updated dynamo run instructions by @cdgamarose-nv in #555
- feat: Add manylinux support for Dynamo by @pvijayakrish in #536
- docs: Clarify the --max-local-prefill-length help description by @kthui in #554
- feat: Add dynamo env CLI option to provide information about user environment by @nv-tusharma in #533
- docs: add disagg tuning guide by @tedzhouhk in #413
- fix: let dynamo run pass --help to dynamo-run by @ziqif-nv in #547
- chore: Update TRTLLM version. Fix router. by @tanmayv25 in #527
- fix: unify and enable dynamo logging by @ishandhanani in #520
- feat(dynamo-run): Basic routing choice by @grahamking in #524
- fix: clean unused bento pieces from serve.py and serving.py by @ishandhanani in #532
- docs: update close-deployment in dynamo_serve.md by @tlipoca9 in #535
- feat: update operator README by @julienmancuso in #544
- fix: mypy error by @ishandhanani in #543
- feat: cleanup operator code by @julienmancuso in #529
- chore: Fixed file headers. Added attributions. by @dmitry-tokarev-nv in #530
- fix: Remove api-server code by @mohammedabdulwahhab in #526
- docs: hello world and vllm process docs by @ishandhanani in #525
- feat: KV recorder for dumping router events into a jsonl by @PeaBrane in #505
- chore: cleaner required workers check (don't spam print) by @PeaBrane in #521
- docs: dynamo-run clarify engine list by @grahamking in #522
- chore: Upgrade Rust to 1.86 by @grahamking in #518
- chore: Add devops in more CODEOWNERS by @grahamking in #512
- feat: Python decorator dynamo_worker takes optional
static
parameter without etcd by @grahamking in #494 - fix: broken link to dynamo run by @lkm2835 in #517
- docs: add 405b disaggregated serving documentation by @ishandhanani in #496
- refactor: migrate engines to standalone crates by @ryanolson in #453
- feat: Add TensorRT-LLM example for dynamo serve/run by @tanmayv25 in #456
- docs: Remove invalid link by @grahamking in #506
- docs: add instruction to copy dynamo-run in container setup by @hanweisen in #508
- chore: Add libclang-dev to CI for llamacpp by @grahamking in #507
- chore: rename duration to timeout by @tlipoca9 in #503
- fix: adding missing file by @ryanolson in #501
- feat: allow replicas to be set in DynamoDeployment CR by @julienmancuso in #486
- chore: Disable blank issue creation for default issues template by @nv-tusharma in #492
- chore: Remove <> from title + add labels for default issues template. by @nv-tusharma in #491
- feat: Sets the code of conduct for the repository by @saturley-hall in #454
- fix: Consolidate dynamo start and dynamo serve commands by @mohammedabdulwahhab in #405
- feat: improve serve commands and expose
DYNAMO_HOME
env var by @jon-chuang in #436 - feat: kv aware router executable by @ryanolson in #399
- feat: deploy and use buildkit to build dynamo images by @julienmancuso in #450
- feat(serve): Enhance multi-node deployment and worker configuration by @ishandhanani in #457
- chore: Add default issue template for bug & feature requests by @nv-tusharma in #471
- feat: unified logging by @ryanolson in #472
- feat: add devcontainer to dynamo for Ubuntu 24.04 use by @h...
Dynamo Release v0.1.0
Dynamo v0.1.0 version will be released following Jensen Huang’s GTC keynote, and the product will be hosted on github.com/ai-dynamo. It’s an open source project with Apache 2 license, and public continuous integration will be available from the start to enable industry-wide collaboration. The primary distribution will be through pip wheels with minimal binary size. The ai-dynamo github org will host 2 repos: dynamo and NIXL.
Initial Dynamo release features:
- Disaggregated serving with X prefill and Y decode nodes
- KV aware routing
- KV cache manager to offload KV cache to system memory
- NIXL support for RDMA (InfiniBand, Ethernet) and TCP
- Support for K8s deployment
As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines including TRT-LLM, vLLM, and SGLang at launch, with varying degrees of maturity and support. Dynamo supports the vLLM engine with all the capabilities mentioned above, with a plan to achieve feature parity with the rest of inference engines as soon as possible.
Future plans
The next release of Dynamo plans to open-source the KV cache manager as a standalone repository under the ai-dynamo organization. This release will provide functionality for storing and evicting KV cache across multiple memory tiers, including GPU, system memory, local SSD, and object storage.
In that release, we will include an early version of the Dynamo Planner, another core component. This initial release will feature heuristic-based dynamic allocation of GPU workers between prefill and decode tasks, as well as model and fleet configuration adjustments based on user traffic patterns. Our vision is to evolve the Planner into a reinforcement learning platform, which will allow users to define objectives and then tune and optimize performance policies automatically based on system feedback.
Dynamo is designed as the ideal next generation inference server, building upon the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path for existing users to Dynamo once feature parity is achieved.