Dynamo is an open source project under the Apache 2.0 license. It is primarily distributed as pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: Dynamo and NIXL. Dynamo is designed as the next-generation inference server, building on the foundation of NVIDIA® Triton Inference Server™. While Triton focuses on single-node inference deployments, we're integrating its robust capabilities into Dynamo over the next several months. We'll maintain support for Triton while providing a clear migration path for existing users once Dynamo achieves feature parity.
As a vendor-neutral serving framework, Dynamo supports multiple large language model (LLM) inference engines to varying degrees:
- NVIDIA TensorRT-LLM
- vLLM
- SGLang
Dynamo v0.3.1 features:
- Functional DeepSeek R1 disaggregated serving with wide EP using SGLang
- Functional EPD disaggregation with video model (Llava video 7B)
- Proof of concept inference gateway support
- Prebuilt Dynamo + vLLM container
  - We plan to release these prebuilt containers in the coming days
- Amazon Linux support
Future plans
Dynamo Roadmap
Known Issues
- KVBM is supported only with Python 3.12
What's Changed
🚀 Features & Improvements
- feat: expose estimated kv cache hit in dynamo-run by @tedzhouhk in #1246
- feat: KVBM async Python bindings and Layer class by @kthui in #1141
- feat: add critical task execution handle by @ryanolson in #1268
- feat: Initial Granite support by @grahamking in #1271
- feat: Restructure kv manager block registration by @jthomson04 in #1093
- feat: Publish events and metrics when using kv routing by @tanmayv25 in #1262
- feat(dynamo-run): Use llama.cpp as the default engine for GGUF by @grahamking in #1276
- feat: populate default image name by @biswapanda in #1255
- feat: flatten out dynamo cloud helm chart by @julienmancuso in #1258
- refactor: Refactor kv event publishers by @jthomson04 in #1287
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher by @alec-flowers in #1284
- feat: all blocks cleared event by @jain-ria in #1279
- perf: Create default sampling params only once during initialization by @krishung5 in #1294
- feat: expose router configurations to dynamo-run by @tedzhouhk in #1259
- feat: Make llama.cpp Gnu OpenMP dependency optional by @grahamking in #1331
- feat: set env variables in Dynamo deployments from secrets by @hhzhang16 in #1325
- feat: Add DSR1 configurations by @ptarasiewiczNV in #1298
- feat: add more metrics to rust frontend by @tedzhouhk in #1315
- feat: Enable disagg support in trtllm standalone script by @tanmayv25 in #1355
- feat: Integrate KVBM with CriticalTaskHandle by @jthomson04 in #1321
- feat: add implementation for embeddings by @t-ob in #1290
- feat: refactor docker registry secret management in operator by @julienmancuso in #1337
- feat: set model specific prompt templates in the multimodal config files, add documentation for multimodal example deployment by @hhzhang16 in #1366
- feat: add result of fluid experiment by @julienmancuso in #1379
- feat: Update container with better EFA/RDMA support by @aranadive in #1333
- feat: Support larger Gemma 3 models by @grahamking in #1359
- refactor: Rename CompletionRequest to NvCreateCompletionRequest by @paulhendricks in #1383
- feat: decouple bento dependency by @biswapanda in #1266
- feat: data synthesizer based on prefix statistics by @PeaBrane in #1087
- feat: introduce abstract classes to dynamo services by @mohammedabdulwahhab in #924
- feat: KVBM dynamo runtime + event manager by @oandreeva-nv in #1195
- feat: Utilities for distributed leader-worker barriers by @jthomson04 in #1429
- feat: Restructure the KVBM WriteTo trait by @jthomson04 in #1363
- feat: KVBM prometheus monitoring by @jthomson04 in #1211
- feat: Improved offload queueing and block eviction ordering by @jthomson04 in #1425
- feat: generate random texts from hashes using lorem ipsum by @PeaBrane in #1458
- refactor: use comment field in annotated to pass metric-related information by @tedzhouhk in #1385
- feat: generalize VLM embedding extraction by @hhzhang16 in #1388
- refactor: move kv store to runtime by @ryanolson in #1459
- feat: add endpoint to clear all kv blocks in vllm v1 by @jain-ria in #1384
- feat: Video support with Dynamo by @indrajit96 in #1443
- feat: add build --push command by @hhzhang16 in #1485
- feat: FT downed worker instance tracking and skipping by @kthui in #1424
- feat: add dynamo pipeline example using inf-gw by @biswapanda in #1512
- refactor: Log subprocess stderr as WARN (#1563) by @rmccorm4 in #1574
🐛 Bug Fixes
- fix: cherry-pick of attributions from 0.2.1 release branch by @saturley-hall in #1267
- fix: resolve local dev container build issues by @t-ob in #1269
- fix: Renamed event publisher classes and configuration by @alec-flowers in #1273
- fix: Only check model name on etcd-registered endpoints by @jthomson04 in #1263
- fix: Fix mypy errors on trtllm examples by @tanmayv25 in #1277
- fix: remove sglang hash for pyproject by @ishandhanani in #1281
- fix: copy workspace as part of ci-min stage by @nv-anants in #1291
- fix: resources naming by @biswapanda in #1302
- fix: wait until probing on vllm examples to prevent timeouts by @mohammedabdulwahhab in #1293
- fix: Fix vllm v0 None*int error when not using kv aware router by @tedzhouhk in #1304
- fix: Update breaking change to enable_overlap_scheduler field from TRTLLM commit b4e5df0e by @rmccorm4 in #1310
- fix: make imagePullSecrets optional when installing dynamo cloud by @julienmancuso in #1324
- fix: Properly set VLLM_NIXL_SIDE_CHANNEL_HOST in multi-node by @ptarasiewiczNV in #1327
- fix: Allow building only llamacpp or only mistralrs engine. by @grahamking in #1328
- fix: allow custom annotations in api-store service by @julienmancuso in #1329
- fix: Flatten pytorch_backend_config section to address breaking change to trtllm config by @rmccorm4 in #1326
- fix: update profile script by @tedzhouhk in #1336
- fix: Use min of max tokens or context length by @abrarshivani in #1322
- fix: add ingress to llm example by @hhzhang16 in #1349
- fix(dynamo-run): For internal comms use a random endpoint instead of hard coded by @grahamking in #1335
- fix: dockerhub registry issues in dynamo operator by @mohammedabdulwahhab in #1350
- fix: add speculative decoding config to dynamo serve + trtllm by @richardhuo-nv in #1356
- fix: prefillqueue stream name in load-planner by @tedzhouhk in #1377
- fix: take into account number of workers from config by @julienmancuso in #1365
- fix: Fix link path for dynamo_run doc by @krishung5 in #1382
- fix: fix dynamo cloud helm chart by @julienmancuso in #1376
- fix: mismatch GAP and PA version by @tedzhouhk in #1386
- fix: remove unused arg in planner by @tedzhouhk in #1390
- fix: Use Rust Ingress (dynamo-run) for the Frontend by @tanmayv25 in #1391
- fix: enable block manager feature only for py3.12 build by @nv-anants in #1393
- fix: small qol improvements to devcontainer by @alec-flowers in #1427
- fix: remove unused bentoml references by @biswapanda in #1412
- fix: Fix planner dependency import when running dynamo CLI by @rmccorm4 in #1416
- fix: add blocking mode for k8s connector in planner by @julienmancuso in #1446
- fix: Fix flaky test by @jthomson04 in #1466
- fix: Python respects DYN_LOG too by @alec-flowers in #1486
- fix: dynamo-run change python subprocess from debug to info by @alec-flowers in #1484
- fix: remove LLMMetricAnnotation from response stream by @tedzhouhk in #1499
- fix: Fix NATS_SERVER value, add details on customizing MOUNTS by @rmccorm4 in #1520
- fix: Improve dynamo.connect Error Reporting by @whoisj in #1524
- fix: enable GCP deployments by @julienmancuso in #1474
- fix: remove lib.real from LD_LIBRARY_PATH (#1546) by @tanmayv25 in #1547
- fix: update nixl build and keep wheels dir in vllm container (#1544) by @nv-anants in #1551
- fix: cleanup allocator (#1536) by @biswapanda in #1554
- fix: Handle model not found error for multimodal example (#1545) by @krishung5 in #1558
- fix: Fix NIXL 0.3.1 build (#1561) by @jthomson04 in #1571
- fix: Fix message truncation in disagg flow by @tanmayv25 in #1573
- fix: Fix sample disagg config for trtllm standalone by @tanmayv25 in #1576
📚 Documentation
- docs: Update Multimodal Example README by @whoisj in #1275
- docs: Updated planner link by @oandreeva-nv in #1308
- docs: Add README for Connect Library by @whoisj in #1303
- docs: Add documentation for verbosity flag in dynamo-run by @paulhendricks in #1353
- docs: fix sphinx errors admonitions adobe config by @kmkelle-nv in #1179
- docs: Add docs.nvidia.com userguides link by @statiraju in #1378
- docs: add aggregated example turning on MTP with DeepSeek R1 by @richardhuo-nv in #1421
- docs: Reference Deepseek R1 configs in TRTLLM README by @rmccorm4 in #1414
- docs: add image to front page readme by @faradawn in #1320
- docs: Add example etcd/nats commands to the container banner by @rmccorm4 in #1423
- docs: add message to guide users to the stable version by @richardhuo-nv in #1457
- docs: MTP + TensorRT LLM + DS R1 disaggregated example by @richardhuo-nv in #1473
- docs: Add note about ignore_eos for MTP by @rmccorm4 in #1475
- docs: Benchmarking guide interpreting results by @kthui in #701
- docs: DIS-133 and DIS-134 plus copyediting by @kmkelle-nv in #1439
- docs: Fix Markdown Render Error by @whoisj in #1502
- docs: add concurrency choice to the perf.sh by @richardhuo-nv in #1497
- docs: fix the README link to the perf.sh file by @richardhuo-nv in #1501
- docs: Update main readme with Deepwiki and new documentation and examples links by @harryskim in #1510
- docs: Add multi-node TRTLLM worker example (Deepseek R1) by @rmccorm4 in #1511
- docs: fix DIS-133 and NvB 5322259 by @kmkelle-nv in #1518
- docs: Cleanup & Standardize Guides by @whoisj in #1357
- docs: add troubleshooting section in benchmarking guide. Add known pitfall by @GuanLuo in #1503
- docs: Add GitHub Pages deployment to dynamo.github.io for release branches by @nvda-mesharma in #1527
- docs: add docs and example for inference gateway deployment (#1533) by @biswapanda in #1555
🛠️ Build, CI and Test
- test: basic end to end by @nnshah1 in #1339
- test: Add dynamo serve TRTLLM example to pytest by @tanmayv25 in #1417
- ci: Add args to run.sh by @pvijayakrish in #1418
- ci: add github workflow to close stale issues (bugs) and PRs by @nv-anants in #1450
- ci: Add time delays for server startup by @pvijayakrish in #1452
- build: enable vllm runtime container as default container for ci pipelines by @nv-tusharma in #1451
- test: add tests for kv_router::scheduler by @ezhoureal in #1491
New Contributors
- @jain-ria made their first contribution in #1279
- @paulhendricks made their first contribution in #1353
- @kmkelle-nv made their first contribution in #1179
- @indrajit96 made their first contribution in #1443
Full Changelog: v0.3.0...v0.3.1