Dynamo Release v0.3.1

@tedzhouhk

Dynamo is an open source project under the Apache 2.0 license. It is distributed primarily as pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: Dynamo and NIXL. Dynamo is designed as the next-generation inference server, building upon the foundation of NVIDIA® Triton Inference Server™. While Triton focuses on single-node inference deployments, we're integrating its robust capabilities into Dynamo over the next several months. We'll maintain support for Triton and will provide a clear migration path for existing users once Dynamo achieves feature parity.

As a vendor-neutral serving framework, Dynamo supports multiple large language model (LLM) inference engines to varying degrees:

NVIDIA TensorRT-LLM
vLLM
SGLang

Dynamo v0.3.1 features:

Functional DeepSeek R1 disaggregated serving with wide expert parallelism (EP) using SGLang
Functional encode/prefill/decode (EPD) disaggregation with a video model (LLaVA-Video-7B)
Proof-of-concept Inference Gateway support
Prebuilt Dynamo + vLLM containers
- We plan to release these prebuilt containers in the coming days.
Amazon Linux support

Future plans
See the Dynamo Roadmap.

Known Issues

KVBM (KV Block Manager) is supported only with Python 3.12.

What's Changed

🚀 Features & Improvements

feat: expose estimated kv cache hit in dynamo-run by @tedzhouhk in #1246
feat: KVBM async Python bindings and Layer class by @kthui in #1141
feat: add critical task execution handle by @ryanolson in #1268
feat: Initial Granite support by @grahamking in #1271
feat: Restructure kv manager block registration by @jthomson04 in #1093
feat: Publish events and metrics when using kv routing by @tanmayv25 in #1262
feat(dynamo-run): Use llama.cpp as the default engine for GGUF by @grahamking in #1276
feat: populate default image name by @biswapanda in #1255
feat: flatten out dynamo cloud helm chart by @julienmancuso in #1258
refactor: Refactor kv event publishers by @jthomson04 in #1287
refactor: rename KvMetricsPublisher to WorkerMetricsPublisher by @alec-flowers in #1284
feat: all blocks cleared event by @jain-ria in #1279
perf: Create default sampling params only once during initialization by @krishung5 in #1294
feat: expose router configurations to dynamo-run by @tedzhouhk in #1259
feat: Make llama.cpp GNU OpenMP dependency optional by @grahamking in #1331
feat: set env variables in Dynamo deployments from secrets by @hhzhang16 in #1325
feat: Add DSR1 configurations by @ptarasiewiczNV in #1298
feat: add more metrics to rust frontend by @tedzhouhk in #1315
feat: Enable disagg support in trtllm standalone script by @tanmayv25 in #1355
feat: Integrate KVBM with CriticalTaskHandle by @jthomson04 in #1321
feat: add implementation for embeddings by @t-ob in #1290
feat: refactor docker registry secret management in operator by @julienmancuso in #1337
feat: set model specific prompt templates in the multimodal config files, add documentation for multimodal example deployment by @hhzhang16 in #1366
feat: add result of fluid experiment by @julienmancuso in #1379
feat: Update container with better EFA/RDMA support by @aranadive in #1333
feat: Support larger Gemma 3 models by @grahamking in #1359
refactor: Rename CompletionRequest to NvCreateCompletionRequest by @paulhendricks in #1383
feat: decouple bento dependency by @biswapanda in #1266
feat: data synthesizer based on prefix statistics by @PeaBrane in #1087
feat: introduce abstract classes to dynamo services by @mohammedabdulwahhab in #924
feat: KVBM dynamo runtime + event manager by @oandreeva-nv in #1195
feat: Utilities for distributed leader-worker barriers by @jthomson04 in #1429
feat: Restructure the KVBM WriteTo trait by @jthomson04 in #1363
feat: KVBM prometheus monitoring by @jthomson04 in #1211
feat: Improved offload queueing and block eviction ordering by @jthomson04 in #1425
feat: generate random texts from hashes using lorem ipsum by @PeaBrane in #1458
refactor: use comment field in annotated to pass metric-related information by @tedzhouhk in #1385
feat: generalize VLM embedding extraction by @hhzhang16 in #1388
refactor: move kv store to runtime by @ryanolson in #1459
feat: add endpoint to clear all kv blocks in vllm v1 by @jain-ria in #1384
feat: Video support with Dynamo by @indrajit96 in #1443
feat: add build --push command by @hhzhang16 in #1485
feat: FT downed worker instance tracking and skipping by @kthui in #1424
feat: add dynamo pipeline example using inf-gw by @biswapanda in #1512
refactor: Log subprocess stderr as WARN (#1563) by @rmccorm4 in #1574

🐛 Bug Fixes

fix: cherry-pick of attributions from 0.2.1 release branch by @saturley-hall in #1267
fix: resolve local dev container build issues by @t-ob in #1269
fix: Renamed event publisher classes and configuration by @alec-flowers in #1273
fix: Only check model name on etcd-registered endpoints by @jthomson04 in #1263
fix: Fix mypy errors on trtllm examples by @tanmayv25 in #1277
fix: remove sglang hash for pyproject by @ishandhanani in #1281
fix: copy workspace as part of ci-min stage by @nv-anants in #1291
fix: resources naming by @biswapanda in #1302
fix: wait until probing on vllm examples to prevent timeouts by @mohammedabdulwahhab in #1293
fix: Fix vllm v0 None*int error when not using kv aware router by @tedzhouhk in #1304
fix: Update breaking change to enable_overlap_scheduler field from TRTLLM commit b4e5df0e by @rmccorm4 in #1310
fix: make imagePullSecrets optional when installing dynamo cloud by @julienmancuso in #1324
fix: Properly set VLLM_NIXL_SIDE_CHANNEL_HOST in multi-node by @ptarasiewiczNV in #1327
fix: Allow building only llamacpp or only mistralrs engine. by @grahamking in #1328
fix: allow custom annotations in api-store service by @julienmancuso in #1329
fix: Flatten pytorch_backend_config section to address breaking change to trtllm config by @rmccorm4 in #1326
fix: update profile script by @tedzhouhk in #1336
fix: Use min of max tokens or context length by @abrarshivani in #1322
fix: add ingress to llm example by @hhzhang16 in #1349
fix(dynamo-run): For internal comms use a random endpoint instead of hard coded by @grahamking in #1335
fix: dockerhub registry issues in dynamo operator by @mohammedabdulwahhab in #1350
fix: add speculative decoding config to dynamo serve + trtllm by @richardhuo-nv in #1356
fix: prefillqueue stream name in load-planner by @tedzhouhk in #1377
fix: take into account number of workers from config by @julienmancuso in #1365
fix: Fix link path for dynamo_run doc by @krishung5 in #1382
fix: fix dynamo cloud helm chart by @julienmancuso in #1376
fix: mismatch GAP and PA version by @tedzhouhk in #1386
fix: remove unused arg in planner by @tedzhouhk in #1390
fix: Use Rust Ingress (dynamo-run) for the Frontend by @tanmayv25 in #1391
fix: enable block manager feature only for py3.12 build by @nv-anants in #1393
fix: small qol improvements to devcontainer by @alec-flowers in #1427
fix: remove unused bentoml references by @biswapanda in #1412
fix: Fix planner dependency import when running dynamo CLI by @rmccorm4 in #1416
fix: add blocking mode for k8s connector in planner by @julienmancuso in #1446
fix: Fix flaky test by @jthomson04 in #1466
fix: Python respects DYN_LOG too by @alec-flowers in #1486
fix: dynamo-run change python subprocess from debug to info by @alec-flowers in #1484
fix: remove LLMMetricAnnotation from response stream by @tedzhouhk in #1499
fix: Fix NATS_SERVER value, add details on customizing MOUNTS by @rmccorm4 in #1520
fix: Improve dynamo.connect Error Reporting by @whoisj in #1524
fix: enable GCP deployments by @julienmancuso in #1474
fix: remove lib.real from LD_LIBRARY_PATH (#1546) by @tanmayv25 in #1547
fix: update nixl build and keep wheels dir in vllm container (#1544) by @nv-anants in #1551
fix: cleanup allocator (#1536) by @biswapanda in #1554
fix: Handle model not found error for multimodal example (#1545) by @krishung5 in #1558
fix: Fix NIXL 0.3.1 build (#1561) by @jthomson04 in #1571
fix: Fix message truncation in disagg flow by @tanmayv25 in #1573
fix: Fix sample disagg config for trtllm standalone by @tanmayv25 in #1576

📚 Documentation

docs: Update Multimodal Example README by @whoisj in #1275
docs: Updated planner link by @oandreeva-nv in #1308
docs: Add README for Connect Library by @whoisj in #1303
docs: Add documentation for verbosity flag in dynamo-run by @paulhendricks in #1353
docs: fix sphinx errors admonitions adobe config by @kmkelle-nv in #1179
docs: Add docs.nvidia.com userguides link by @statiraju in #1378
docs: add aggregated example turning on MTP with DeepSeek R1 by @richardhuo-nv in #1421
docs: Reference Deepseek R1 configs in TRTLLM README by @rmccorm4 in #1414
docs: add image to front page readme by @faradawn in #1320
docs: Add example etcd/nats commands to the container banner by @rmccorm4 in #1423
docs: add message to guide users to the stable version by @richardhuo-nv in #1457
docs: MTP + TensorRT LLM + DS R1 disaggregated example by @richardhuo-nv in #1473
docs: Add note about ignore_eos for MTP by @rmccorm4 in #1475
docs: Benchmarking guide interpreting results by @kthui in #701
docs: DIS-133 and DIS-134 plus copyediting by @kmkelle-nv in #1439
docs: Fix Markdown Render Error by @whoisj in #1502
docs: add concurrency choice to the perf.sh by @richardhuo-nv in #1497
docs: fix the README link to the perf.sh file by @richardhuo-nv in #1501
docs: Update main readme with Deepwiki and new documentation and examples links by @harryskim in #1510
docs: Add multi-node TRTLLM worker example (Deepseek R1) by @rmccorm4 in #1511
docs: fix DIS-133 and NvB 5322259 by @kmkelle-nv in #1518
docs: Cleanup & Standardize Guides by @whoisj in #1357
docs: add troubleshooting section in benchmarking guide. Add known pitfall by @GuanLuo in #1503
docs: Add GitHub Pages deployment to dynamo.github.io for release branches by @nvda-mesharma in #1527
docs: add docs and example for inference gateway deployment (#1533) by @biswapanda in #1555

🛠️ Build, CI and Test

test: basic end to end by @nnshah1 in #1339
test: Add dynamo serve TRTLLM example to pytest by @tanmayv25 in #1417
ci: Add args to run.sh by @pvijayakrish in #1418
ci: add github workflow to close stale issues (bugs) and PRs by @nv-anants in #1450
ci: Add time delays for server start up. by @pvijayakrish in #1452
build: enable vllm runtime container as default container for ci pipelines by @nv-tusharma in #1451
test: add tests for kv_router::scheduler by @ezhoureal in #1491

New Contributors

@jain-ria made their first contribution in #1279
@paulhendricks made their first contribution in #1353
@kmkelle-nv made their first contribution in #1179
@indrajit96 made their first contribution in #1443

Full Changelog: v0.3.0...v0.3.1
