feat: multi-thread (via asyncio.task) in processor #904

tedzhouhk · 2025-04-29T22:46:01Z

Implements an asyncio task + queue architecture in processor.py to improve tokenization performance under high load. This partially addresses #873.
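A minimal sketch of the general task + queue pattern, assuming a fixed pool of asyncio worker tasks that share one `asyncio.Queue` and hand each result back through a future. The class `TokenizerPool`, its methods, and the toy tokenizer are hypothetical illustrations, not the code in processor.py.

```python
import asyncio
from typing import Callable, List, Tuple


class TokenizerPool:
    """Hypothetical sketch: N asyncio worker tasks draining one shared queue.

    `tokenize` stands in for the real tokenization call; the class and method
    names are illustrative and are not taken from processor.py.
    """

    def __init__(self, tokenize: Callable[[str], List[int]], num_workers: int = 4):
        self._tokenize = tokenize
        self._num_workers = num_workers
        # Each item is (prompt, future that will receive the token ids).
        self._queue: "asyncio.Queue[Tuple[str, asyncio.Future]]" = asyncio.Queue()
        self._workers: List[asyncio.Task] = []

    def start(self) -> None:
        # Spawn the worker tasks; must be called while an event loop is running.
        for _ in range(self._num_workers):
            self._workers.append(asyncio.create_task(self._worker()))

    async def _worker(self) -> None:
        # Loop forever: take one request, tokenize it, hand the result back.
        while True:
            prompt, fut = await self._queue.get()
            try:
                if not fut.cancelled():  # the caller may have given up already
                    fut.set_result(self._tokenize(prompt))
            except Exception as exc:  # forward tokenizer errors to the caller
                if not fut.done():
                    fut.set_exception(exc)
            finally:
                self._queue.task_done()

    async def submit(self, prompt: str) -> List[int]:
        # Enqueue the request, then wait until a worker resolves its future.
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((prompt, fut))
        return await fut

    async def stop(self) -> None:
        # Cancel the workers and wait for them to finish unwinding.
        for task in self._workers:
            task.cancel()
        await asyncio.gather(*self._workers, return_exceptions=True)
        self._workers.clear()


async def demo() -> List[List[int]]:
    # Toy tokenizer: one id per character (its code point).
    pool = TokenizerPool(lambda s: [ord(c) for c in s], num_workers=2)
    pool.start()
    ids = await asyncio.gather(pool.submit("hi"), pool.submit("ok"))
    await pool.stop()
    return list(ids)
```

`asyncio.run(demo())` returns `[[104, 105], [111, 107]]`. Because the toy tokenizer is synchronous, the workers interleave between requests rather than running in parallel; offloading the call with `await asyncio.to_thread(self._tokenize, prompt)` (Python 3.9+) would put it on real threads, which only helps if the tokenizer releases the GIL.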

| Configuration | Throughput |
|---|---|
| Theoretical dp4 (raw `vllm serve` x4) | 1429.44 |
| Without multi-threading | 1142.65 ± 71.38 |
| With this PR | 1374.49 ± 9.63 (20.3% improvement) |
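For reference, the arithmetic behind the throughput numbers: the 20.3% figure is relative to the no-multi-threading baseline, and the same means can be compared against the theoretical dp4 ceiling. Only the three means from the PR description are used; no units are assumed.

```python
# Mean throughput figures copied from the PR description (units as reported there).
THEORETICAL_DP4 = 1429.44  # raw `vllm serve` x4
BASELINE = 1142.65         # processor without multi-threading
WITH_PR = 1374.49          # processor with this PR


def pct(ratio: float) -> float:
    """Express a ratio as a percentage rounded to one decimal place."""
    return round(100 * ratio, 1)


improvement = pct(WITH_PR / BASELINE - 1)       # relative gain over the baseline
baseline_eff = pct(BASELINE / THEORETICAL_DP4)  # share of the theoretical ceiling before
with_pr_eff = pct(WITH_PR / THEORETICAL_DP4)    # share of the theoretical ceiling after

print(improvement, baseline_eff, with_pr_eff)  # 20.3 79.9 96.2
```

In other words, the processor moves from roughly 80% to roughly 96% of raw dp4 throughput, and the standard deviation also shrinks from 71.38 to 9.63.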

Note that, judging by htop, we are still GIL-bound. It is unclear whether vLLM's tokenization releases the GIL.
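One rough way to answer the GIL question empirically, assuming the tokenization call can be invoked directly as a plain function: time a batch serially, then time the same batch on a thread pool. `fake_tokenize` is a hypothetical pure-Python stand-in that holds the GIL; swap in the real tokenization call to test it.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def fake_tokenize(text: str) -> List[int]:
    """Hypothetical pure-Python stand-in for a tokenizer; it holds the GIL throughout."""
    ids: List[int] = []
    for _ in range(200):  # repeat to make the work measurable
        ids = [ord(c) % 97 for c in text]
    return ids


def thread_speedup(
    fn: Callable[[str], List[int]], inputs: Sequence[str], n_threads: int
) -> Tuple[float, List[List[int]]]:
    """Return (serial_time / threaded_time, threaded results).

    A ratio near 1.0 suggests `fn` holds the GIL; a ratio approaching
    `n_threads` suggests it releases the GIL for most of its work.
    """
    start = time.perf_counter()
    serial = [fn(x) for x in inputs]
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        threaded = list(pool.map(fn, inputs))
    threaded_time = time.perf_counter() - start

    assert threaded == serial  # threading must not change the output
    return serial_time / threaded_time, threaded
```

For example, `thread_speedup(fake_tokenize, ["lorem ipsum " * 200] * 32, n_threads=4)` should report a ratio close to 1.0 on a standard (GIL-enabled) CPython build; a real tokenizer that drops the GIL in native code would show a noticeably larger ratio.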

Another benefit of the queue architecture is that it naturally enables queuing requests at the processor instead of at the engine.
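As an illustration of queuing at the processor, a bounded queue makes the backlog explicit and gives a natural admission point. This is a sketch under assumptions: the PR does not say whether its queue is bounded, and `AdmissionQueue`, `max_pending`, and the reject-when-full policy are hypothetical.

```python
import asyncio
from typing import List, Tuple


class AdmissionQueue:
    """Hypothetical sketch: a bounded queue in front of the processor's workers.

    Requests wait here (at the processor) instead of piling up in the engine;
    once `max_pending` requests are waiting, new ones are rejected.
    """

    def __init__(self, max_pending: int):
        self._queue: "asyncio.Queue[str]" = asyncio.Queue(maxsize=max_pending)

    def try_admit(self, request: str) -> bool:
        # Non-blocking admission: False means the caller should shed or retry.
        try:
            self._queue.put_nowait(request)
            return True
        except asyncio.QueueFull:
            return False

    async def next_request(self) -> str:
        # Called by a worker task to take the oldest waiting request.
        return await self._queue.get()

    def depth(self) -> int:
        # Current backlog; could be exported as a load signal.
        return self._queue.qsize()


async def demo() -> Tuple[List[bool], str, int]:
    q = AdmissionQueue(max_pending=2)
    admitted = [q.try_admit(r) for r in ("a", "b", "c")]  # third one overflows
    first = await q.next_request()
    return admitted, first, q.depth()
```

`asyncio.run(demo())` returns `([True, True, False], 'a', 1)`: the third request is refused while the queue is full, and the backlog shrinks as workers drain it. Awaiting `put()` instead of `put_nowait()` would make callers wait rather than be rejected.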

copy-pr-bot · 2025-04-29T22:46:04Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

examples/llm/configs/agg.yaml

examples/llm/components/processor.py

rmccorm4

Generally I'm not sold on the approach of this change as described here: https://github.com/ai-dynamo/dynamo/pull/904/files#r2069528912

But I don't want to block you from improving the performance if you deem this strategy necessary over just increasing the number of processor replicas.

tedzhouhk · 2025-05-05T19:10:01Z

Set to auto-merge. Note that the multi-process, multi-thread, and asyncio-task approaches are all temporary solutions until we refactor the processor in Rust.

multi-thread for processor

6c0cccf

pull-request-size bot added the size/L label Apr 29, 2025

rmccorm4 reviewed Apr 29, 2025

examples/llm/configs/agg.yaml (resolved)

rmccorm4 reviewed Apr 29, 2025

examples/llm/components/processor.py (resolved)

add annotation for mypy

6f8fdf0

rmccorm4 approved these changes Apr 30, 2025

tedzhouhk marked this pull request as draft May 2, 2025 00:48

tedzhouhk changed the title ~~feat: multi-thread in processor~~ feat: multi-thread (via asyncio.task) in processor May 2, 2025

tedzhouhk force-pushed the hzhou/multithread-proc branch from 288915f to 6f8fdf0 Compare May 2, 2025 01:13

tedzhouhk assigned tedzhouhk and unassigned tedzhouhk May 2, 2025

tedzhouhk marked this pull request as ready for review May 2, 2025 01:14

tedzhouhk added 2 commits May 5, 2025 12:06

Merge branch 'main' of https://github.com/ai-dynamo/dynamo into hzhou…

180d9c7

…/multithread-proc

pc

6c48f2b

GuanLuo approved these changes May 5, 2025

tedzhouhk merged commit e0cd848 into main May 5, 2025
6 checks passed

tedzhouhk deleted the hzhou/multithread-proc branch May 5, 2025 20:21
