feat: multi-thread (via asyncio.task) in processor by tedzhouhk · Pull Request #904 · ai-dynamo/dynamo · GitHub

feat: multi-thread (via asyncio.task) in processor #904

Merged
tedzhouhk merged 4 commits into main from hzhou/multithread-proc
May 5, 2025

Conversation

tedzhouhk
Contributor
@tedzhouhk tedzhouhk commented Apr 29, 2025

Implement an asyncio tasks + queue architecture in processor.py to improve tokenization performance under high load. This partially addresses #873.

  • Theoretical dp4 throughput (raw vllm serve x4): 1429.44
  • Throughput without multi-thread: 1142.65 +- 71.38
  • Throughput with this PR: 1374.49 +- 9.63 (20.3% improvement)
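
As a quick sanity check on the figures above, the relative improvement and the fraction of the theoretical throughput reached can be recomputed from the three mean values. This is only arithmetic on the reported means; the variable names are illustrative, and the throughput unit is left unspecified because the numbers above do not state one.

```python
# Mean throughput values copied from the list above (unit not stated in the PR).
theoretical = 1429.44   # raw vllm serve x4
baseline = 1142.65      # without multi-thread
with_pr = 1374.49       # with this PR

# Relative improvement of the PR over the baseline.
improvement_pct = (with_pr - baseline) / baseline * 100

# Fraction of the theoretical throughput reached before and after.
baseline_frac_pct = baseline / theoretical * 100
with_pr_frac_pct = with_pr / theoretical * 100

print(f"improvement over baseline: {improvement_pct:.1f}%")
print(f"baseline vs theoretical:   {baseline_frac_pct:.1f}%")
print(f"with PR vs theoretical:    {with_pr_frac_pct:.1f}%")
```

So the change moves the processor from roughly 80% to roughly 96% of the theoretical dp4 number.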

Note that, from eyeballing htop, we're still GIL-bound. Not sure whether the vllm tokenization process releases the GIL or not.

Another benefit of the queue architecture is that it naturally enables queuing requests at the processor instead of at the engine.
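
For readers who have not seen the pattern, here is a minimal sketch of an asyncio tasks + queue layout. It is not the code in processor.py: `tokenize`, `worker`, `run_processor`, `num_workers`, and `max_queue_size` are all hypothetical names chosen for illustration, and the tokenizer is replaced by a trivial whitespace split. The bounded `asyncio.Queue` is what lets requests wait at the processor rather than at the engine.

```python
import asyncio


async def tokenize(prompt: str) -> list[str]:
    # Hypothetical stand-in for the real tokenization call.
    await asyncio.sleep(0)  # yield to the event loop, as real async work would
    return prompt.split()


async def worker(queue: asyncio.Queue, results: dict) -> None:
    # Each worker task pulls requests off the shared queue until cancelled.
    while True:
        request_id, prompt = await queue.get()
        try:
            results[request_id] = await tokenize(prompt)
        finally:
            queue.task_done()


async def run_processor(prompts, num_workers=4, max_queue_size=16):
    # Bounded queue: put() blocks when full, so backlog builds up here.
    queue = asyncio.Queue(maxsize=max_queue_size)
    results = {}
    workers = [
        asyncio.create_task(worker(queue, results)) for _ in range(num_workers)
    ]
    for request_id, prompt in enumerate(prompts):
        await queue.put((request_id, prompt))
    await queue.join()  # wait until every queued request has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(run_processor(["hello world", "a b c"])))
```

Because all worker tasks run on one event loop in one thread, this layout overlaps waiting but does not by itself get around the GIL, which is consistent with the htop observation above.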

copy-pr-bot bot commented Apr 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor
@rmccorm4 rmccorm4 left a comment


Generally I'm not sold on the approach of this change, as described here: https://github.com/ai-dynamo/dynamo/pull/904/files#r2069528912

But I don't want to block you from improving the performance if you deem this strategy necessary over just increasing the number of processor replicas.

@tedzhouhk tedzhouhk marked this pull request as draft May 2, 2025 00:48
@tedzhouhk tedzhouhk changed the title feat: multi-thread in processor feat: multi-thread (via asyncio.task) in processor May 2, 2025
@tedzhouhk tedzhouhk force-pushed the hzhou/multithread-proc branch from 288915f to 6f8fdf0 Compare May 2, 2025 01:13
@tedzhouhk tedzhouhk assigned tedzhouhk and unassigned tedzhouhk May 2, 2025
@tedzhouhk tedzhouhk marked this pull request as ready for review May 2, 2025 01:14
@tedzhouhk
Contributor Author

Set to auto-merge. Note that multi-process/thread/asyncio-task are all temporary solutions until we refactor the processor in Rust.

@tedzhouhk tedzhouhk merged commit e0cd848 into main May 5, 2025
6 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/multithread-proc branch May 5, 2025 20:21