10000 File Limit Request: vLLM - 800 MiB · Issue #6326 · pypi/support · GitHub

File Limit Request: vLLM - 800 MiB #6326


Open
3 tasks done
youkaichao opened this issue May 14, 2025 · 15 comments

Comments

@youkaichao

Project URL

https://pypi.org/project/vllm/

Does this project already exist?

  • Yes

New Limit

800 MiB

Update issue title

  • I have updated the title.

Which indexes

PyPI

About the project

Running large language models (LLMs) is both resource-intensive and complex, especially as these models scale to hundreds of billions of parameters. That’s where vLLM comes in. Originally built around the innovative PagedAttention algorithm, vLLM has grown into a comprehensive, state-of-the-art inference engine. A thriving community is also continuously adding new features and optimizations to vLLM, including pipeline parallelism, chunked prefill, speculative decoding, and disaggregated serving.

Since its release, vLLM has attracted significant attention, with over 46,500 GitHub stars and more than 1,000 contributors, a testament to its popularity and thriving community.

Reasons for the request

Last year we requested that the size limit be raised to 400 MiB (see #3792 for details). Since then, vLLM has kept growing, in both popularity and the models/algorithms it supports, and it is approaching the limit again: the release https://pypi.org/project/vllm/0.8.5.post1/#files is now 326 MiB.
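
For reference, the per-file sizes of any release can be checked against PyPI's JSON API. A minimal sketch (the version string is just the one mentioned above; the script is illustrative, not part of vLLM's tooling):

```python
# Minimal sketch: list the uploaded file sizes for a vLLM release via
# PyPI's JSON API (https://pypi.org/pypi/<project>/<version>/json).
import json
import urllib.request

url = "https://pypi.org/pypi/vllm/0.8.5.post1/json"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

# Each entry in "urls" describes one uploaded file (sdist or wheel).
for f in data["urls"]:
    print(f"{f['filename']}: {f['size'] / 2**20:.1f} MiB")
```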

Recently, when we added support for NVIDIA's Blackwell architecture, the wheel size grew to 450 MiB, which blocks our release process. We tried dropping support for older GPUs, but that doesn't help much: the binaries for the new GPUs dominate the size.
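
Since a wheel is an ordinary zip archive, it is easy to verify which files dominate its size. A minimal sketch, with an illustrative wheel filename (substitute whatever wheel you actually downloaded):

```python
# Minimal sketch: print the largest files inside a downloaded vLLM wheel
# (wheels are zip archives). The filename below is illustrative.
import zipfile

wheel_path = "vllm-0.8.5.post1-cp38-abi3-manylinux1_x86_64.whl"
with zipfile.ZipFile(wheel_path) as zf:
    largest = sorted(zf.infolist(), key=lambda i: i.file_size, reverse=True)
    for info in largest[:10]:
        print(f"{info.file_size / 2**20:8.1f} MiB  {info.filename}")
```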

In addition, Blackwell GPUs introduce new data types, including FP4 and FP6, which means we need more GPU kernel variants to support them. We therefore foresee that the wheel size will continue to grow in the near future.

We kindly request that the size limit be raised to 800 MiB, so that vLLM can better serve the community.

Code of Conduct

  • I agree to follow the PSF Code of Conduct
@jeejeelee

+1, it would be great to have this!

@WoosukKwon
+1!

@houseroad

+1, many NVIDIA arches to support, especially now that Blackwell is in! It's definitely a reasonable ask. :-)

@WangErXiao

+1

@edzq
edzq commented May 17, 2025

+1! I love vLLM!

@youkaichao
Author

@Thespi-Brain can you please help take a look? thanks! 🙏

@youkaichao
Author

@Thespi-Brain kindly ping for any update, thanks!

@tlrmchlsmth

+1 - This is currently a blocker for vLLM to support NVIDIA RTX 5000-series consumer GPUs

@mgoin
mgoin commented May 29, 2025

+1 this would help vLLM serve more users <3

@alew3
alew3 commented May 29, 2025

+1 to include blackwell support!

@ywang96
ywang96 commented May 29, 2025

+1 - This would be critical for vLLM to ship support for more hardware architectures out of the box and make users' lives easier!

@Rezzemy
Rezzemy commented May 30, 2025

+1

@RodriMora

+1 to this.

@youkaichao
Author

a gentle nudge @Thespi-Brain @cmaureir 🙏

@pypi locked and limited conversation to collaborators on Jun 3, 2025
@ewdurbin
Member
ewdurbin commented Jun 3, 2025

Howdy folks, brigading issues like this is not helpful.

I'm going to lock this issue. It will be unlocked when it reaches the top of the queue.
