Repositories list
13 repositories
ScaleLLM (Public): A high-performance inference system for large language models, designed for production environments.
flux (Public)
whl (Public)
3FS (Public)
flashinfer (Public)
FlashMLA (Public)
vcpkg (Public)
discussions (Public)
flash-attention (Public)
tokenizers (Public)
xformers (Public)
FasterTransformer (Public)
ByteTransformer (Public): Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052