- UC Berkeley
- San Francisco Bay Area
- UTC -07:00
- https://zhuohan.li
- @zhuohan123
- in/zhuohan-li
Stars
A PyTorch native platform for training generative AI models
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A program to read, merge, and write programs for the Breville Control °Freak®
Tensors and Dynamic neural networks in Python with strong GPU acceleration
The Startup CTO's Handbook, a book covering leadership, management and technical topics for leaders of software engineering teams
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
A throughput-oriented high-performance serving framework for LLMs
Dynamic Memory Management for Serving LLMs without PagedAttention
A framework for few-shot evaluation of language models.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
HabanaAI / vllm-fork
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Tile primitives for speedy kernels
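A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.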
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
Arena-Hard-Auto: An automatic LLM benchmark.
DSPy: The framework for programming—not prompting—language models
A parallel framework for training deep neural networks
[ICML 2024] CLLMs: Consistency Large Language Models