Stars
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation.
Tutel MoE: an optimized Mixture-of-Experts library, supporting DeepSeek FP8/FP4.
MSCCL++: a GPU-driven communication stack for scalable AI applications.
A throughput-oriented, high-performance serving framework for LLMs.
Synchronization and asynchronous computation package for Go.
[ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Borgo is a statically typed language that compiles to Go.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
GLake: optimizing GPU memory management and IO transmission.
A fast inference library for running LLMs locally on modern consumer-class GPUs
Ring attention implementation built on flash attention.
Implementation of the MagViT2 tokenizer in PyTorch.
Chat凉宫春日 (Chat Haruhi Suzumiya): an open-sourced role-playing chatbot, by Cheng Li, Ziang Leng, and others.
XVERSE-13B: a multilingual large language model developed by XVERSE Technology Inc.
Development repository for the Triton language and compiler.
Hackable and optimized Transformers building blocks, supporting a composable construction.
Implementation of a Transformer, written entirely in Triton.
Unsupervised text tokenizer focused on computational efficiency.
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Efficient cache for gigabytes of data written in Go.