NetEase, HangZhou

Starred repositories
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Curated collection of papers in machine learning systems
Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.
C++ functions matching the interface and behavior of Python string methods with std::string
Writing AI Conference Papers: A Handbook for Beginners
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
A beautiful stack trace pretty printer for C++
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
Andres6936 / Flossy (forked from ongbe/flossy): String Formatting Library for C++17
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
Implementation of std::experimental::any, including small object optimization, for C++11 compilers
Serialization library written in C++17 - Pack C++ structs into a compact byte-array without any macros or boilerplate code
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
SGLang is a fast serving framework for large language models and vision language models.
A throughput-oriented high-performance serving framework for LLMs
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
Universal cross-platform tokenizers binding to HF and sentencepiece
A higher-performance OpenAI LLM service than vLLM serve: a pure C++ implementation built with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function calls, AI agents…
Code and information for face image quality assessment with SER-FIQ
A high-performance inference system for large language models, designed for production environments.