- Beijing, China
Stars
SGLang is a fast serving framework for large language models and vision language models.
A self-learning tutorial for CUDA High Performance Programming.
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
FastAPI framework, high performance, easy to learn, fast to code, ready for production
Ongoing research training transformer models at scale
A Datacenter Scale Distributed Inference Serving Framework
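AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the upper-level software stack, that supports training and inference of large AI models.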
A high-throughput and memory-efficient inference and serving engine for LLMs
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
FlashMLA: Efficient MLA decoding kernels
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
gongshaotian / cutlass
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
All-in-One Development Tool based on PaddlePaddle
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Tensors and Dynamic neural networks in Python with strong GPU acceleration
gongshaotian / BladeDISC
Forked from alibaba/BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
PaddlePaddle Developer Community
This GitHub Action creates a GitHub contribution calendar on a 3D profile image.
nvidia-modelopt is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for do…
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
gongshaotian / Paddle
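Forked from PaddlePaddle/Paddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)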
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Instant neural graphics primitives: lightning fast NeRF and more
A PyTorch implementation of a method based on Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation, applied to human pose estimation from stereo images.