Beijing (UTC +08:00)
https://scholar.google.com/citations?hl=zh-CN&user=MBR97ZIAAAAJ

sglang (Public)
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.

QQQ (Public)
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

DeepGEMM (Public)
Forked from deepseek-ai/DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Cuda · MIT License · Updated Feb 28, 2025

compressed-tensors (Public)
Forked from neuralmagic/compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
Python · Apache License 2.0 · Updated Feb 20, 2025

llm-compressor (Public)
Forked from vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Python · Apache License 2.0 · Updated Feb 19, 2025

ao (Public)
Forked from pytorch/ao
PyTorch native quantization and sparsity for training and inference
Python · BSD 3-Clause "New" or "Revised" License · Updated Nov 14, 2024

transformers (Public)
Forked from huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Python · Apache License 2.0 · Updated Sep 5, 2024

lmdeploy (Public)
Forked from InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Python · Apache License 2.0 · Updated Aug 29, 2024

vllm (Public)
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs

marlin (Public)
Forked from IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

smoothquant (Public)
Forked from mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Python · MIT License · Updated Nov 29, 2023

Megatron-DeepSpeed (Public)
Forked from deepspeedai/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Python · Other license · Updated Aug 28, 2023

lightseq (Public)
Forked from bytedance/lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
C++ · Other license · Updated Dec 30, 2022

academicpages.github.io (Public)
Forked from academicpages/academicpages.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
JavaScript · MIT License · Updated Jan 5, 2022

NN-CUDA-Example (Public template)
Forked from godweiyang/NN-CUDA-Example
Several simple examples for popular neural network toolkits calling custom CUDA operators.
Python · Apache License 2.0 · Updated Apr 29, 2021

NLP-Tutorials (Public)
Forked from MorvanZhou/NLP-Tutorials
Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com
Python · MIT License · Updated Mar 7, 2021

soln-ml (Public)
Forked from thomas-young-2013/mindware
A research framework for fast prototyping of AutoML algorithms.
Python · MIT License · Updated Jan 1, 2021