- USTC
- Hefei, China
- https://guopeng-gpli.github.io/
Stars
Framework for AI on mobile devices and wearables, with a hardware-aware C/C++ backend and wrappers for Kotlin, Java, Swift, React, and Flutter.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
FlashMLA: Efficient MLA decoding kernels
TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
A GPU/CUDA implementation of the Hungarian algorithm (a minimal CPU reference sketch follows this list).
Beginner-friendly serverless LLM deployment with Replicate & fly.io
Caribou is a framework for geo-distributed deployment of serverless workflows to reduce carbon emissions.
A LaTeX template for the thesis proposal (开题报告) at the University of Science and Technology of China (USTC).
Code for reproducing the results of the SOSP paper Bagpipe.
Efficient and easy multi-instance LLM serving
📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, Parallelism, MLA, etc.
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
PyTorch-based Chinese intent recognition and slot filling.
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
BERT-based intent and slot detector for chatbots (see the second sketch after this list).
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
A curated list of high-quality papers on resource-efficient LLMs 🌱
Serverless LLM Serving for Everyone.
Large Language Model (LLM) Systems Paper List
A curated list for Efficient Large Language Models
Semantic Kernel (SK) is a lightweight SDK enabling integration of AI Large Language Models (LLMs) with conventional programming languages.
🚀 Docker registry proxy: uses GitHub Actions to mirror images from docker.io, gcr.io, registry.k8s.io, k8s.gcr.io, quay.io, ghcr.io, and other overseas registries to China-accessible mirrors for faster downloads.
Secure Transformer Inference is a protocol for serving Transformer-based models securely.
Integrate cutting-edge LLM technology quickly and easily into your apps
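
A brief aside on the GPU/CUDA Hungarian-algorithm project starred above: the snippet below is a minimal CPU reference sketch of the assignment problem that repo accelerates. It uses SciPy's linear_sum_assignment purely for illustration (an assumption of this sketch; the starred repo implements the algorithm itself in CUDA).

```python
# Minimal CPU reference for the assignment problem that the GPU/CUDA
# Hungarian-algorithm repo above accelerates (sketch only; SciPy is an
# illustrative assumption, not part of that repo).
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i][j] = cost of assigning worker i to task j
cost = np.array([
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
])

rows, cols = linear_sum_assignment(cost)  # minimum-cost perfect matching
print(list(zip(rows, cols)))              # worker 0→task 1, 1→task 0, 2→task 2
print(cost[rows, cols].sum())             # total cost: 1 + 2 + 2 = 5
```

The Hungarian algorithm runs in O(n^3) time, which is why GPU implementations become attractive once cost matrices grow large.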
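Similarly, for the BERT-based intent and slot detection projects starred above, here is a hedged sketch of the two subtasks using Hugging Face transformers pipelines. The specific checkpoints (facebook/bart-large-mnli, dslim/bert-base-NER) and the zero-shot/NER framing are assumptions for illustration, not what those repos actually use.

```python
# Hedged sketch of intent detection + slot filling with Hugging Face
# transformers (illustrative checkpoints; the starred repos train their
# own BERT-based joint models instead).
from transformers import pipeline

utterance = "book a flight to Hefei tomorrow"

# Intent detection framed as zero-shot sequence classification.
intent = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(intent(utterance,
             candidate_labels=["book_flight", "play_music", "weather"]))

# Slot filling framed as token classification (NER as a stand-in for slots).
slots = pipeline("token-classification", model="dslim/bert-base-NER",
                 aggregation_strategy="simple")
print(slots(utterance))  # entity spans, e.g. a location span for "Hefei"
```

A production intent/slot detector typically shares one BERT encoder between both heads; the two separate pipelines here just make the subtasks easy to see.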