- Shanghai, China
- 08:12 (UTC +08:00)
- ocss.lin@gmail.com
- https://junronglin.com
Stars
DeepEP: an efficient expert-parallel communication library
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
Allow torch tensor memory to be released and resumed later
A Datacenter Scale Distributed Inference Serving Framework
A list of awesome compiler projects and papers for tensor computation and deep learning.
Scalable RL solution for advanced reasoning of language models
My learning notes/codes for ML SYS.
Python tool for converting files and office documents to Markdown.
NVR with realtime local object detection for IP cameras
Composable building blocks to build Llama Apps
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
SGLang is a fast serving framework for large language models and vision language models.
📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, Parallelism, MLA, etc.
Langflow is a powerful tool for building and deploying AI-powered agents and workflows.
The easiest way to run WireGuard VPN + Web-based Admin UI.
A guidance language for controlling large language models.
VPS Fusion Monster server benchmark script; the dependency-free Go version is recommended: https://github.com/oneclickvirt/ecs
⛅️ A curated list of Cloudflare tools, open source projects, guides, blogs and other resources.
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
FastAPI Best Practices and Conventions we used at our startup
Ongoing research training transformer models at scale
💯 Curated coding interview preparation materials for busy software engineers
A high-throughput and memory-efficient inference and serving engine for LLMs
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org