- Shatin, NT, Hong Kong
- www.zhihan-jiang.com
- @zhjiang22
Starred repositories
[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
[ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
CAShift: Benchmarking Log-Based Cloud Attack Detection under Normality Shift (FSE 2025)
The official Python SDK for Model Context Protocol servers and clients (see the server sketch after this list)
Model Context Protocol Servers
A datacenter simulator covering power, cooling, servers, and other components
The official Python SDK for Codellm-Devkit
This repository contains the manifest set used to build a prototype system of TraceZip, assembled from four parts.
A lightweight, powerful framework for multi-agent workflows (see the agent sketch after this list)
📄 Paper reading notes (Distributed Systems, Virtualization, Machine Learning)
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Code for the embedding and reranker models, as well as for evaluation, from the paper "Stack Trace Deduplication: Faster, More Accurately, and in More Realistic Scenarios".
Your 24/7 On-Call AI Agent - Solve Alerts Faster with Automatic Correlations, Investigations, and More
An LLM-Based Diagnosis System (https://arxiv.org/pdf/2312.01454.pdf)
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
This repository contains all the code used for the experimental analysis of the paper "The Importance of Workload Choice in Evaluating LLM Inference Systems".
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
From Task-based to Instruction-based Automated Log Analysis
Source Code for ISSRE-24 paper "Demystifying and Extracting Fault-indicating Information from Logs for Failure Diagnosis".
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface. (See the launch sketch after this list.)
FastAPI framework, high performance, easy to learn, fast to code, ready for production (see the app sketch after this list)
Disaggregated serving system for Large Language Models (LLMs).
Predict the performance of LLM inference services
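
A minimal sketch of a tool server built with the official MCP Python SDK, following the `FastMCP` helper from the SDK's README; the server name and the `add` tool are illustrative, and import paths may vary between SDK versions.

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP helper.
# The server name and the `add` tool are illustrative examples.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")  # hypothetical server name

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default so an MCP client can connect
```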
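For the multi-agent workflow framework, a minimal single-agent run following its README-style `Agent`/`Runner` API; the agent name and instructions here are illustrative, and the API surface may change between releases.

```python
# Sketch of a single-agent run; the name and instructions are placeholders.
from agents import Agent, Runner

agent = Agent(
    name="assistant",
    instructions="Answer concisely.",
)

# Run the agent synchronously on one input and print its final answer.
result = Runner.run_sync(agent, "What is observability in one sentence?")
print(result.final_output)
```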
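For SkyPilot, a sketch of launching a GPU job through its documented Python API (`sky.Task`, `sky.Resources`, `sky.launch`); the setup/run commands, accelerator type, and cluster name are placeholders.

```python
# Sketch of a SkyPilot launch; commands, accelerator, and cluster name
# are placeholders, and the launch call's return value varies by version.
import sky

task = sky.Task(
    setup="pip install -r requirements.txt",
    run="python train.py",
)
task.set_resources(sky.Resources(accelerators="A100:1"))

sky.launch(task, cluster_name="demo-cluster")
```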
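And for FastAPI, a minimal app runnable with `uvicorn main:app`; the route and parameters are illustrative.

```python
# Minimal FastAPI app; run with `uvicorn main:app --reload`.
from fastapi import FastAPI

app = FastAPI()

@app.get("/items/{item_id}")
def read_item(item_id: int, q: str | None = None):
    # Path and query parameters are parsed and validated from the type hints.
    return {"item_id": item_id, "q": q}
```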