Tsinghua University & Shanghai AI Lab · Shanghai
Stars
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Ephibbs / big-tau
Forked from sierra-research/tau-bench. Code and data for an expanded Tau-Bench with training and test sets in a variety of domains.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Muon is an optimizer for hidden layers in neural networks
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
🍒 Cherry Studio is a desktop client that supports multiple LLM providers.
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper.
The Triton TensorRT-LLM Backend
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
(ACL 2025) Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation
Search-R1: An efficient, scalable RL training framework for LLMs that interleave reasoning with search-engine calls, built on veRL.
A library for generating difficulty-scalable, multi-tool, and verifiable agentic tasks with execution trajectories.
Implementation for OAgents: An Empirical Study of Building Effective Agents
The official Python SDK for Model Context Protocol servers and clients
Expose your FastAPI endpoints as Model Context Protocol (MCP) tools, with Auth!
An Open-source RL System from ByteDance Seed and Tsinghua AIR
A MemAgent framework that can extrapolate to 3.5M tokens, along with a framework for RL training of any agent workflow.
An open-source coding LLM for software engineering tasks.
[NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents
The data and code for paper: "SOPBench: Evaluating Language Agents at Following Standard Operating Procedures and Constraints"
EvaLearn is a pioneering benchmark designed to evaluate large language models (LLMs) on their learning capability and efficiency in challenging tasks.
Evaluating Conversational Agents in a Dual-Control Environment
Holistic Evaluation of Language Models (HELM) is an open-source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible, and transparent evaluation of language models.
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows.
DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.