Web-Bench

中文 (Chinese) • Install • Paper • Datasets • LeaderBoard • Citation

📖 Overview

Web-Bench is a benchmark designed to evaluate how well LLMs perform in real-world Web development. It contains 50 projects, each consisting of 20 tasks with sequential dependencies. The tasks implement project features in order, simulating real-world human development workflows. Web-Bench is designed to cover the foundational elements of Web development: Web Standards and Web Frameworks. The projects were designed by engineers with 5-10 years of experience, and their scale and complexity make each one a significant challenge: on average, a single project takes a senior engineer 4–8 hours to complete. With the provided benchmark agent (Web-Agent), the SOTA model (Claude 3.7 Sonnet) achieves only 25.1% Pass@1.
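To make the metric concrete, here is a minimal Python sketch of how a Pass@1 figure could be computed over projects whose tasks depend on each other in order. It assumes that Pass@1 is the fraction of all tasks passed on the first attempt and that a failed task blocks every later task in the same project. Both are assumptions made for illustration, and the function names are hypothetical; none of this is taken from the Web-Bench codebase.

```python
from typing import Sequence

TASKS_PER_PROJECT = 20  # from the overview: each project has 20 tasks


def passed_before_first_failure(results: Sequence[bool]) -> int:
    """Count the tasks passed before the first failure.

    Assumption: tasks are sequentially dependent, so once a task fails,
    no later task in the same project can count as passed.
    """
    passed = 0
    for ok in results:
        if not ok:
            break
        passed += 1
    return passed


def benchmark_pass_at_1(projects: Sequence[Sequence[bool]]) -> float:
    """Fraction of all tasks passed on the first attempt, across projects."""
    total = len(projects) * TASKS_PER_PROJECT
    if total == 0:
        return 0.0
    passed = sum(passed_before_first_failure(p) for p in projects)
    return passed / total


if __name__ == "__main__":
    # Two toy projects with made-up first-attempt results (not real data).
    project_a = [True] * 5 + [False] + [True] * 14   # 5 tasks count
    project_b = [True] * 10 + [False] * 10           # 10 tasks count
    print(benchmark_pass_at_1([project_a, project_b]))  # (5 + 10) / 40
```

Under these assumptions, the 14 later passes in `project_a` do not count, because they come after a failed task. This is why an early failure is costly in a benchmark built from sequential tasks.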

The distribution of the experimental data aligns well with the current code generation capabilities of mainstream LLMs.

HumanEval and MBPP have approached saturation, and APPS and EvalPlus are approaching it. The SOTA for Web-Bench is 25.1%, which is lower than that of the SWE-bench Full and Verified sets. For a benchmark, a lower SOTA is better because it leaves more headroom.

🚀 Installation

1. Install Node.js 22+
2. Init

git clone https://github.com/bytedance/Web-Bench.git
cd Web-Bench
npm i -g pnpm@9.12.0 @microsoft/rush@5.140.0 playwright@1.49.1
cd projects/angular && npx playwright install
rush update
rush build
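Before running the commands above, it can help to confirm that the local Node.js meets the 22+ requirement. The following Python sketch parses the output of `node --version`. The helper names are hypothetical and the script is not part of Web-Bench; it only illustrates the version check.

```python
import re
import shutil
import subprocess
from typing import Optional

REQUIRED_NODE_MAJOR = 22  # from the installation step: Node.js 22+


def node_major(version_text: str) -> int:
    """Extract the major version from text such as 'v22.11.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version: {version_text!r}")
    return int(match.group(1))


def node_is_supported(version_text: str, required: int = REQUIRED_NODE_MAJOR) -> bool:
    """Return True when the reported major version is at least `required`."""
    return node_major(version_text) >= required


def installed_node_version() -> Optional[str]:
    """Return the output of `node --version`, or None if node is not on PATH."""
    if shutil.which("node") is None:
        return None
    result = subprocess.run(
        ["node", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = installed_node_version()
    if version is None:
        print("Node.js not found on PATH")
    elif node_is_supported(version):
        print(f"Node.js {version} satisfies the 22+ requirement")
    else:
        print(f"Node.js {version} is older than the required major version 22")
```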

If you wish to use Docker, refer to Docker Guide.

📘 Usage

Complete the Configuration, then run:

rush eval

🛠️ Contribution

Project Contribution

📚 Citation

@article{xu2025webbench,
  title={Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks},
  author={Xu, Kai and Mao, YiWei and Guan, XinYi and Feng, ZiLong},
  journal={arXiv preprint arXiv:2505.07473},
  year={2025}
}

📄 License

Apache 2.0
