Multi-SWE-bench

👋 Hi, everyone!
We are the ByteDance Seed team.

You can get to know us better through the following channels 👇

Multi-SWE-bench

This organization contains the source code for Multi-SWE-bench, a multilingual benchmark for evaluating LLMs on real-world code issue resolution. Unlike existing Python-centric benchmarks such as SWE-bench, Multi-SWE-bench spans seven languages (Java, TypeScript, JavaScript, Go, Rust, C, and C++). It contains 1,632 high-quality instances, which 68 expert annotators curated from 2,456 candidates to ensure reliability.

Use the repositories in this organization to:

Construct Multi-SWE-bench datasets and run local evaluation (multi-swe-bench/multi-swe-bench)
Submit your predictions and evaluation results to be featured on the public leaderboard (multi-swe-bench/experiments)

Multi-SWE-RL Community

Multi-SWE-RL Dataset Overview

The Multi-SWE-RL Community is an open-source initiative focused on collaborative dataset creation for software engineering and reinforcement learning research. To foster active participation and recognize contributors, we introduce this Contribution Incentive Plan. By contributing high-quality data, you directly support advancements in AI research and earn recognition within the community.

Incentive Tiers:

Be a Contributor: Get listed in the Contribution Progress Sheet
Report Authorship: Become an author in future technical reports

Full details: Contribution Incentive Plan

Get Started in 2 Steps:

Learn: Quick-Start Guide
Try: Follow our Contribution Demo
