The official repo for the paper "Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation" (AAAI'24).

Java 16 3 Updated Feb 27, 2024

evalplus / evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

Python 1,483 158 Updated May 24, 2025

hendrycks / math

The MATH Dataset (NeurIPS 2021)

Python 1,136 101 Updated Aug 5, 2024

lucidrains / reformer-pytorch

Reformer, the efficient Transformer, in Pytorch

Python 2,170 257 Updated Jun 21, 2023

Jamie-Stirling / RetNet

An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"

Python 1,190 102 Updated Oct 22, 2023

syncdoth / RetNet

Huggingface compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent, and chunkwise forward.

Jupyter Notebook 226 27 Updated Mar 12, 2024

FoundationAgents / MetaGPT

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Python 56,428 6,762 Updated Jun 13, 2025

SWE-agent / SWE-agent

SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]

Python 16,311 1,675 Updated Jun 15, 2025

AutoCodeRoverSG / auto-code-rover

A project structure aware autonomous software engineer aiming for autonomous program improvement. Resolved 37.3% tasks (pass@1) in SWE-bench lite and 46.2% tasks (pass@1) in SWE-bench verified with…

Python 2,951 323 Updated Apr 24, 2025

THUDM / LongBench

LongBench v2 and LongBench (ACL '25 & '24)

Python 900 88 Updated Jan 15, 2025

NTDXYG / green_paper_list

1 Updated Jul 2, 2024

C-dessert / 2017-8-17-Try1

Give it a try

PHP 1 Updated Aug 17, 2017

C-dessert

Stars

hiyouga / LLaMA-Factory

EleutherAI / lm-evaluation-harness

FareedKhan-dev / train-deepseek-r1

unslothai / unsloth

OctopusMind / DPO

GAIR-NLP / O1-Journey

AIDC-AI / Marco-o1

Scientific-Computing-Lab / MonoCoder

parallelcodefoundry / ParEval

brenocfg / AnghaBench

ExpertiseModel / MuTAP

scope-lab-vu / PAMCTS

plasma-umass / coverup

PyCQA / bandit

FloridSleeves / RobustAPI