Merge pull request #234 from RobotSail/add-leaderboard Implement leaderboard as a benchmark
Merge pull request #212 from alimaredia/bump-ragas-version
Merge pull request #208 from RobotSail/update-changelog chore: update changelog for 0.5.0
Merge pull request #197 from RobotSail/fix-mmlu Allows MMLU to have the system_prompt provided to it
Merge pull request #179 from danmcp/handlenoresult Handle no valid eval results for mt_bench
Merge pull request #174 from danmcp/modeladapterunits Add model adapter unit tests
Merge pull request #143 from danmcp/aggfix Remove task logic with lm_eval 0.4.4 for agg_score
Merge pull request #138 from alimaredia/mtbench-branch-judgement-return-overall-score return overall_score from MTBenchBranch.generate_judgement()
Merge pull request #98 from danmcp/removefastchatdep Remove fastchat dependency
Merge pull request #110 from danmcp/singleanswerfile Use single answer file and model list