Evals is a framework for evaluating LLMs (large language models) or systems built using LLMs as components. It also includes an open-source registry of challenging evals.
We now support evaluating the behavior of any system, including prompt chains or tool-using agents, via the Completion Function Protocol.
With Evals, we aim to make it as simple as possible to build an eval while writing as little code as possible. An "eval" is a task used to evaluate the quality of a system's behavior. To get started, we recommend that you follow these steps:
To get set up with evals, follow the setup instructions below.
- Learn how to run existing evals: run-evals.md (a sample invocation is sketched below).
- Familiarize yourself with the existing eval templates: eval-templates.md.
- Walk through the process for building an eval: build-eval.md.
- See an example of implementing custom eval logic: custom-eval.md.
- Write your own completion functions: completion-fns.md.

Important: Please note that we are currently not accepting evals with custom code! While we ask you not to submit such evals at the moment, you can still submit model-graded evals with custom model-graded YAML files.
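To give a sense of the first step: once your environment is set up (see the setup section below), an existing eval from the registry can typically be run with the `oaieval` command-line tool that ships with the package. The exact arguments are documented in run-evals.md; the following is a sketch, assuming the gpt-3.5-turbo completion function and the registry's test-match eval:

```sh
# Evaluate the gpt-3.5-turbo completion function on the "test-match" eval
# from the registry; consult run-evals.md for the full set of options.
oaieval gpt-3.5-turbo test-match
```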
If you think you have an interesting eval, please open a PR with your contribution. OpenAI staff actively review these evals when considering improvements to upcoming models.
🚨 For a limited time, we will be granting GPT-4 access to those who contribute high-quality evals. Please follow the instructions mentioned above and note that spam or low-quality submissions will be ignored❗️
Access will be granted to the email address associated with an accepted Eval. Due to high volume, we are unable to grant access to any email other than the one used for the pull request.
To run evals, you will need to set up and specify your OpenAI API key. You can generate one at https://platform.openai.com/account/api-keys. After you obtain an API key, specify it using the `OPENAI_API_KEY` environment variable. Please be aware of the costs associated with using the API when running evals.
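For example, in a Unix-like shell the key can be exported for the current session (the value shown is a placeholder):

```sh
# Expose the API key to evals for this shell session only.
export OPENAI_API_KEY="sk-..."
```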
Minimum Required Version: Python 3.9
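As a sketch of a typical setup, assuming you are working from a clone of this repository and that it supports an editable install (as described in its setup docs), you might do:

```sh
# Create and activate a virtual environment on Python 3.9+,
# then install the package in editable mode from the repository root.
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```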
Our Evals registry is stored using Git-LFS. Once you have downloaded and installed LFS, you can fetch the evals with:
```sh
git lfs fetch --all
git lfs pull
```
You may just want to fetch data for a select eval. You can achieve this via:
```sh
git lfs fetch --include=evals/registry/data/${your eval}
git lfs pull
```
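For instance, with a hypothetical eval whose data lives under `evals/registry/data/logic`:

```sh
# "logic" is a placeholder eval name; substitute the directory of the eval you want.
git lfs fetch --include=evals/registry/data/logic
git lfs pull
```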