Multi-Programming Language Evaluation of Large Language Models of Code (MultiPL-E)

Please visit the website or read our paper for more information.

Evaluation on Discovery

These instructions run inference and evaluation on the Northeastern Discovery cluster. The scripts should be straightforward to adapt to other Slurm clusters.

Prerequisites

On a compute node, run

singularity pull docker://ghcr.io/nuprl/multipl-e-evaluation

This will create the file multipl-e-evaluation_latest.sif, which is the container image. The script cluster/discovery_evaluation.sh assumes that the file is saved as /work/arjunguha-research-group/arjun/containers/multipl-e-evaluation_latest.sif.
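If you pull the image to a different location, one option (a sketch, assuming you have write access to that group directory) is to move it into place:

    mkdir -p /work/arjunguha-research-group/arjun/containers
    mv multipl-e-evaluation_latest.sif /work/arjunguha-research-group/arjun/containers/

Otherwise, edit the container path in cluster/discovery_evaluation.sh to point at wherever you saved the .sif file.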

You also need an environment that has the MultiPL-E dependencies. On Discovery, you can use source ~a.guha/bin/gpuenv, which activates an appropriate Conda environment.
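If you are working on another cluster, you will need to build an equivalent environment yourself. A minimal, hypothetical sketch follows; the package names (torch, transformers, datasets) are assumptions about typical HuggingFace-based inference, so defer to the dependency list in the repository:

    conda create -n multipl-e python=3.10
    conda activate multipl-e
    pip install torch transformers datasets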

Running the Evaluation

You can do this on the login node or a compute node with limited resources.

  1. Activate an appropriate environment:

    source ~a.guha/bin/gpuenv
    
  2. Enter the root of the MultiPL-E repository:

    cd /work/arjunguha-research-group/arjun/repos/MultiPL-E
    
  3. Create a directory for experiment results:

    mkdir experiments
    

    You can re-use this directory to incrementally add new experiments.

  4. Create a file called experiments/inference.sh. Each line of the file should be a complete command that runs inference. For example:

    python -m inference --model-name inference.bigcode_mha --root-dataset humaneval --lang py --temperature 0.2 --batch-size 50
    

    We will not run this shell script directly. Instead, each line will be run on a separate GPU node. Therefore, ensure that no command spans multiple lines (i.e., do not use a trailing \) and do not include a #! line. An expanded sketch of such a file appears after this list.

  5. Run the pipeline:

    ./cluster/pipeline.sh experiments
    

    You will receive an email at your @northeastern.edu address when the jobs complete.

    The script puts all log files in experiments/logs.
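For reference, here is a hedged sketch of a multi-line experiments/inference.sh that sweeps one model across several languages. The model name inference.bigcode_mha is taken from the example above; the language codes and other flag values are illustrative assumptions, so substitute the ones your MultiPL-E checkout supports. Because every line is dispatched as its own GPU job, each line must be a complete, self-contained command:

    python -m inference --model-name inference.bigcode_mha --root-dataset humaneval --lang py --temperature 0.2 --batch-size 50
    python -m inference --model-name inference.bigcode_mha --root-dataset humaneval --lang lua --temperature 0.2 --batch-size 50
    python -m inference --model-name inference.bigcode_mha --root-dataset humaneval --lang rs --temperature 0.2 --batch-size 50

While the jobs run, you can monitor them with standard Slurm tools (e.g. squeue -u $USER) and watch the files accumulate in experiments/logs.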
