Please visit the website or read our paper for more information.
These instructions will run inference and evaluation on the Northeastern Discovery cluster. It should be possible to easily adapt the scripts for other Slurm clusters.
On a compute node, run
singularity pull docker://ghcr.io/nuprl/multipl-e-evaluation
This wll create the file multipl-e-evaluation_latest.sif
, which is the
container. The file cluster/discovery_evaluation.sh assumes that the file is
saved as
/work/arjunguha-research-group/arjun/containers/multipl-e-evaluation_latest.sif
.
You also need an environment that has the MultiPL-E dependencies. On Discovery,
you can use source ~a.guha/bin/gpuenv
, which activates an appropriate
Conda environment.
You can do this on the login node or a compute node with limited resources.
-
Activate an appropriate environment:
source ~a.guha/bin/gpuenv
-
Enter the root of the MultiPL-E repository:
cd /work/arjunguha-research-group/arjun/repos/MultiPL-E
-
Create a directory for experiment results:
mkdir experiments
You can re-use this directory to incrementally add new experiments.
-
Create a file called
experiments/inference.sh
. Each line of the file should run inference. For example:python -m inference --model-name inference.bigcode_mha --root-dataset humaneval --lang py --temperature 0.2 --batch-size 50
We will not run this shell script directly. Instead, we will run each line on a separate GPU node. Therefore, ensure that no command spans multiple lines (i.e., do not use trailing
\
) and do not include the#!
on the first line. -
Run
./cluster/pipeline.sh experiments
You will receive an email at your
@northeastern.edu
address when complete.The script puts all logs files in
experiments/logs
.