After cloning the repo, set up a virtual environment and install the dependencies:

```bash
python -m venv venv
source venv/bin/activate    # macOS/Linux
.\venv\Scripts\activate     # Windows
pip install -r requirements.txt
```
Create two env files: `key.env` and `.env`. In `.env`, store your Hugging Face access token (needed for access to Meta's Llama models) under `HF_TOKEN`. In `key.env`, store your Gemini Flash API key under `GOOGLE_API_KEY`.
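For example, the two files would look something like this (the values shown are placeholders for your own tokens):

```ini
# .env
HF_TOKEN=hf_your_token_here

# key.env
GOOGLE_API_KEY=your_gemini_api_key_here
```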
The dataset, stored in `lc_hard.json`, consists of Hard-difficulty problems from LeetCode. As it stands, there are 50 entries: the first 20 come from questions that are asked much more frequently in real-life technical interviews, and the remaining 30 are selected from the set of other Hard problems on LeetCode.
Each entry in the dataset follows the schema below:

```json
{
  "desc": "",
  "skeleton": "",
  "examples": [""],
  "ref": [""],
  "test": {"input": [[]], "output": []},
  "func": ""
}
```
Features:
- `desc` stores a string representation of the problem statement.
- `skeleton` stores a string representation of the code starting point.
- `examples` stores a list of problems with solutions for in-context learning. [This is currently not utilized, but room for it exists in the dataset.]
- `ref` stores a list of human-produced reference solutions to the problem.
- `test` stores a dictionary where `input` maps to a list of inputs for test cases and `output` maps to a list of outputs. A test case's input is represented as a list of inputs, even if there is only one. For example, if the input for a single test case is 4, it would be represented as `[4]`; if the inputs for a single test case were 4 and "foo", it would be represented as `[4, "foo"]`. (See the illustrative entry after this list.)
- `func` stores a string representation of the name of the function the model is completing. This is used for dynamic code execution, not model prompting.
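For illustration only, a toy entry might look like the following (this is not a real dataset entry, and the exact shape of `skeleton` and `ref` in the actual entries may differ); note how each test case's inputs are wrapped in a list:

```json
{
  "desc": "Given two integers a and b, return their sum.",
  "skeleton": "def add_two(a, b):",
  "examples": [""],
  "ref": ["def add_two(a, b):\n    return a + b"],
  "test": {"input": [[1, 2], [4, -4]], "output": [3, 0]},
  "func": "add_two"
}
```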
Because the dataset's JSON representation needs `desc`, `skeleton`, `ref`, and `func` to be single-line strings, it can be hard to type everything out while preserving the structure of the code. To address this, several scripts were written that take multiline strings, which are easy to paste in and read, convert them into a JSON-acceptable format, and write them to the JSON file.
To edit a feature for one or more entries, open `lc_hard_{feature}.py` and set `lcX`, where `X` is the problem number of interest, to your desired string. When you've made your changes, save the file and run the script to update the dataset (a conceptual sketch of this flow follows the note below).
! Note that the current version of these scripts is hard-coded for 50 entries. If entries are added, the value passed to `range` when gathering the global variables should be changed accordingly.
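Conceptually, each `lc_hard_{feature}.py` script does something like the sketch below; the variable names, file handling, and the assumption that the dataset is a list of dicts are illustrative, and the real scripts differ in their details:

```python
import json

# Multiline string that is easy to paste in and read (content is illustrative).
lc1 = """def hard_problem(nums):
    # starting point for problem 1
    pass
"""

with open("lc_hard.json", "r", encoding="utf-8") as f:
    data = json.load(f)  # assumed here to be a list of entry dicts

# The real scripts gather lc1 .. lc50 via a hard-coded range(50).
entries = [lc1]
for i, text in enumerate(entries):
    if text:
        data[i]["skeleton"] = text  # json.dump escapes the newlines into "\n"

with open("lc_hard.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)
```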
The evaluation in this project is three-pronged:
- Structural Similarity
- Accuracy
- Readability
Structural similarity and accuracy are calculated in `eval.ipynb`. ! This must be run before running `readability.ipynb`.
Structural similarity is calculated using the corpus BLEU score. ! It is important to note that because the model's generation is non-deterministic (no seed is set, and default top-p and temperature parameters are used), the BLEU score can vary significantly given the sensitivity of the metric.
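As a point of reference, corpus BLEU can be computed along these lines (a sketch using NLTK with toy data and naive whitespace tokenization; the notebook's actual tokenization and reference handling may differ):

```python
from nltk.translate.bleu_score import corpus_bleu

# Toy data: one model output per problem, each paired with its list of human
# reference solutions (the "ref" field). Values here are made up.
generated = ["def f ( x ) : return x + 1"]
references = [["def f ( x ) : return x + 1", "def f ( x ) : return 1 + x"]]

hypotheses = [gen.split() for gen in generated]
ref_tokens = [[r.split() for r in refs] for refs in references]

print(f"Corpus BLEU: {corpus_bleu(ref_tokens, hypotheses):.4f}")
```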
Accuracy is calculated by dynamically executing the code generated by the model and then running the corresponding input/output pairs from the dataset against the produced function (see the sketch below).
It is reported as Average Pass Rate across all entries, Pass Rate for Popular Problems (first 20), and Pass Rate for Unpopular Problems (last 30).
! Note that Pass Rate for Unpopular Problems is calculated over all problems other than the first 20, so keep this in mind if you add entries.
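A minimal sketch of the exec-based evaluation described above, assuming each generated snippet defines a top-level function named by the entry's `func` field (the notebook's actual harness may differ, e.g. in error handling or timeouts):

```python
def pass_rate(generated_code: str, entry: dict) -> float:
    """Execute model-generated code and score it against the entry's test cases."""
    namespace: dict = {}
    exec(generated_code, namespace)          # defines the target function
    fn = namespace[entry["func"]]

    tests = entry["test"]
    passed = 0
    for args, expected in zip(tests["input"], tests["output"]):
        try:
            if fn(*args) == expected:        # each input is a list, even for one arg
                passed += 1
        except Exception:
            pass                             # runtime errors count as failures
    return passed / len(tests["output"])

# Illustrative entry, not taken from the dataset.
entry = {"func": "add_two", "test": {"input": [[1, 2], [4, -4]], "output": [3, 0]}}
print(pass_rate("def add_two(a, b):\n    return a + b", entry))  # -> 1.0
```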
Running the cells in `eval.ipynb` will produce the model's outputs in the `generated_outputs` folder as individual .txt files. After this is done, running the cells in `readability.ipynb` will store readability scores for each code snippet in `readability_scores.json`, produce a histogram of readability scores as judged by Gemini Flash, and report an average readability score.
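The Gemini Flash judging step presumably looks something like the sketch below; the exact prompt, model id, scoring scale, and how the key is loaded are assumptions, not the notebook's actual code:

```python
import os

import google.generativeai as genai
from dotenv import load_dotenv

load_dotenv("key.env")  # provides GOOGLE_API_KEY
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash")  # "Gemini Flash"; exact model id assumed


def readability_score(code: str) -> int:
    """Ask Gemini Flash to rate a snippet's readability (illustrative prompt and scale)."""
    prompt = (
        "Rate the readability of this Python code from 1 to 10. "
        "Reply with only the number.\n\n" + code
    )
    response = model.generate_content(prompt)
    return int(response.text.strip())
```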
For ease of access, we pasted our results as printed by `eval.ipynb` into `stats.txt`.
As mentioned above, running `readability.ipynb` will yield a histogram of readability scores as judged by Gemini Flash. Currently, our other graphs are generated by the simple `make_graphs.ipynb`: paste the dict representing Pass Rate by Problem into `acc_by_snippet`, then run the notebook to produce graphs of Readability by Accuracy.
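A sketch of the kind of plot this produces, assuming matplotlib and a `readability_scores.json` keyed by problem id (the real notebook's data wiring may differ):

```python
import json

import matplotlib.pyplot as plt

# Pass Rate by Problem pasted from eval.ipynb output (these values are made up).
acc_by_snippet = {"1": 1.0, "2": 0.6, "3": 0.0}

with open("readability_scores.json", "r", encoding="utf-8") as f:
    readability = json.load(f)  # assumed: problem id -> readability score

keys = [k for k in acc_by_snippet if k in readability]
plt.scatter([readability[k] for k in keys], [acc_by_snippet[k] for k in keys])
plt.xlabel("Readability score (Gemini Flash)")
plt.ylabel("Pass rate")
plt.title("Readability by Accuracy")
plt.show()
```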
The `check_cuda.py` script exists to check whether your GPU is available.
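A check of this kind is typically just a call to `torch.cuda.is_available()`; the actual contents of `check_cuda.py` may differ:

```python
import torch

# Report whether PyTorch can see a CUDA-capable GPU.
if torch.cuda.is_available():
    print(f"CUDA available: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA not available; running on CPU.")
```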