Clover: Closed-Loop Verifiable Code Generation [paper]
The use of large language models for code generation is a rapidly growing trend in software development. However, without effective methods for ensuring the correctness of generated code, this trend could lead to any number of undesirable outcomes. In this paper, we lay out a vision for addressing this challenge: the Clover paradigm, short for Closed-Loop Verifiable Code Generation, which reduces correctness checking to the more accessible problem of consistency checking. At the core of Clover lies a checker that performs consistency checks among code, docstrings, and formal annotations. The checker is implemented using a novel integration of formal verification tools and large language models. We provide a theoretical analysis to support our thesis that Clover should be effective at consistency checking. We also empirically investigate its feasibility on a hand-designed dataset (CloverBench) featuring annotated Dafny programs at a textbook level of difficulty. Experimental results show that for this dataset, (i) LLMs are reasonably successful at automatically generating formal specifications; and (ii) our consistency checker achieves a promising acceptance rate (up to 87%) for correct instances while maintaining zero tolerance for incorrect ones (no false positives).
# Create and activate a virtual environment
python3 -m venv clover_env
source clover_env/bin/activate
# Install requirements
pip install "sglang[openai]" tqdm
We've created a ZSH configuration file to make working with Clover easier:
source .clover_zshrc
Dafny is required to run the Clover experiments. You can install it with:
brew install dafny
For Linux installation (Ubuntu, Debian, Arch), see the detailed instructions in clover_env_setup.md
.
Basic Ubuntu/Debian installation:
# Install dependencies
sudo apt install -y mono-complete dotnet-sdk-7.0
# Download and install Dafny
VERSION=4.3.0
wget https://github.com/dafny-lang/dafny/releases/download/v${VERSION}/dafny-${VERSION}-x64-ubuntu-20.04.zip
unzip dafny-${VERSION}-x64-ubuntu-20.04.zip -d dafny
sudo mv dafny /usr/local/
echo 'export PATH="$PATH:/usr/local/dafny"' >> ~/.bashrc
source ~/.bashrc
Download from the Dafny GitHub releases and add it to your PATH.
Set your OpenAI API key as an environment variable:
export OPENAI_API_KEY=your_api_key_here
For security, create a private key file that won't be committed to git:
# Create the private key file
cat > private_key.sh << 'EOF'
#!/bin/bash
export OPENAI_API_KEY="your-actual-api-key-here"
EOF
# Make it executable
chmod +x private_key.sh
# Use it
source private_key.sh
This approach is automatically supported by the included .clover_zshrc
and set_openai_key.sh
scripts.
cd clover
python exps.py --num-trial 1 --dafny-path $(which dafny) --model gpt-4o
cd clover
python exps.py --num-trial 50 --dafny-path $(which dafny) --model gpt-4o
cd clover
python debug_exp.py --num-trial 1 --verbose 2 --model gpt-4o --dafny-path $(which dafny)
By default, the code now uses the gpt-4o
model. You can specify a different model with the --model
parameter:
python exps.py --model gpt-4-1106-preview --dafny-path $(which dafny)
Will receive 87% acceptance rate on CloverBench (Ground-Truth) dataset with gpt-4-1106-preview in March 2024.
@misc{sun2023clover,
title={Clover: Closed-Loop Verifiable Code Generation},
author={Chuyue Sun and Ying Sheng and Oded Padon and Clark Barrett},
year={2023},
eprint={2310.17807},
archivePrefix={arXiv},
primaryClass={cs.SE}
}
To reduce costs and speed up experiments, Clover now includes a caching system for LLM calls. This system:
- Saves responses from OpenAI API calls to avoid redundant requests
- Significantly speeds up repeated benchmark runs
- Provides detailed statistics on cache performance
Use the provided script to run Clover with caching enabled:
./run_clover_cached.sh --test abs
You can enable or disable the cache with environment variables:
# Enable caching (default)
export ENABLE_LLM_CACHE=1
# Disable caching
export ENABLE_LLM_CACHE=0
For more details, see README_LLM_CACHE.md.