FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization

FALCON Overview

We explore the FALCON framework, which integrates comprehensive unit testing with reinforcement learning and is supported by both long-term and short-term memory buffers. During code generation, the system stores task descriptions, generated code, and various feedback (e.g., compilation results, code style, and complexity) in the long-term memory buffer. By retrieving this information, the model references high-quality code, avoids past mistakes, and adheres to required standards. After the code is generated, a judge model evaluates it and calculates rewards based on the feedback; these rewards are then used to update the model's parameters through reinforcement learning. All generated code and feedback are stored for future reference and optimization. Combining long-term and short-term memory feedback allows the model not only to learn from a wide range of historical data but also to adapt quickly to new tasks based on recent performance.
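To make the two buffers concrete, here is a minimal sketch of how a bounded short-term buffer and an unbounded long-term buffer could sit side by side. The names `MemoryRecord` and `MemoryBuffers`, the feedback keys, and the buffer size are all assumptions made for illustration; they are not taken from the FALCON source code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    """One generation attempt: the task, the code produced, and its feedback."""
    task: str
    code: str
    feedback: dict = field(default_factory=dict)  # hypothetical keys, e.g. {"compiled": True}


class MemoryBuffers:
    """Hypothetical container pairing a bounded short-term buffer with a long-term store."""

    def __init__(self, short_term_size: int = 4):
        self.short_term = deque(maxlen=short_term_size)  # most recent attempts only
        self.long_term = []                              # full history of attempts

    def store(self, record: MemoryRecord) -> None:
        # Every record goes to both buffers; the deque drops its oldest entry when full.
        self.short_term.append(record)
        self.long_term.append(record)

    def recent(self) -> list:
        """Return the short-term records, oldest first."""
        return list(self.short_term)


buffers = MemoryBuffers(short_term_size=2)
for i in range(3):
    buffers.store(MemoryRecord(task=f"task {i}", code=f"print({i})", feedback={"compiled": True}))
```

In this sketch the long-term store keeps all three records, while the short-term buffer holds only the two most recent ones.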

Installation

The code depends on the packages listed in requirements.txt. Install them with:

pip install -r requirements.txt

Datasets

APPS: Please follow the downloading and preprocessing instructions provided in the hendrycks/apps repository (APPS: Automated Programming Progress Standard, NeurIPS 2021).

Download and unzip all files into the data folder.

Processes

Generating Programs

We provide scripts/generate.sh to generate programs on the APPS benchmark. You can run it directly. The relevant parameters are configured in configs/generate_configs.py.

Receive Feedback from the Compiler

sh scripts/run_unit_tests.sh

The relevant parameters are configured in configs/unit_test_configs.py.
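As an illustration of what compiler and unit-test feedback can look like, the sketch below runs a candidate Python program against a single input/output pair and returns a coarse outcome label. The function name `run_candidate`, the label strings, and the timeout are assumptions for this example; the repository's own script may classify results differently.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_candidate(code: str, stdin_text: str, expected_stdout: str, timeout: float = 5.0) -> str:
    """Return 'compile_error', 'runtime_error', 'timeout', 'failed', or 'passed'."""
    # Check syntax first so compile errors are reported separately from runtime errors.
    try:
        compile(code, "<candidate>", "exec")
    except SyntaxError:
        return "compile_error"

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(code, encoding="utf-8")
        try:
            # Run the candidate in a separate interpreter, feeding the test input on stdin.
            proc = subprocess.run(
                [sys.executable, str(path)],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return "timeout"

    if proc.returncode != 0:
        return "runtime_error"
    # Compare outputs ignoring leading and trailing whitespace.
    return "passed" if proc.stdout.strip() == expected_stdout.strip() else "failed"
```

For example, `run_candidate("print(int(input()) * 2)", "21\n", "42")` yields `"passed"` under these assumptions. Labels like these could then be stored alongside the generated code as feedback.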

Receive AI Feedback

python /AI_Feedback/ai_feedback_generate.py

Please enter your API key.

Generating Programs with Long Term Memory

sh /scripts/long_memory_generate.sh

Please update the source code path and unit test result path accordingly. Other relevant parameters are located in configs/FAISS_config.py.
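The idea behind retrieval from long-term memory can be sketched as a nearest-neighbour lookup over stored task descriptions. The example below is only an illustration: it uses a toy bag-of-words embedding and a brute-force cosine search in plain Python as a stand-in for a learned encoder and a FAISS index, and the names `embed`, `cosine`, `retrieve`, and the entry fields are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a real setup would use a learned encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, memory: list, k: int = 2) -> list:
    """Return the k stored entries whose task text is most similar to the query."""
    q = embed(query)
    ranked = sorted(memory, key=lambda entry: cosine(q, embed(entry["task"])), reverse=True)
    return ranked[:k]


# Hypothetical long-term memory entries: task text, generated code, unit test outcome.
memory = [
    {"task": "sort a list of integers", "code": "print(sorted(xs))", "passed": True},
    {"task": "reverse a string", "code": "print(s[::-1])", "passed": True},
    {"task": "sort a list of strings by length", "code": "xs.sort(key=len)", "passed": False},
]
top = retrieve("sort a list of numbers", memory, k=2)
```

Here `top` contains the two sorting tasks, most similar first. The retrieved entries, including failed ones, could be placed in the prompt so the model can reuse good solutions and avoid repeating earlier mistakes.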

RL Finetune

sh /scripts/train_actor_rl_deepspeed.sh

Please update the model paths accordingly. Note that the outputs directory contains various training datasets, including the following:

AI_Feedback: AI-generated feedback related to the code.

deep_codes: Generated code data based on specific tasks.

deepseek_test_result: Unit test feedback, which can be directly used for training purposes.

Please set the training paths in the corresponding parameters so that they match your directory layout. The training process depends on these paths pointing to the correct data.
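Before launching training, it can help to confirm that the expected subdirectories are in place. The helper below is a small sketch: the subdirectory names come from the list above, while the function name `missing_training_dirs` and the idea of passing the outputs root as an argument are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Subdirectory names taken from the list above.
EXPECTED_SUBDIRS = ("AI_Feedback", "deep_codes", "deepseek_test_result")


def missing_training_dirs(outputs_root: str) -> list:
    """Return the expected subdirectories that do not exist under outputs_root."""
    root = Path(outputs_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


# Usage on a throwaway directory that contains only two of the three folders.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "AI_Feedback").mkdir()
    (Path(tmp) / "deep_codes").mkdir()
    missing = missing_training_dirs(tmp)
```

An empty result means all three folders were found under the given root.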

