ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization
ReCUT employs a stepwise exploration mechanism and a long-short switched sampling strategy, enabling LLMs to incrementally generate diverse reasoning paths. These paths are evaluated and used to construct preference pairs to train two specialized models (Gemini LLMs)—one optimized for reasoning accuracy, the other for shorter reasoning. A final integrated model is obtained by interpolating the parameters of these two models.
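To make the last step concrete, the parameters of the two specialized models can be averaged layer by layer. The snippet below is only a minimal sketch of that interpolation idea, assuming both models share the same architecture; the paths and the 0.5 weight are placeholders, and the actual merging in this repository is done with MergeKit (see the model merging section below).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths for the two DPO-trained models (same architecture assumed).
acc_model = AutoModelForCausalLM.from_pretrained("path/to/accuracy_model", torch_dtype=torch.bfloat16)
len_model = AutoModelForCausalLM.from_pretrained("path/to/length_model", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("path/to/accuracy_model")

weight = 0.5  # illustrative interpolation weight
len_state = len_model.state_dict()
merged_state = {
    # Interpolate floating-point tensors only; copy other buffers unchanged.
    name: (weight * param + (1.0 - weight) * len_state[name])
    if param.is_floating_point() else param
    for name, param in acc_model.state_dict().items()
}
acc_model.load_state_dict(merged_state)

acc_model.save_pretrained("path/to/merged_model")
tokenizer.save_pretrained("path/to/merged_model")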
Use git clone to download this project:
git clone https://github.com/NEUIR/ReCUT.git
cd ReCUT
To prevent conflicts between packages, we use three separate virtual environments: one for data construction and evaluation, one for model training, and one for model merging.
For data construction and evaluation:
conda env create -n data_eva -f data_environment.yml
For model training:
conda env create -n long-short -f training_environment.yml
For model merging:
conda env create -n mergekit -f merge_environment.yml
Our generated training data is placed under the data folder.
Download the files from here, then use the downloaded data to synthesize the training data with the following scripts:
conda activate data_eva
python src/long2short_reward_generate.py \
--model_path # Path to the model \
--tensor_parallel_size # Tensor parallel size \
--max_model_len 8096 \
--gpu_memory_utilization 0.95 \
--dataset_path # Path to the input DeepScaleR dataset \
--output_dir # Output directory (default: ./result) \
--sample_size 8000 \
--batch_size 4 \
--num_iterations 8 \
--temperature 0.7 \
--top_p 0.95 \
--max_tokens 2048 \
--alpha 1 \
--beta 1 \
--random_seed 42 \
--analyze_only \
--result_path # Path to the result file to be analyzed (only used in analyze_only mode)
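As a rough intuition for the --alpha and --beta flags, they weight answer correctness against reasoning length when scoring the sampled trajectories. The snippet below is only a guess at such a scoring rule with made-up names; it is not the implementation inside long2short_reward_generate.py.

# Illustrative only: one plausible way accuracy and length could be traded off.
def trajectory_score(is_correct: bool, num_tokens: int, max_tokens: int = 2048,
                     alpha: float = 1.0, beta: float = 1.0) -> float:
    accuracy_term = 1.0 if is_correct else 0.0
    length_penalty = num_tokens / max_tokens  # normalize length to [0, 1]
    return alpha * accuracy_term - beta * length_penalty

# Example: a correct 512-token trajectory scores higher than a correct 1536-token one.
print(trajectory_score(True, 512), trajectory_score(True, 1536))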
The script generates two thinking-pool JSON files, a selection-result JSON file, and an optimal-solution JSON file.
First: use the following script to split out the correct optimal solutions.
python src/long2short_split_optimal.py \
--input # Path to the optimal-solution JSON file \
--output_correct # Output path for the correct optimal solutions \
--input_correct #
Second: construct two DPO training sets from the correct optimal solutions and the two thinking pools; these are used to train two DPO models, a length-optimized model and an accuracy-optimized model. An illustrative sketch of what such preference pairs can look like follows the command below.
python src/long2short_dpo.py \
--optimal # Path to the correct optimal-solution JSON file \
--pooling # Path to the thinking-pool JSON file \
--output # Output file path
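For reference, DPO preference data generally consists of a prompt, a chosen response, and a rejected response. The example below only illustrates that shape with invented content and field names; the exact format written by long2short_dpo.py may differ.

import json

# Hypothetical example records; field names and content are illustrative only.
length_pair = {
    "prompt": "Solve: what is 17 * 24?",
    "chosen": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",  # shorter correct reasoning
    "rejected": "First, note 17 * 24 ... (a much longer correct derivation) ... 408.",
}
accuracy_pair = {
    "prompt": "Solve: what is 17 * 24?",
    "chosen": "17 * 24 = 408.",    # correct answer
    "rejected": "17 * 24 = 398.",  # incorrect answer
}

with open("dpo_length.json", "w") as f:
    json.dump([length_pair], f, indent=2)
with open("dpo_accuracy.json", "w") as f:
    json.dump([accuracy_pair], f, indent=2)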
Our DPO training uses LLaMA Factory, and the specific training parameters can be found in our dpo.yaml file.
conda activate long-short
Our model merging is based on MergeKit; please refer to the MergeKit documentation for usage details. The following commands perform the model merging. Before merging, modify the dare_ties.yml file, which is located in the main folder.
conda activate mergekit
cd mergekit
mergekit-yaml <path to the dare_ties.yml file> <output path for the merged model>
Our evaluation follows the evaluation methodology of Search-o1, and we only provide the processed test datasets. If you encounter any problems, please refer to Search-o1 and simply replace the files prompt.py and run_direct_gen.py.
conda activate data_eva
cd evaluate
python scripts/run_direct_gen.py \
--dataset_name # The name of the dataset, e.g. aime25 \
--split test \
--output_dir # Output directory \
--model_path # Path to the model
conda activate data_eva
cd evaluate
python evaluate_length.py \
--input # Path to the generated results file to evaluate \
--output # Output file path
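For a quick sanity check outside the provided script, the sketch below shows one way an average reasoning length could be computed; the file path, the "output" field name, and whitespace tokenization are assumptions, not the actual interface of evaluate_length.py.

import json

# Assumed format: a JSON list of records, each with an "output" field holding the generated reasoning.
with open("result/generated_outputs.json") as f:
    records = json.load(f)

lengths = [len(r["output"].split()) for r in records]  # whitespace tokens as a crude length proxy
print(f"samples: {len(lengths)}")
print(f"average length: {sum(lengths) / max(len(lengths), 1):.1f} tokens")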