This repository contains the codebase for our paper "Can Past Experience Accelerate LLM Reasoning?". The project explores how prior experience and memory mechanisms can accelerate LLM reasoning while maintaining or improving answer quality. It introduces a framework for adaptive compute-budget allocation and memory-based reasoning across varying degrees of task similarity.
Memory methods control how prior examples or intermediate results are incorporated during reasoning:
| Memory Method | Description |
|---|---|
| `no_memory` | No memory used. Each query is answered independently. |
| `SFT` | Fine-tuned model with few-shot context from training examples. |
| `in_context` | Uses retrieved past examples dynamically as in-context demonstrations. |
| `reflect` | Uses self-reflection on failed trials to guide the next iteration. |
| `multi_case_reflect` | Reflects using multiple past answers for better generalization. |
| `reflect_update` | Reflection with iterative memory refinement after each round. |
In code (e.g., `main.py` or a SLURM script), the memory method is selected with:

```bash
--memory_method=SFT
```
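For example, to compare memory methods under one fixed setup, the methods can be swept in a simple shell loop. This is a sketch: the task and scaling values below are illustrative, not required defaults.

```bash
# Sketch: sweep every memory method under one fixed task/scaling setup.
# The --task and --scaling_method values are illustrative choices.
for mem in no_memory SFT in_context reflect multi_case_reflect reflect_update; do
  python main.py \
    --task=MATH500 \
    --scaling_method=best_of_n \
    --memory_method="$mem"
done
```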
Scaling methods define how reasoning is expanded or iterated at test time:
| Scaling Method | Description |
|---|---|
| `best_of_n` | Samples multiple answers and picks the best one based on a scoring model. |
| `self_refine` | Iteratively refines previous answers by self-editing. |
| `dfs` | Depth-first search over reasoning steps, exploring sequentially. |
| `long_cot` | Long chain-of-thought reasoning using larger context windows. |
In code (e.g., `main.py`), the scaling method is passed as:

```bash
--scaling_method=long_cot
```
These methods can be combined with different memory settings to study cost-quality tradeoffs.
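For instance, pairing an iterative scaling method with a reflection-based memory method only requires passing both flags. The specific pairing below is one illustrative combination:

```bash
# Illustrative pairing: iterative self-refinement plus reflection
# with memory updates after each round.
python main.py \
  --task=MATH500 \
  --scaling_method=self_refine \
  --memory_method=reflect_update
```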
We group questions into clusters to simulate varying degrees of similarity with previously solved tasks. This helps analyze whether LLMs can leverage prior exposure to speed up reasoning.
| Subgroup Key | Description |
|---|---|
| `1_same_question` | Exact same question repeated |
| `2_diff_wording` | Same semantics with different wording |
| `3_diff_number` | Same problem type, but numbers are varied |
| `4_diff_question` | Entirely new question type or topic |
In `main.py`, the similarity level is set using:

```bash
--subgroup=2_diff_wording
```
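To study how reasoning cost changes with task similarity, the four subgroups can be run under one fixed configuration. This is a sketch; the other flag values are held at illustrative settings:

```bash
# Sketch: hold the configuration fixed and vary only the similarity subgroup.
for sg in 1_same_question 2_diff_wording 3_diff_number 4_diff_question; do
  python main.py \
    --task=MATH500 \
    --scaling_method=best_of_n \
    --memory_method=in_context \
    --subgroup="$sg"
done
```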
A full example invocation:

```bash
python main.py \
  --cuda=0 \
  --backend=meta-llama/Meta-Llama-3.1-8B-Instruct \
  --value_backend=gpt-4o-mini \
  --task=MATH500 \
  --scaling_method=best_of_n \
  --memory_method=SFT \
  --subgroup=2_diff_wording \
  --cluster_id=1 \
  --experiment_id=4_0
```
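As noted above, the same flags can also be set from a SLURM script. A minimal wrapper sketch is shown below; the job name, GPU request, and time limit are assumptions to adapt to your cluster, not values shipped with this repository:

```bash
#!/bin/bash
# Resource values below are assumptions; adapt them to your cluster.
#SBATCH --job-name=past-exp
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00

# Forward all flags from the sbatch call, e.g.:
#   sbatch run.sh --task=MATH500 --scaling_method=best_of_n --memory_method=SFT
python main.py "$@"
```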
Optional flags:

- `--use_lora`
- `--prm_use_lora`
- `--max_tokens`, `--num_iteration`, `--num_questions`, etc.
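For example, the command above can be extended with these flags. The values shown are illustrative assumptions; each flag's exact semantics are defined in `main.py`:

```bash
# Flag values below are illustrative assumptions; see main.py for definitions.
python main.py \
  --task=MATH500 \
  --scaling_method=best_of_n \
  --memory_method=SFT \
  --use_lora \
  --max_tokens=2048 \
  --num_iteration=3 \
  --num_questions=100
```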