Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search". Haoran Luo, Haihong E, Yikai Guo, Qika Lin, Xiaobao Wu, Xinyu Mu, Wenhao Liu, Meina Song, Yifan Zhu, Luu Anh Tuan. ICML 2025 [paper].
conda create -n kbqao1 python=3.11
conda activate kbqao1
pip install torch==2.3.0
pip install -r requirements.txt
sudo apt install unixodbc
export PYTHONPATH=$PWD
The steps below follow the Freebase Virtuoso Setup.
(1) Clone dki-lab/Freebase-Setup and enter the repository:
cd Freebase-Setup
(2) Processed Freebase Virtuoso DB file can be downloaded from here or via wget (WARNING: 53G+ disk space is needed):
tar -zxvf virtuoso_db.zip
(3) Manage the Virtuoso service. To start the service:
chmod +x virtuoso-opensource/bin/virtuoso-t
python3 virtuoso.py start 3001 -d virtuoso_db
and to stop a currently running service at the same port:
chmod +x virtuoso-opensource/bin/isql
python3 virtuoso.py stop 3001
A server with at least 100 GB RAM is recommended.
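Once the service is started, it can help to confirm that it answers queries before moving on. The following is a minimal sketch, not part of this repository: it assumes Virtuoso exposes its SPARQL endpoint over HTTP at `/sparql` on the port passed to `virtuoso.py` (3001 above), and the helper names `build_query_url` and `endpoint_is_up` are hypothetical.

```python
import http.client
import json
import urllib.parse
import urllib.request

# Assumption: the SPARQL endpoint lives at /sparql on the port used in
# `python3 virtuoso.py start 3001 -d virtuoso_db`.
ENDPOINT = "http://localhost:3001/sparql"


def build_query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL that asks the endpoint for JSON-formatted results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urllib.parse.urlencode(params)


def endpoint_is_up(endpoint: str = ENDPOINT, timeout: float = 10.0) -> bool:
    """Send a trivial query; True only if a JSON result document comes back."""
    url = build_query_url("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1", endpoint)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "results" in json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False
```

Calling `endpoint_is_up()` after the start command should return `True` once the database has finished loading; a `False` result right after startup may only mean the service is still warming up.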
- Download fb_roles, fb_types, and reverse_properties from here to dataset/Freebase/.
KBQA-o1/
└── dataset/
    ├── Freebase/
        ├── fb_roles
        ├── fb_types
        └── reverse_properties
Experiments are conducted on 3 classical KBQA benchmarks: WebQSP, GrailQA and GraphQ.
- WebQSP: Download the WebQSP dataset from here and put the files under dataset/WebQSP/origin. The dataset files should be named WebQSP.test[train].json.
- GrailQA: Download the GrailQA dataset here and put the files under dataset/GrailQA/origin. The dataset files should be named grailqa_v1.0_test_public[train,dev].json.
- GraphQ: Download the GraphQ dataset here and put the files under dataset/GraphQ/origin. The dataset files should be named graphquestions_v1_fb15_test[training]_091420.json.
KBQA-o1/
└── dataset/
    ├── WebQSP/
        ├── origin/
            ├── WebQSP.train.json
            └── WebQSP.test.json
    ├── GrailQA/
        ├── origin/
            ├── grailqa_v1.0_train.json
            ├── grailqa_v1.0_dev.json
            └── grailqa_v1.0_test_public.json
    ├── GraphQ/
        ├── origin/
            ├── graphquestions_v1_fb15_training_091420.json
            └── graphquestions_v1_fb15_test_091420.json
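A misnamed or misplaced file is easy to miss until a later step fails, so a quick layout check can save time. The sketch below is not part of the repository; `EXPECTED_FILES` simply restates the file names listed above, and the function name `missing_origin_files` is hypothetical.

```python
from pathlib import Path

# File names copied from the layout above; adjust if your copies differ.
EXPECTED_FILES = {
    "WebQSP": ["WebQSP.train.json", "WebQSP.test.json"],
    "GrailQA": [
        "grailqa_v1.0_train.json",
        "grailqa_v1.0_dev.json",
        "grailqa_v1.0_test_public.json",
    ],
    "GraphQ": [
        "graphquestions_v1_fb15_training_091420.json",
        "graphquestions_v1_fb15_test_091420.json",
    ],
}


def missing_origin_files(dataset_root="dataset"):
    """Return the expected <dataset>/origin/<file> paths that do not exist."""
    root = Path(dataset_root)
    missing = []
    for name, files in EXPECTED_FILES.items():
        for file_name in files:
            path = root / name / "origin" / file_name
            if not path.is_file():
                missing.append(path)
    return missing
```

Run from the KBQA-o1/ directory, `missing_origin_files()` returns an empty list when all seven files are in place.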
Parse SPARQL queries to S-expressions and Function-lists.
- WebQSP: Run python data_process.py --dataset WebQSP and the merged data file will be saved as dataset/WebQSP/processed/WebQSP_train[test].json.
- GrailQA: Run python data_process.py --dataset GrailQA and the merged data file will be saved as dataset/GrailQA/processed/GrailQA_train[test,test_public].json.
- GraphQ: Run python data_process.py --dataset GraphQ and the merged data file will be saved as dataset/GraphQ/processed/GraphQ_train[test].json.
KBQA-o1/
└── dataset/
    ├── WebQSP/
        ├── processed/
            ├── WebQSP_train.json
            └── WebQSP_test.json
    ├── GrailQA/
        ├── processed/
            ├── GrailQA_train.json
            └── GrailQA_test.json
    ├── GraphQ/
        ├── processed/
            ├── GraphQ_train.json
            └── GraphQ_test.json
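The three preprocessing runs differ only in the `--dataset` value, so they can be driven from one small script. This is a sketch under that assumption; the wrapper functions `data_process_command` and `process_all` are hypothetical, and only `data_process.py --dataset <name>` comes from the commands above.

```python
import subprocess
import sys

DATASETS = ["WebQSP", "GrailQA", "GraphQ"]


def data_process_command(dataset):
    """Build the argument list for one preprocessing run."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    return [sys.executable, "data_process.py", "--dataset", dataset]


def process_all(dry_run=True):
    """Return all commands; execute them in order unless dry_run is set."""
    commands = [data_process_command(name) for name in DATASETS]
    if not dry_run:
        for command in commands:
            # check=True stops at the first failing dataset.
            subprocess.run(command, check=True)
    return commands
```

`process_all()` only returns the command lists, which is useful for inspection; `process_all(dry_run=False)` runs them from the current directory, which is assumed to be the repository root.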
python prepare_sft_data.py --dataset WebQSP
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft_KBQA_WebQSP_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 50.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_WebQSP_simulate.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft_KBQA_WebQSP_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 100.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_WebQSP_reward.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/export_model/simulate --export_size 2 --export_legacy_format False
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/export_model/reward --export_size 2 --export_legacy_format False
CUDA_VISIBLE_DEVICES=0 API_PORT=8101 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_WebQSP_simulate.log 2>&1 &
CUDA_VISIBLE_DEVICES=0 API_PORT=8102 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_WebQSP_reward.log 2>&1 &
CUDA_VISIBLE_DEVICES=0 nohup python run_explore.py --llm_simulate_name 8101/simulate --llm_reward_name 8102/reward --base Llama-3.1-8B-Instruct --task explore --dataset WebQSP >> result_Llama-3.1-8B-Instruct_explore_KBQA_WebQSP_sft.log 2>&1 &
python prepare_sft2_data.py --llm_reward_name 8102/reward --base Llama-3.1-8B-Instruct --dataset WebQSP --limit "30"
bash utils/kill_llm_api_WebQSP.sh
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/export_model/simulate --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft2_KBQA_WebQSP_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 10.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_WebQSP_simulate.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/export_model/reward --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft2_KBQA_WebQSP_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 20.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_WebQSP_reward.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_simulate/export_model/simulate --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_simulate/export_model/simulate --export_size 2 --export_legacy_format False
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft_KBQA_WebQSP_reward/export_model/reward --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_reward/export_model/reward --export_size 2 --export_legacy_format False
CUDA_VISIBLE_DEVICES=0 API_PORT=8101 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_WebQSP_simulate.log 2>&1 &
CUDA_VISIBLE_DEVICES=0 API_PORT=8102 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/WebQSP/sft2_KBQA_WebQSP_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_WebQSP_reward.log 2>&1 &
CUDA_VISIBLE_DEVICES=0 nohup python run_explore.py --llm_simulate_name 8101/simulate --llm_reward_name 8102/reward --base Llama-3.1-8B-Instruct --task test --dataset WebQSP >> result_Llama-3.1-8B-Instruct_test_KBQA_WebQSP_sft2.log 2>&1 &
bash utils/kill_llm_api_WebQSP.sh
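Because the `llm_api.py` servers are launched in the background with `nohup`, `run_explore.py` can start before they accept connections. One way to guard against that is to poll the API ports first. The sketch below is an assumption-laden helper, not part of the repository: the function names are hypothetical, and an open port only shows that a process is listening, not that the model behind it is healthy.

```python
import socket
import time


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(ports, host="127.0.0.1", max_wait=600.0, poll=5.0):
    """Poll until every port accepts connections; False if max_wait elapses."""
    deadline = time.monotonic() + max_wait
    pending = set(ports)
    while pending:
        pending = {p for p in pending if not port_is_open(p, host)}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True
```

For the WebQSP commands above, `wait_for_ports([8101, 8102])` would be called after starting both servers and before launching `run_explore.py`.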
python prepare_sft_data.py --dataset GrailQA
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft_KBQA_GrailQA_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 100.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_GrailQA_simulate.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft_KBQA_GrailQA_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 300.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_GrailQA_reward.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/export_model/simulate --export_size 2 --export_legacy_format False
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/export_model/reward --export_size 2 --export_legacy_format False
CUDA_VISIBLE_DEVICES=1 API_PORT=8103 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_GrailQA_simulate.log 2>&1 &
CUDA_VISIBLE_DEVICES=1 API_PORT=8104 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_GrailQA_reward.log 2>&1 &
CUDA_VISIBLE_DEVICES=1 nohup python run_explore.py --llm_simulate_name 8103/simulate --llm_reward_name 8104/reward --base Llama-3.1-8B-Instruct --task explore --dataset GrailQA >> result_Llama-3.1-8B-Instruct_explore_KBQA_GrailQA_sft.log 2>&1 &
python prepare_sft2_data.py --llm_reward_name 8104/reward --base Llama-3.1-8B-Instruct --dataset GrailQA --limit "-100"
bash utils/kill_llm_api_GrailQA.sh
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/export_model/simulate --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft2_KBQA_GrailQA_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 10.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_GrailQA_simulate.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/export_model/reward --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft2_KBQA_GrailQA_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 20.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_GrailQA_reward.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_simulate/export_model/simulate --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_simulate/export_model/simulate --export_size 2 --export_legacy_format False
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft_KBQA_GrailQA_reward/export_model/reward --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_reward/export_model/reward --export_size 2 --export_legacy_format False
CUDA_VISIBLE_DEVICES=1 API_PORT=8103 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_GrailQA_simulate.log 2>&1 &
CUDA_VISIBLE_DEVICES=1 API_PORT=8104 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GrailQA/sft2_KBQA_GrailQA_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_GrailQA_reward.log 2>&1 &
CUDA_VISIBLE_DEVICES=1 nohup python run_explore.py --llm_simulate_name 8103/simulate --llm_reward_name 8104/reward --base Llama-3.1-8B-Instruct --task test --dataset GrailQA >> result_Llama-3.1-8B-Instruct_test_KBQA_GrailQA_sft2.log 2>&1 &
bash utils/kill_llm_api_GrailQA.sh
python prepare_sft_data.py --dataset GraphQ
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft_KBQA_GraphQ_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 50.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_GraphQ_simulate.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft_KBQA_GraphQ_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 100.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft_KBQA_GraphQ_reward.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/export_model/simulate --export_size 2 --export_legacy_format False
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/export_model/reward --export_size 2 --export_legacy_format False
CUDA_VISIBLE_DEVICES=2 API_PORT=8105 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_GraphQ_simulate.log 2>&1 &
CUDA_VISIBLE_DEVICES=2 API_PORT=8106 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft_KBQA_GraphQ_reward.log 2>&1 &
CUDA_VISIBLE_DEVICES=2 nohup python run_explore.py --llm_simulate_name 8105/simulate --llm_reward_name 8106/reward --base Llama-3.1-8B-Instruct --task explore --dataset GraphQ >> result_Llama-3.1-8B-Instruct_explore_KBQA_GraphQ_sft.log 2>&1 &
python prepare_sft2_data.py --llm_reward_name 8106/reward --base Llama-3.1-8B-Instruct --dataset GraphQ --limit "-50"
bash utils/kill_llm_api_GraphQ.sh
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/export_model/simulate --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --dataset sft2_KBQA_GraphQ_simulate --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_simulate/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 10.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_GraphQ_simulate.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup deepspeed --num_gpus 4 --master_port=9902 src/train.py --deepspeed ds_config.json --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/export_model/reward --stage sft --do_train --finetuning_type lora --lora_target q_proj,v_proj --use_dora --dataset sft2_KBQA_GraphQ_reward --template llama3 --cutoff_len 1024 --overwrite_cache --preprocessing_num_workers 16 --output_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_reward/checkpoint --logging_steps 10 --save_steps 10000 --plot_loss --overwrite_output_dir --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --learning_rate 5e-5 --num_train_epochs 20.0 --lr_scheduler_type cosine --bf16 >> result_Llama-3.1-8B-Instruct_sft2_KBQA_GraphQ_reward.log 2>&1 &
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_simulate/export_model/simulate --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_simulate/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_simulate/export_model/simulate --export_size 2 --export_legacy_format False
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/export_model.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft_KBQA_GraphQ_reward/export_model/reward --adapter_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_reward/checkpoint --template llama3 --finetuning_type lora --use_dora --export_dir expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_reward/export_model/reward --export_size 2 --export_legacy_format False
CUDA_VISIBLE_DEVICES=2 API_PORT=8105 MODEL_NAME=simulate nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_simulate/export_model/simulate --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_GraphQ_simulate.log 2>&1 &
CUDA_VISIBLE_DEVICES=2 API_PORT=8106 MODEL_NAME=reward nohup python src/llm_api.py --model_name_or_path expr/KBQA/Llama-3.1-8B-Instruct/GraphQ/sft2_KBQA_GraphQ_reward/export_model/reward --template llama3 --temperature 0.0 >> result_Llama-3.1-8B-Instruct_llm_api_sft2_KBQA_GraphQ_reward.log 2>&1 &
CUDA_VISIBLE_DEVICES=2 nohup python run_explore.py --llm_simulate_name 8105/simulate --llm_reward_name 8106/reward --base Llama-3.1-8B-Instruct --task test --dataset GraphQ >> result_Llama-3.1-8B-Instruct_test_KBQA_GraphQ_sft2.log 2>&1 &
bash utils/kill_llm_api_GraphQ.sh
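The three pipelines above share one structure and differ only in a handful of values: the GPU used for serving and exploration, the two API ports, the first-stage epoch counts, and the `--limit` passed to `prepare_sft2_data.py`. The sketch below collects those values, read directly from the commands above, into one table and rebuilds two of the derived strings. The names `DatasetSettings`, `export_dir`, and `explore_args` are hypothetical helpers for illustration.

```python
from dataclasses import dataclass

BASE = "Llama-3.1-8B-Instruct"


@dataclass(frozen=True)
class DatasetSettings:
    gpu: int                  # CUDA_VISIBLE_DEVICES for llm_api.py / run_explore.py
    simulate_port: int        # API_PORT of the simulate model
    reward_port: int          # API_PORT of the reward model
    sft_simulate_epochs: int  # --num_train_epochs, first-stage simulate model
    sft_reward_epochs: int    # --num_train_epochs, first-stage reward model
    sft2_limit: str           # --limit for prepare_sft2_data.py


# Values transcribed from the commands above; second-stage (sft2) training
# uses 10 epochs (simulate) and 20 epochs (reward) for every dataset.
SETTINGS = {
    "WebQSP": DatasetSettings(0, 8101, 8102, 50, 100, "30"),
    "GrailQA": DatasetSettings(1, 8103, 8104, 100, 300, "-100"),
    "GraphQ": DatasetSettings(2, 8105, 8106, 50, 100, "-50"),
}


def export_dir(dataset, stage, role):
    """Exported model path; stage is 'sft' or 'sft2', role is 'simulate' or 'reward'."""
    run = f"{stage}_KBQA_{dataset}_{role}"
    return f"expr/KBQA/{BASE}/{dataset}/{run}/export_model/{role}"


def explore_args(dataset, task):
    """Argument list for run_explore.py; task is 'explore' or 'test'."""
    s = SETTINGS[dataset]
    return [
        "python", "run_explore.py",
        "--llm_simulate_name", f"{s.simulate_port}/simulate",
        "--llm_reward_name", f"{s.reward_port}/reward",
        "--base", BASE,
        "--task", task,
        "--dataset", dataset,
    ]
```

Keeping the per-dataset values in one place makes it easier to see that the distinct ports and GPU indices allow the three exploration runs to coexist on one machine, while the four-GPU training commands all reuse `--master_port=9902` and therefore cannot run concurrently as written.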
If you find this work helpful for your research, please cite:
@misc{luo2025kbqao1,
title={KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search},
author={Haoran Luo and Haihong E and Yikai Guo and Qika Lin and Xiaobao Wu and Xinyu Mu and Wenhao Liu and Meina Song and Yifan Zhu and Luu Anh Tuan},
year={2025},
eprint={2501.18922},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.18922},
}
For further questions, please contact: luohaoran@bupt.edu.cn.
This repo benefits from KB-Coder, LLM-Reasoners and LLaMA-Factory. Thanks for their wonderful work.