This is the official implementation of the RAS paper.
- Environment Setup
- Train Theme Classifier and Distribution Shifter
- Train Text-to-Triples Model
- Training Data (HotpotQA-SUBQ) Processing
- Train GraphLLM by Multi-task Learning
- Knowledge Indexing
- Run Baselines
- Run RAS
- Evaluation
## Environment Setup

Please follow the commands below, in order, to set up the environment.
```bash
# Create a new environment
conda create -n ras python=3.10 -y

# Activate it
conda activate ras

# Install PyTorch first, separately
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install torch-geometric
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv -f https://data.pyg.org/whl/torch-2.5.1+cu118.html  # adjust for your torch/CUDA version

pip install transformers wandb tqdm peft accelerate bitsandbytes sentencepiece
```
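Optionally, a quick sanity check (a minimal sketch, not part of the repository) confirms that PyTorch sees your GPU and that the PyG extension wheels match your build:

```python
# sanity_check.py: verify the core dependencies installed above
import torch
import torch_geometric

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"PyTorch Geometric {torch_geometric.__version__}")

# torch-scatter is compiled against a specific torch/CUDA build;
# a successful import confirms the wheel matched your environment.
import torch_scatter
print("torch-scatter imported OK")
```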
## Train Theme Classifier and Distribution Shifter

First, download the DBPedia-298 dataset from here.
```bash
cd classifier_shifter
sh doc_train.sh
sh shifter_train.sh  # please process the HotpotQA-SUBQ data (see below) before this
```
After training, run the theme predictor:

```bash
cd classifier_shifter
python theme_predictor.py
```
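For orientation, theme prediction at inference time is ordinary sequence classification. The sketch below is illustrative only; the checkpoint path is hypothetical, and it assumes the trained classifier is a standard Hugging Face sequence-classification model (see theme_predictor.py for the repository's actual logic):

```python
# Illustrative sketch only: assumes the trained theme classifier is a standard
# Hugging Face sequence-classification checkpoint (the path below is hypothetical).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ckpt = "classifier_shifter/theme_classifier"  # hypothetical checkpoint path
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt).eval()

text = "The Eiffel Tower is a wrought-iron lattice tower in Paris."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs.topk(5))  # top-5 candidate theme classes
```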
## Train Text-to-Triples Model

First, download the WikiOFGraph dataset from here.
```bash
cd text_to_triples
sh train.sh
```
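For orientation, once trained, this model converts free text into linearized triples. A minimal sketch, assuming a seq2seq checkpoint and a hypothetical path (train.sh defines the actual training setup):

```python
# Illustrative sketch: assumes the text-to-triples model is a seq2seq checkpoint
# that emits linearized (subject, relation, object) triples; the path is hypothetical.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

ckpt = "text_to_triples/checkpoint"  # hypothetical path to the trained model
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt).eval()

text = "Marie Curie won the Nobel Prize in Physics in 1903."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```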
## Training Data (HotpotQA-SUBQ) Processing

You can download the processed data from here.

Alternatively, you can process the data yourself. First, download the HotpotQA training set from here. Then run the following commands to process the training data.
```bash
cd llm_training_data_process

# Process the HotpotQA data
python process_hotpot.py

# Generate subqueries for HotpotQA
python generate_subqueries.py

# Identify questions that need neither subqueries nor retrieval
python training_data_gen_wo_ret_wo_subq.py

cd ../text_to_triples
# Generate triples for the HotpotQA documents
python generate.py

# Process the data with graphs
cd ../llm_training_data_process
python a_planner_data_process.py
python a_1_hotpotqa_only.py
python b_answerer_data_process.py
```
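Before training, it can be worth spot-checking the processed output. A generic sketch with a hypothetical file name; adjust it to whatever the scripts above actually write (e.g., JSON or JSONL):

```python
# Quick spot-check of a processed training file; the file name is hypothetical.
import json
from pathlib import Path

path = Path("llm_training_data_process/processed_train.json")  # hypothetical
data = json.loads(path.read_text())
print(f"{len(data)} examples")
print(json.dumps(data[0], indent=2)[:500])  # inspect the first example's fields
```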
## Train GraphLLM by Multi-task Learning

```bash
cd framework
sh train.sh  # or train_8b.sh for the 8B model

# Test the trained planner and answerer
sh test_planner.sh
sh test_answerer.sh
```
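The training scripts above define the actual multi-task setup. For readers curious why peft and bitsandbytes are in the environment, here is a generic LoRA sketch; the base model, rank, and target modules are illustrative assumptions, not the repository's configuration:

```python
# Generic LoRA sketch with peft; the base model, rank, and target modules are
# assumptions for illustration only. See train.sh for the real configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed backbone, not necessarily the paper's
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # LoRA trains only a small fraction of weights
```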
## Knowledge Indexing

```bash
# Download corpora
cd knowledge_indexing
sh download_corpora.sh

# Theme indexing
cd theme
sh class_labeling.sh
sh convert.sh

# Dense indexing
cd ../dense
sh dense_index.sh
sh combine.sh
```
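Conceptually, dense indexing encodes every passage into a vector for nearest-neighbor retrieval. A minimal illustration with an assumed encoder and mean pooling; dense_index.sh defines the real pipeline:

```python
# Minimal dense-encoding illustration; the encoder choice and mean pooling are
# assumptions for this sketch. dense_index.sh defines the actual pipeline.
import torch
from transformers import AutoTokenizer, AutoModel

enc = "facebook/contriever"  # assumed retriever encoder
tokenizer = AutoTokenizer.from_pretrained(enc)
model = AutoModel.from_pretrained(enc).eval()

passages = ["RAS retrieves and structures knowledge as graphs.",
            "HotpotQA is a multi-hop question answering dataset."]
inputs = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state        # [batch, seq, dim]
mask = inputs["attention_mask"].unsqueeze(-1)         # [batch, seq, 1]
emb = (hidden * mask).sum(1) / mask.sum(1)            # mean pooling over tokens
torch.save(emb, "passage_embeddings.pt")              # one vector per passage
```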
## Run Baselines

```bash
cd baselines
sh run.sh  # see the arguments in run.sh to change the dataset, model, etc.
```
## Run RAS

```bash
cd framework
sh run_ras.sh
```
## Evaluation

Run the `eval.ipynb` notebook in the `framework/` folder. Uncomment the metrics you want to evaluate on.
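For orientation, open-domain QA answers are commonly scored with exact match (EM) and token-level F1. A standard SQuAD-style sketch of these two metrics (the notebook defines the metrics actually used):

```python
# Standard SQuAD-style EM/F1 helpers, shown for orientation only;
# use the metrics in eval.ipynb for the paper's actual numbers.
import re
import string
from collections import Counter

def normalize(s):
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(pred, gold):
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))   # 1.0
print(f1("the tall Eiffel Tower", "Eiffel Tower"))       # 0.8
```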
To run the closed-source Sonnet-3.5 model in either the baselines' setting or RAS, fill in the key information in the `claude_api_example.py` file, rename it to `claude_api.py`, and put it under both `baselines/` and `framework/`.
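As a hedged sketch of what such a file might contain, using the official anthropic Python client; the function name and model string are assumptions, so mirror whatever interface `claude_api_example.py` actually expects:

```python
# claude_api.py: hedged sketch only. The function name and model string are
# assumptions; follow the interface that claude_api_example.py expects.
import anthropic

client = anthropic.Anthropic(api_key="YOUR_API_KEY")  # fill in your key

def call_claude(prompt, max_tokens=1024):
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # assumed model identifier
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text
```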