Code for implementing the paper "Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training"
Step 1: Set "cache_dir" to your desired local folder. The HF models will be downloaded and stored in cache_dir.
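For reference, a minimal sketch of how cache_dir is typically passed to the Hugging Face loaders (the exact loading code lives in the trainer scripts and may differ):
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed local folder for downloaded HF checkpoints; adjust to your setup.
cache_dir = "/path/to/your/hf_cache"
# HF downloads into cache_dir once and reuses the files on later runs.
tokenizer = AutoTokenizer.from_pretrained("gpt2", cache_dir=cache_dir)
model = AutoModelForCausalLM.from_pretrained("gpt2", cache_dir=cache_dir)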
Conventional FT:
python trainer.py --llm="gpt2" --dataset="ccnews" --type="target"
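For context, conventional fine-tuning optimizes the standard next-token cross-entropy over all tokens. A minimal sketch of that objective (the actual loop is in trainer.py and will differ in its data handling and optimizer setup):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
batch = tokenizer("an example training document", return_tensors="pt")
# With labels == input_ids, HF shifts internally and averages cross-entropy over every token.
loss = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"], labels=batch["input_ids"]).loss
loss.backward()  # an optimizer.step() would follow in a real training loop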
Goldfish:
python trainer-goldfish.py --llm="gpt2" --dataset="ccnews" --k=4
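The k flag controls how many tokens are dropped from the loss. A minimal sketch of a goldfish-style masked loss, assuming tokens are dropped uniformly at random at rate 1/k (the exact dropping rule is defined in trainer-goldfish.py):
import torch
def goldfish_labels(input_ids: torch.Tensor, k: int = 4) -> torch.Tensor:
    # Copy the inputs as labels, then exclude ~1/k of the tokens from the loss.
    # -100 is the ignore_index used by the Hugging Face causal-LM loss.
    labels = input_ids.clone()
    drop_mask = torch.rand(labels.shape, device=labels.device) < (1.0 / k)
    labels[drop_mask] = -100
    return labels
# usage: model(input_ids=input_ids, attention_mask=mask, labels=goldfish_labels(input_ids, k=4))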
DPSGD:
python trainer-dp.py --llm="gpt2" --dataset="ccnews" --epsilon=8
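trainer-dp.py handles the differentially private training setup. Purely as an illustration of DP-SGD targeting a given epsilon, here is an Opacus-based sketch on a toy model (the repository may use a different library, delta, or accountant):
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine
# Toy model and data just to keep the sketch self-contained; the real script trains an LLM.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
train_loader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=8)
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    target_epsilon=8.0,   # matches --epsilon=8
    target_delta=1e-5,    # assumed delta
    epochs=3,             # assumed training length
    max_grad_norm=1.0,    # per-sample gradient clipping bound
)
# Training then proceeds as usual; per-sample gradients are clipped and Gaussian noise is added.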
DuoLearn:
- Train a ref model:
python trainer.py --llm="gpt2" --dataset="ccnews" --type="ref"
- Adjust "ref_model_path" in trainer-duolearn.py, then train the target model:
python trainer-duolearn.py --llm="gpt2" --dataset="ccnews"
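The dual-purpose token selection and loss logic live in trainer-duolearn.py. Purely as an illustration of the kind of signal a reference model provides (this sketch computes per-token losses under the ref model; it is not the paper's actual selection rule, and the path below is a placeholder):
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer
ref_model_path = "path/to/ref-model"  # placeholder; point this at your trained ref model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
ref_model = AutoModelForCausalLM.from_pretrained(ref_model_path).eval()
batch = tokenizer("an example training document", return_tensors="pt")
with torch.no_grad():
    logits = ref_model(**batch).logits
# Per-token next-token losses under the reference model (shift predictions by one position).
shift_logits = logits[:, :-1, :]
shift_labels = batch["input_ids"][:, 1:]
ref_token_loss = F.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    shift_labels.reshape(-1),
    reduction="none",
).view(shift_labels.size())
# A per-token signal like this can be used to treat tokens differently during training.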
MIA:
- Train a ref-attack model:
python trainer.py --llm="gpt2" --dataset="ccnews" --type="attack"
- Adjust the ref-attack model path, then run the attack:
python mia.py --ft_model_path="saved_models/ccnews/trainer/.../best-model" --save_obj_path="MIA-logs/gpt2/ccnews/filename.pkl" --llm="gpt2" --dataset="ccnews"
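mia.py implements the actual attacks and saves its output object to save_obj_path. As a generic illustration of a reference-calibrated membership score (an assumed formulation, not a description of mia.py; both paths below are placeholders), a sample is scored by how much better the fine-tuned model fits it than the ref-attack model does:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
target_model = AutoModelForCausalLM.from_pretrained("path/to/ft-model").eval()       # placeholder path
ref_model = AutoModelForCausalLM.from_pretrained("path/to/ref-attack-model").eval()  # placeholder path
tokenizer = AutoTokenizer.from_pretrained("gpt2")
def avg_nll(model, text):
    # Average next-token negative log-likelihood of `text` under `model`.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return model(input_ids=enc["input_ids"], labels=enc["input_ids"]).loss.item()
def membership_score(text):
    # Higher score: the fine-tuned model fits the sample unusually well relative to
    # the reference, which is evidence the sample was a training member.
    return avg_nll(ref_model, text) - avg_nll(target_model, text)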
Backdoor Attack:
- The code also supports backdoor attacks. First, train a backdoor model:
python trainer.py --llm="gpt2" --dataset="ccnews" --type="backdoor"
- Adjust "backdoor_model_path" appropriately, then train the target models as above with the --backdoor flag:
python trainer.py --llm="gpt2" --dataset="ccnews" --type="target" --backdoor
python trainer-goldfish.py --llm="gpt2" --dataset="ccnews" --k=4 --backdoor
...