pranavh4/llm_unlearn

 
 


Exploring Unlearning in State Space Models

Project developed as part of the CSE 575 Statistical Machine Learning course taught by Dr. Kookjin Lee at Arizona State University.

Abstract

This study explores the application of machine unlearning techniques to State Space Models (SSMs), an area that has received limited attention compared to Transformer models. The research aims to adapt existing unlearning methods for SSMs and compare their performance with Transformer models in terms of effectiveness, efficiency, and privacy-preserving capabilities. The experiment utilizes OPT-1.3B, Pythia-1.4B (Transformer models), and Mamba-1.4B (State Space Model). The PKU-SafeRLHF dataset, containing unsafe prompt-response pairs, is used as the forget dataset. Two unlearning methods are implemented: Gradient Ascent and Gradient Ascent with Mismatch. Results indicate that Transformer models respond quickly to fine-tuning methods, achieving good unlearning performance after only 2000 examples. In contrast, the Mamba model (SSM) shows more rigidity and moves very slowly towards the unlearning target. This difference in behavior might be attributed to the distinct internal knowledge representation in SSM and Transformer architectures. The study suggests that State Space Models may be more resilient to phenomena such as Catastrophic Forgetting. However, further research is needed to explore why SSMs remain rigid during fine-tuning and gradient ascent. The findings open avenues for developing custom unlearning algorithms explicitly tailored for SSMs.
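To make the setup more concrete, the sketch below shows what a single Gradient Ascent step, with an optional Mismatch term, could look like for a causal language model in PyTorch with Hugging Face Transformers. The checkpoint names, batch format, learning rate, and mismatch weighting are illustrative assumptions, not the repository's exact implementation.

```python
# Minimal sketch of one unlearning step, assuming a Hugging Face causal LM and
# pre-tokenized batches (dicts with "input_ids" and "attention_mask") built from
# PKU-SafeRLHF. Checkpoints and hyperparameters below are assumptions for
# illustration, not this repository's actual code.
import torch
from transformers import AutoModelForCausalLM

# e.g. "facebook/opt-1.3b", "EleutherAI/pythia-1.4b", or "state-spaces/mamba-1.4b-hf"
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-6)


def unlearning_step(forget_batch, mismatch_batch=None, mismatch_weight=1.0):
    """One optimizer step of Gradient Ascent, optionally with a Mismatch term."""
    model.train()
    optimizer.zero_grad()

    # Gradient Ascent: maximize the LM loss on unsafe prompt-response pairs,
    # implemented here by minimizing its negation.
    forget_out = model(**forget_batch, labels=forget_batch["input_ids"])
    loss = -forget_out.loss

    if mismatch_batch is not None:
        # Mismatch: ordinary gradient descent toward benign responses for the
        # same prompts, so the model is steered toward something specific
        # rather than merely degenerating.
        mismatch_out = model(**mismatch_batch, labels=mismatch_batch["input_ids"])
        loss = loss + mismatch_weight * mismatch_out.loss

    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```

In a full run, a step like this would be applied over batches streamed from the forget dataset, with periodic evaluation of forget-set loss and general-capability benchmarks to track the trade-off the abstract describes.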

The full report with all details can be found here.

About

Implement unlearning methods such as Gradient Ascent for State Space Models and compare unlearning efficacy with Transformer based LLMs

