This repository contains implementations of machine unlearning methods for LLM360 models. Machine unlearning is a pre-deployment safety measure designed to remove hazardous knowledge from language models. Unlearned models are inherently safer, as they lack the hazardous knowledge that could be misused.
Here's a list of unlearning methods we have implemented so far.
| Method | Model |
|---|---|
| max_entropy | CrystalChat |
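
As a rough illustration of what a max-entropy objective can look like, the sketch below computes the negative mean entropy of next-token distributions, so that minimizing it pushes predictions on forget data toward uniform. This is an assumption based on the method's name, not the repository's actual loss: the function names are hypothetical, and it is written in plain Python for clarity, whereas a real training loop would compute the same quantity on model logits with a tensor library.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def max_entropy_loss(token_logits: List[List[float]]) -> float:
    """Hypothetical forget loss: negative mean entropy over token positions.

    Minimizing this value pushes each next-token distribution toward uniform.
    """
    entropies = [entropy(softmax(row)) for row in token_logits]
    return -sum(entropies) / len(entropies)


# Toy inputs: one token position, a vocabulary of four entries.
confident = [[10.0, 0.0, 0.0, 0.0]]  # peaked distribution, low entropy
uniform = [[1.0, 1.0, 1.0, 1.0]]     # flat distribution, maximum entropy

print(max_entropy_loss(confident))
print(max_entropy_loss(uniform))
```

The uniform case reaches the lowest possible loss for four classes, `-log(4)`, while the confident case stays close to zero, which is the direction a forget objective of this kind would penalize.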
`unlearn.py` is the main entrypoint for running unlearning methods. It uses the Python modules in the `methods/` and `utils/` folders.
The `methods/` folder contains the implementations for unlearning methods:
- `training.py`: All training loop implementations
- `utils.py`: Loss functions and other method-related utils
The `utils/` folder contains helper functions for model/dataset IO:
- `dataloaders.py`: Dataloader for text datasets
- `model_utils.py`: Model IO utils
By default, unlearned models are saved to the `models/` folder. Please store all training datasets in the `data/` folder.
Note
This project uses the bio-forget-corpus from the WMDP Benchmark for unlearning training. Access to this dataset requires a separate request. Please follow the instructions provided by the WMDP Benchmark to obtain the necessary permissions. By default, the dataloader is configured to load the dataset from `data/bio_forget.jsonl`.
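
Once the file is in place, a quick sanity check can confirm that it parses as JSONL before starting a run. The following is a minimal sketch using only the standard library; the helper names `iter_jsonl` and `count_records` are hypothetical and are not part of `utils/dataloaders.py`, and no assumption is made about which fields each record contains.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSONL file."""
    with Path(path).open("r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from err


def count_records(path: str) -> int:
    """Count the records in a JSONL file without holding them in memory."""
    return sum(1 for _ in iter_jsonl(path))


if __name__ == "__main__":
    dataset = "data/bio_forget.jsonl"  # default path named in the note above
    if Path(dataset).exists():
        print(f"{dataset}: {count_records(dataset)} records")
    else:
        print(f"{dataset} not found; request access and place the file there")
```

Streaming the file line by line keeps memory use flat regardless of corpus size, and the line number in the error message makes a truncated download easy to spot.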
- Clone and enter the repo:

      git clone https://github.com/xyzhu123/Unlearn360.git
      cd Unlearn360
- Install dependencies:

      pip install -r requirements.txt
- To install `lm-eval`, run the following commands or visit the official repo:

      git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
      cd lm-evaluation-harness
      pip install -e .
An example usage is provided in `max_entropy_exp.ipynb`, which can be executed on a single A100 80G GPU.