Unlearn360

This repository contains implementations of machine unlearning methods for LLM360 models. Machine unlearning is a pre-deployment safety measure designed to remove hazardous knowledge from language models. Unlearned models are inherently safer because they lack the knowledge that could be misused.

Overview

Here's a list of unlearning methods we have implemented so far.

| Method | Model |
| --- | --- |
| max_entropy | CrystalChat |
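This README does not define the max_entropy objective. A common formulation, assumed here for illustration only, pushes the model's next-token distribution on forget-set text toward uniform. That is equivalent to minimizing KL(p || uniform) = log(V) - H(p), where V is the vocabulary size and H is the entropy. The sketch below uses plain Python lists instead of tensors. The function names are hypothetical and do not come from methods/utils.py, which presumably operates on PyTorch tensors.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def max_entropy_loss(batch_logits: Sequence[Sequence[float]]) -> float:
    """Mean KL(p || uniform) = log(V) - H(p) over token positions.

    Hypothetical sketch: minimizing this on forget-set tokens drives each
    predicted distribution toward uniform, i.e. maximizes its entropy.
    """
    losses = []
    for logits in batch_logits:
        vocab_size = len(logits)
        losses.append(math.log(vocab_size) - entropy(softmax(logits)))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    uniform = [[0.0, 0.0, 0.0, 0.0]]  # model already maximally uncertain
    peaked = [[10.0, 0.0, 0.0, 0.0]]  # model confidently predicts token 0
    print(max_entropy_loss(uniform))  # approximately 0
    print(max_entropy_loss(peaked))   # close to log(4), the maximum for V = 4
```

The KL-to-uniform form is used because it is non-negative and reaches exactly zero at the target distribution. This makes the loss easy to monitor during training.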

Directory Structure

unlearn.py is the main entry point for running unlearning methods. It uses the Python modules in the methods/ and utils/ folders.

The methods/ folder contains the implementations for unlearning methods:

training.py: All training loop implementations
utils.py: Loss functions and other method-related utils

The utils/ folder contains helper functions for model/dataset IO:

dataloaders.py: Dataloader for text datasets
model_utils.py: Model IO utils

By default, unlearned models are saved to the models/ folder. Please store all training datasets in the data/ folder.

Note

This project uses the bio-forget-corpus from the WMDP Benchmark for unlearning training. Access to this dataset requires a separate request. Please follow the instructions provided here to obtain the necessary permissions. By default, the dataloader is configured to load the dataset from data/bio_forget.jsonl.
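Below is a minimal sketch of reading the forget corpus once you have access. It assumes each line of data/bio_forget.jsonl is a JSON object that stores the document under a "text" key. That schema is an assumption, so check it against the file you receive. The helpers parse_jsonl_lines and load_forget_texts are hypothetical and are not the API of utils/dataloaders.py.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, List

# Default location named in this README.
DEFAULT_FORGET_PATH = Path("data/bio_forget.jsonl")


def parse_jsonl_lines(lines: Iterable[str], text_field: str = "text") -> Iterator[str]:
    """Yield the `text_field` value from each non-empty JSON line.

    Assumption: every record carries its document text under `text_field`.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        yield record[text_field]


def load_forget_texts(path: Path = DEFAULT_FORGET_PATH, text_field: str = "text") -> List[str]:
    """Read every document in a JSONL file into a list of strings."""
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; the WMDP bio-forget-corpus must be requested separately"
        )
    with path.open(encoding="utf-8") as f:
        return list(parse_jsonl_lines(f, text_field))


if __name__ == "__main__":
    texts = load_forget_texts()
    print(f"loaded {len(texts)} documents")
```

Parsing is separated from file IO so that the parser can be checked on in-memory lines without the gated dataset.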

Quick Start

Setup

Clone and enter the repo:

```
git clone https://github.com/xyzhu123/Unlearn360.git
cd Unlearn360
```

Install dependencies:
```
pip install -r requirements.txt
```

To install lm-eval, run the following commands, or visit the official repo for details:

```
git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```
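After installation, an unlearned model can be scored with lm-eval's Python entry point, lm_eval.simple_evaluate. The sketch below is a hedged example, not the notebook's exact procedure. The model directory name is hypothetical. Evaluating on wmdp_bio (hazardous knowledge) and mmlu (general capability) is an assumed choice. The trust_remote_code flag is included on the assumption that CrystalChat ships custom modeling code.

```python
from typing import Any, Dict, List

# Hypothetical output directory of an unlearned model; replace with your own.
MODEL_DIR = "models/crystalchat_max_entropy"
# Assumed task choice: WMDP-Bio for hazardous knowledge, MMLU for general ability.
TASKS = ["wmdp_bio", "mmlu"]


def build_eval_kwargs(model_dir: str, tasks: List[str], batch_size: int = 8) -> Dict[str, Any]:
    """Collect keyword arguments for lm_eval.simple_evaluate."""
    return {
        "model": "hf",  # Hugging Face transformers backend
        # trust_remote_code is an assumption for models with custom code
        "model_args": f"pretrained={model_dir},trust_remote_code=True",
        "tasks": tasks,
        "batch_size": batch_size,
    }


if __name__ == "__main__":
    import lm_eval  # installed from the lm-evaluation-harness checkout above

    results = lm_eval.simple_evaluate(**build_eval_kwargs(MODEL_DIR, TASKS))
    for task, metrics in results["results"].items():
        print(task, metrics.get("acc,none"))
```

A successful unlearning run would ideally lower wmdp_bio accuracy toward chance while leaving mmlu accuracy close to that of the original model.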

Training and Evaluation

An example is provided in max_entropy_exp.ipynb, which can be run on a single 80 GB A100 GPU.
