We are excited to present 'REEF: Representation Encoding Fingerprints for Large Language Models,' an efficient and robust approach designed to protect the intellectual property of open-source LLMs.
In this paper, we propose a training-free REEF to identify the relationship between the suspect and victim models from the perspective of LLMs' feature representations. Specifically, REEF computes and compares the centered kernel alignment similarity between the representations of a suspect model and a victim model on the same samples. This training-free REEF does not impair the model's general capabilities and is robust to sequential fine-tuning, pruning, model merging, and permutations.
In summary, REEF provides a simple and effective way for third parties and model owners to protect LLMs' intellectual property together.
git clone https://github.com/tmylla/REEF.git
cd REEF
pip install -r requirements.txt
Main Experiments
cd src/
# generation activations
sh ./scripts/save_activation.sh
# compute the cka-similarity
python compute_cka.py --base_model llama-2-7b --base_layers -1 --test_model vicuna-7b-v1.5 --test_layers -1
# plot the cka-heatmap
plot.ipynb
Preliminary Experiments
# train linear/MLP/CNN classifier
python train_cls.py --model llama-2-7b --layers 18 --datasets truthfulqa
# train GCN classifier
python train_cls_gcn.py --model llama-2-7b --layers 18 --datasets truthfulqa
# apply the classifier to suspect models
python transfer_cls.py --pretrain_dir classifier_path --suspect_model vicuna-7b-v1.5 --layers 18
Replication of Comparative Experiments
- Human-Readable Fingerprint for Large Language Models
python pcs.py
python ics.py
- A Fingerprint for Large Language Models
# generation logits activations
sh ./scripts/save_logits.sh
python logit.py
We are actively maintaining a repository focused on Fingerprinting Large Language Models (LLMs). If you find our project helpful or interesting, we would greatly appreciate your support by giving it a star ⭐.
Distributed under the Apache-2.0 License. See LICENSE for more information.
@misc{zhang2024reefrepresentationencodingfingerprints,
title={REEF: Representation Encoding Fingerprints for Large Language Models},
author={Jie Zhang and Dongrui Liu and Chen Qian and Linfeng Zhang and Yong Liu and Yu Qiao and Jing Shao},
year={2024},
eprint={2410.14273},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.14273},
}