TraceGuard: Fine-Tuning Pre-Trained Model by Using Stego Images to Trace Its User
Figure 1. Diagram of tracing a pre-trained model. The model owner spends significant money and resources to obtain pre-trained models and deploys them as MLaaS (machine learning as a service) to earn a legitimate profit. The model owner can also authorize users to use the model legally. However, some malicious users redeploy the model to earn an illegal profit. To conceal evidence of piracy, malicious users may launch various fingerprint removal attacks on pirate models, such as fine-tuning (FT), pruning (PR), and model stealing (MS), which make it challenging for the owner to trace the suspect model.
Figure 2. Overview of TraceGuard. (a) The steganography network, composed of an encoder and a decoder, is learned by using a base pre-trained model, $D_s$, and a set of secret keys. (b) The base pre-trained model is fine-tuned to be individually fingerprinted by using the steganography network, $D_f$, and the secret key designated to each authorized user. (c) The suspect pre-trained model is traced by querying it with stego images, which are generated by using $D_t$ and the steganography network. The authorized user whose designated secret key yields the lowest error rate between the embedded and extracted keys, provided that this rate is below the preset threshold $T$, is held responsible for the suspect pre-trained model. For a concise description, we only show a single extracted key. In practice, we use the query set to obtain a set of extracted keys so as to obtain a stable error rate. See details in Section 3.5.
Figure 3. Flowchart of training the steganography network: encoder (E), decoder (D), pre-trained model (P), and loss function.
Figure 4. Flowchart of fingerprinting the pre-trained model.
Figure 5. The impact of the length $L$ and minimum code distance $D$ on the extraction error rate. The orange line represents the extraction error rate (FP) of $K_{em}^{i}$, where $K_{em}^{i}$ corresponds to the $i$-th fingerprinted model. The blue line and the green line represent the maximum extraction error rate (NFP-MAX) and the minimum extraction error rate (NFP-MIN) for $K_{em}^{j}$ such that $C(K_{em}^{i}, K_{em}^{j}) = D$. The purple area represents the extraction error rate distribution for extracting $K_{em}^{j}$.
Figure 6. (a) ROC curve for evaluating binary classification of the independent model and the fingerprinted model. (b) Robustness against pruning.
Figure 7. The robustness against fine-tuning: (a) as a result of FTAL, (b) as a result of RTAL.
Figure 8. (a) Impact of the number of examples used in the fingerprinting step. (b) Impact of the number of examples used in the tracing step.
Figure 9. (a) The impact of self-supervised fine-tuning on uniqueness. (b) The impact of self-supervised fine-tuning on utility. (c) The impact of model stealing simulation on robustness.
Abstract
1. Introduction
- TraceGuard is proposed. It is the first framework for tracing a pre-trained model to its user: it either ascertains which authorized user illegally released the pirate model or verifies that the suspect model is independent.
- A novel fingerprinting strategy is introduced. Using stego images, three steps are executed alternately: fine-tuning for unique secret key extraction, model stealing simulation, and fine-tuning for self-supervised learning. Alternating these steps balances the effectiveness and robustness of the fingerprint against the utility of the fingerprinted pre-trained model. Experimental results confirm that TraceGuard produces a traceable pre-trained model whose performance is well preserved.
- Extensive evaluations on benchmark datasets demonstrate that TraceGuard is robust against fingerprint removal attacks such as fine-tuning, pruning, and model stealing.
2. Related Works
2.1. Copyright Verification and Piracy Attribution for Suspect Model
2.2. Steganography
2.3. Model Stealing Attack
3. Proposed Approach
3.1. Threat Model
3.1.1. Adversary’s Capability and Knowledge
3.1.2. Defender’s Capability and Knowledge
3.2. Design Overview
3.3. Training Steganography Network
Algorithm 1 Training Steganography Network
Input: base pre-trained model P, Ds, secret key set Sk, training epochs Nt, learning rate
Output: encoder (E), decoder (D)
3.4. Fingerprinting Pre-Trained Model
3.4.1. Model Stealing Simulation
3.4.2. Fine-Tuning for Uniqueness
3.4.3. Fine-Tuning for Utility
Algorithm 2 Fingerprinting Pre-trained Model
Input: base pre-trained model P, steganography network including encoder (E) and decoder (D), , , , fingerprinting epochs , learning rate
Output: fingerprinted model
3.5. Tracing Suspect Model
Algorithm 3 Tracing Suspect Model
Input: suspect model , steganography network including encoder E and decoder D, , designated key set , threshold T
Output: tracing result
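The decision rule described in the Figure 2 caption (pick the designated key with the lowest error rate, and accept it only if that rate is below the threshold T) can be illustrated with a minimal sketch. This is not the authors' Algorithm 3: it assumes keys are binary sequences, that the error rate is the fraction of differing bits, and that the keys extracted from the suspect model over the query set are already available. The names `bit_error_rate`, `trace_suspect`, and the user labels are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Key = Sequence[int]  # assumption: a secret key is a sequence of 0/1 bits


def bit_error_rate(embedded: Key, extracted: Key) -> float:
    """Fraction of bit positions where the two keys differ."""
    if len(embedded) != len(extracted):
        raise ValueError("keys must have the same length")
    return sum(a != b for a, b in zip(embedded, extracted)) / len(embedded)


def trace_suspect(
    designated_keys: Dict[str, Key],
    extracted_keys: Dict[str, List[Key]],
    threshold: float,
) -> Tuple[Optional[str], Dict[str, float]]:
    """Return (responsible user or None, mean error rate per user).

    extracted_keys[user] holds the keys decoded from the suspect model when it
    is queried with stego images carrying that user's designated key
    (hypothetical input; decoding itself is outside this sketch).
    """
    mean_error: Dict[str, float] = {}
    for user, key in designated_keys.items():
        samples = extracted_keys[user]
        if not samples:
            raise ValueError(f"no extracted keys for user {user!r}")
        # Averaging over the query set gives a more stable error rate.
        rates = [bit_error_rate(key, sample) for sample in samples]
        mean_error[user] = sum(rates) / len(rates)

    best_user = min(mean_error, key=lambda user: mean_error[user])
    if mean_error[best_user] < threshold:
        return best_user, mean_error
    # No key is extracted reliably enough: treat the model as independent.
    return None, mean_error


# Hypothetical toy data: two authorized users, two queries each.
designated = {"user_a": [1, 0, 1, 1, 0, 0, 1, 0], "user_b": [0, 1, 0, 0, 1, 1, 0, 1]}
extracted = {
    "user_a": [[1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 0, 1, 0]],
    "user_b": [[1, 0, 1, 1, 0, 1, 0, 1], [0, 0, 1, 1, 0, 0, 0, 1]],
}
print(trace_suspect(designated, extracted, threshold=0.2))
```

Returning `None` when no mean error rate falls below the threshold corresponds to declaring the suspect model independent; the threshold value used here is arbitrary.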
4. Experiments
4.1. Experiment Setup
4.1.1. Datasets
4.1.2. Learning Algorithm and Structure for Pre-Trained Model
4.1.3. Hyperparameters
4.2. Capacity of Tracing
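The Figure 5 caption relates the extraction error rate to the key length L and the minimum code distance D between keys. As a minimal sketch of how a key set with a guaranteed minimum distance could be built, the following assumes that the distance C is the Hamming distance between binary keys, which the text here does not confirm. The greedy rejection-sampling strategy, the function names, and the parameter values are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length keys differ."""
    if len(a) != len(b):
        raise ValueError("keys must have the same length")
    return sum(x != y for x, y in zip(a, b))


def generate_key_set(
    num_keys: int,
    length: int,
    min_distance: int,
    seed: int = 0,
    max_tries: int = 100_000,
) -> List[Tuple[int, ...]]:
    """Greedily sample binary keys whose pairwise distance is >= min_distance."""
    rng = random.Random(seed)
    keys: List[Tuple[int, ...]] = []
    tries = 0
    while len(keys) < num_keys:
        if tries >= max_tries:
            raise RuntimeError("could not find enough keys; relax the parameters")
        tries += 1
        candidate = tuple(rng.randint(0, 1) for _ in range(length))
        # Keep the candidate only if it is far enough from every accepted key.
        if all(hamming_distance(candidate, key) >= min_distance for key in keys):
            keys.append(candidate)
    return keys


keys = generate_key_set(num_keys=5, length=32, min_distance=8, seed=1)
for key in keys:
    print("".join(map(str, key)))
```

Longer keys leave more room for well-separated codewords, while a larger minimum distance limits how many keys, and therefore how many authorized users, a given length can support.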
4.3. Effectiveness of TraceGuard
4.3.1. Uniqueness
4.3.2. Utility
4.3.3. Independent Model Decision
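Figure 6a evaluates the decision between independent and fingerprinted models with an ROC curve, and the Figure 2 caption states that a key is accepted only when its error rate is below a preset threshold T. The sketch below shows one way such a curve could be computed by sweeping T over observed error rates. It assumes that each model is summarized by its lowest extraction error rate and that "fingerprinted" is the positive class; the function names and the error values are hypothetical illustrations, not results from the paper.

```python
from typing import List, Sequence, Tuple


def roc_points(
    fingerprinted_errors: Sequence[float],
    independent_errors: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (T, false positive rate, true positive rate) for each candidate T.

    A model is flagged as fingerprinted when its lowest error rate is below T.
    """
    candidates = sorted(set(fingerprinted_errors) | set(independent_errors))
    thresholds = [0.0] + candidates + [float("inf")]
    points = []
    for t in thresholds:
        tpr = sum(e < t for e in fingerprinted_errors) / len(fingerprinted_errors)
        fpr = sum(e < t for e in independent_errors) / len(independent_errors)
        points.append((t, fpr, tpr))
    return points


def area_under_curve(points: Sequence[Tuple[float, float, float]]) -> float:
    """Trapezoidal area under the (FPR, TPR) curve."""
    curve = sorted((fpr, tpr) for _, fpr, tpr in points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical lowest error rates for fingerprinted and independent models.
fingerprinted = [0.02, 0.05, 0.10]
independent = [0.30, 0.35, 0.40]
points = roc_points(fingerprinted, independent)
print(points)
print(area_under_curve(points))
```

When the two groups of error rates do not overlap, any T between them separates the groups perfectly; overlapping groups force a trade-off between missed pirate models and false accusations.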
4.4. Robustness of TraceGuard Against Fingerprint Removal Attack
4.4.1. Pruning Attack
4.4.2. Fine-Tuning Attack
4.4.3. Model Stealing Attack
4.4.4. The Overall Evaluation for Robustness
4.5. Ablation Study
4.5.1. Impact of the Number of Examples Used in Fingerprinting Phase
4.5.2. Impact of the Number of Examples Used in Tracing Phase
4.5.3. Impact of Self-Supervised Fine-Tuning
4.5.4. Impact of the Model Stealing Simulation
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Al Bdairi, A.J.A.; Xiao, Z.; Alkhayyat, A. Face recognition based on Deep Learning and FPGA for ethnicity identification. Appl. Sci. 2022, 12, 2605.
2. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276.
3. Sun, J.; Wang, Z.; Zhang, S. Onepose: One-shot object pose estimation without cad models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6825–6834.
4. Chen, T.; Kornblith, S.; Norouzi, M. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 1597–1607.
5. Chen, X.; Fan, H.; Girshick, R. Improved baselines with momentum contrastive learning. arXiv 2020, arXiv:2003.04297.
6. He, K.; Chen, X.; Xie, S. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009.
7. Ribeiro, M.; Grolinger, K.; Capretz, M.A. Mlaas: Machine learning as a service. In Proceedings of the 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 9–11 December 2015; pp. 896–902.
8. Zhao, X.; Yao, Y.; Wu, H. Structural watermarking to deep neural networks via network channel pruning. In Proceedings of the 2021 IEEE International Workshop on Information Forensics and Security (WIFS), Montpellier, France, 7–10 December 2021; pp. 1–6.
9. Yadollahi, M.M.; Shoeleh, F.; Dadkhah, S. Robust black-box watermarking for deep neural network using inverse document frequency. In Proceedings of the 2021 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), Online, 25–28 October 2021; pp. 574–581.
10. Zhang, T.; Wu, H.; Lu, X.; Han, G.; Sun, G. AWEncoder: Adversarial Watermarking Pre-Trained Encoders in Contrastive Learning. Appl. Sci. 2023, 13, 3531.
11. Zhang, J.; Chen, D.; Liao, J.; Ma, Z.; Fang, H.; Zhang, W.; Feng, H.; Hua, G.; Yu, N. Robust Model Watermarking for Image Processing Networks via Structure Consistency. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 6985–6992.
12. Zhang, J.; Chen, D.; Liao, J. Passport-aware normalization for deep model protection. Adv. Neural Inf. Process. Syst. 2020, 33, 22619–22628.
13. Zhu, H.; Liang, S.; Hu, W.; Li, F.; Jia, J.; Wang, S. Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion. arXiv 2024, arXiv:2404.13518.
14. Kuribayashi, M.; Tanaka, T.; Suzuki, S. White-box watermarking scheme for fully-connected layers in fine-tuning model. In Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security, Online, 22–25 June 2021; pp. 165–170.
15. Wang, Z.; Wu, Y.; Huang, H. Defense against Model Extraction Attack by Bayesian Active Watermarking. In Proceedings of the Forty-First International Conference on Machine Learning, Vienna, Austria, 21–27 June 2024.
16. Maung, A.P.; Kiya, H. Piracy-resistant DNN watermarking by block-wise image transformation with secret key. In Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security, Online, 22–25 July 2021; pp. 159–164.
17. Szyller, S.; Atli, B.G.; Marchal, S. Dawn: Dynamic adversarial watermarking of neural networks. In Proceedings of the 29th ACM International Conference on Multimedia, Online, 20–24 October 2021; pp. 4417–4425.
18. Rouhani, B.D.; Chen, H.; Koushanfar, F. Deepsigns: An end-to-end watermarking framework for protecting the ownership of deep neural networks. In Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Providence, RI, USA, 13–17 April 2019; pp. 1–12.
19. Wu, H.; Liu, G.; Yao, Y.; Zhang, X. Watermarking neural networks with watermarked images. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 2591–2601.
20. Chen, H.; Rouhani, B.D.; Koushanfar, F. DeepMarks: A Digital Fingerprinting Framework for Deep Neural Networks. arXiv 2018, arXiv:1804.03648.
21. Sun, S.; Xue, M.; Wang, J. Protecting the intellectual properties of deep neural networks with an additional class and steganography images. arXiv 2021, arXiv:2104.09203.
22. Yu, N.; Skripniuk, V.; Chen, D.; Davis, L.; Fritz, M. Responsible Disclosure of Generative Models Using Scalable Fingerprinting. In Proceedings of the International Conference on Learning Representations (ICLR), Online, 25 April 2022.
23. Li, M.; Wu, H.; Zhang, X. Generating traceable adversarial text examples by watermarking in the semantic space. J. Electron. Imaging 2022, 31, 063034.
24. Liu, H.; Zhang, W.; Li, B.; Ghanem, B.; Schmidhuber, J. Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable. arXiv 2024, arXiv:2405.00466.
25. Li, J.; Wang, H.; Li, S.; Qian, Z.; Zhang, X.; Vasilakos, A.V. Are handcrafted filters helpful for attributing AI-generated images? In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October–1 November 2024.
26. Yao, Y.; Wang, J.; Chang, Q.; Ren, Y.; Meng, W. High invisibility image steganography with wavelet transform and generative adversarial network. Expert Syst. Appl. 2024, 249, 123540.
27. Yu, J.; Zhang, X.; Xu, Y.; Zhang, J. Cross: Diffusion model makes controllable, robust and secure image steganography. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS ’23), New Orleans, LA, USA, 10–16 December 2023; pp. 80730–80743.
28. Bui, T.; Agarwal, S.; Yu, N.; Collomosse, J. Rosteals: Robust steganography using autoencoder latent space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 933–942.
29. Yansong, G.; Qiu, H.; Zhang, Z.; Wang, B.; Ma, H.; Abuadbba, A.; Xue, M.; Fu, A.; Nepal, S. Deeptheft: Stealing dnn model architectures through power side channel. In Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2024; pp. 3311–3326.
30. Chuan, Z.; Liang, H.; Li, Z.; Wu, T.; Wang, L.; Zhu, L. PtbStolen: Pre-trained Encoder Stealing Through Perturbed Samples. In Proceedings of the International Symposium on Emerging Information Security and Applications, Hangzhou, China, 29–30 October 2023; Springer Nature: Singapore, 2023; pp. 1–19.
31. Pratik, K.; Basu, D. Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack Using Public Data. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS ’23), New Orleans, LA, USA, 10–16 December 2023; pp. 72412–72445.
32. Kariyappa, S.; Prakash, A.; Qureshi, M.K. Maze: Data-free model stealing attack using zeroth-order gradient estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13814–13823.
33. Truong, J.B.; Maini, P.; Walls, R.J. Data-free model extraction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4771–4780.
34. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Fei-Fei, L. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
35. Wang, D.; Tan, X. Unsupervised feature learning with C-SVDDNet. Pattern Recognit. 2016, 60, 473–485.
Authors and Publication Year | Title | Objective |
---|---|---|
Chen et al. (2018) [20] | DeepMarks: A digital fingerprinting framework for deep neural networks | Tracing user of classification model |
Sun et al. (2021) [21] | Protecting the intellectual properties of deep neural networks with an additional class and steganography images | Tracing user of classification model |
Yu et al. (2022) [22] | Responsible disclosure of generative models using scalable fingerprinting | Tracing the source of generated images |
Li et al. (2022) [23] | Generating traceable adversarial text examples by watermarking in the semantic space | Tracing the source of generated text |
Liu et al. (2024) [24] | Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable | Tracing ownership of diffusion models |
Li et al. (2024) [25] | Are handcrafted filters helpful for attributing AI-generated images? | Tracing the source of generated images |
Notation | Definition |
---|---|
E, D | encoder and decoder of the steganography networks |
P, , , | pre-trained model/shadow pre-trained model/ fingerprinted pre-trained model/suspect model |
, | embedded key/extracted key |
, | cover images/stego images |
, | dataset for training steganography network/ dataset for fingerprinting/tracing pre-trained model |
SSL | Model | ImageNet FP | ImageNet NFP-MIN | ImageNet FP-ACC (%) | ImageNet Base-ACC (%) | STL-10 FP | STL-10 NFP-MIN | STL-10 FP-ACC (%) | STL-10 Base-ACC (%) |
---|---|---|---|---|---|---|---|---|---|
SimCLR | RN-18 | 0.1293 | 0.2031 | 85.36 | 86.81 | 0.0334 | 0.1666 | 67.81 | 67.76 |
SimCLR | RN-34 | 0.1378 | 0.2639 | 86.33 | 90.61 | 0.0686 | 0.1351 | 70.40 | 71.66 |
SimCLR | DN-121 | 0.2518 | 0.3187 | 91.22 | 91.86 | 0.2704 | 0.3367 | 72.86 | 72.36 |
MoCo v2 | RN-18 | 0.0315 | 0.1542 | 84.52 | 81.69 | 0.0337 | 0.1000 | 61.85 | 59.93 |
MoCo v2 | RN-34 | 0.0143 | 0.1377 | 83.43 | 80.86 | 0.1119 | 0.2447 | 60.24 | 58.89 |
MoCo v2 | DN-121 | 0.0786 | 0.1612 | 84.72 | 86.58 | 0.0695 | 0.2009 | 62.75 | 63.05 |
MAE | ViT-B | 0.0012 | 0.1333 | 88.64 | 86.78 | 0.2000 | 0.3042 | 72.38 | 70.63 |
MAE | ViT-L | 0.1001 | 0.1667 | 79.05 | 84.56 | 0.1332 | 0.2618 | 60.08 | 67.46 |
SSL | Model | Original Model FP | Original Model NFP-MIN | Pirate Model FP | Pirate Model NFP-MIN |
---|---|---|---|---|---|
SimCLR | RN-18 | 0.1293 | 0.2301 | 0.3967 | 0.4183 |
SimCLR | RN-34 | 0.1378 | 0.2639 | 0.4554 | 0.4591 |
MoCo v2 | RN-18 | 0.0315 | 0.1542 | 0.4312 | 0.4725 |
MoCo v2 | RN-34 | 0.0143 | 0.1377 | 0.3914 | 0.4112 |
MAE | ViT-B | 0.0012 | 0.1333 | 0.3722 | 0.4013 |
SSL | Original (M = 5) | Original (M = 10) | Fine-Tuning (M = 5) | Fine-Tuning (M = 10) | Pruning (M = 5) | Pruning (M = 10) | Model Stealing (M = 5) | Model Stealing (M = 10) |
---|---|---|---|---|---|---|---|---|
SimCLR | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
MoCo v2 | 1 | 1 | 1 | 1 | 1 | 1 | 0.95 | 0.90 |
MAE | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.93 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhou, L.; Ren, X.; Qian, C.; Sun, G. TraceGuard: Fine-Tuning Pre-Trained Model by Using Stego Images to Trace Its User. Mathematics 2024, 12, 3333. https://doi.org/10.3390/math12213333