A Comparative Analysis of Compression and Transfer Learning Techniques in DeepFake Detection Models
Figure 1. Pruning of convolutional neural networks.
Figure 2. Knowledge distillation in the teacher–student framework.
Figure 3. Quantization of deep neural network parameters.
Figure 4. Low-rank factorization.
Figure 5. Transfer learning across different tasks.
Figure 6. CNN with adapter module for transfer learning.
Figure 7. Knowledge distillation for transfer learning.
Figure 8. “Dogs vs. cats” dataset example.
Figure 9. Sample ROC curves for the Synthbuster dataset.
Abstract
1. Introduction
- It demonstrates that compressed models can achieve performance levels comparable to uncompressed models, even when compressed to 40%, 30%, 20%, or 10% of the original model size.
- It proposes several approaches for applying transfer learning to DeepFake detection models instead of training them from scratch, addressing the challenge of training models efficiently with limited resources and data.
- It conducts extensive evaluations across multiple benchmark datasets to establish the generalizability and robustness of compressed models for real-world DeepFake detection applications.
- It presents experimental results and provides an empirical analysis of how different types of synthetic image generators and image types impact the effectiveness of transfer learning.
2. A Critical Literature Review of DeepFake Detection, Compression, and Transfer Learning Techniques
2.1. DeepFake Detection
2.2. Compression of CNNs
2.3. Transfer Learning in CNNs
3. Compression of DeepFake Models
3.1. Pruning
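As a minimal sketch of the idea, the following PyTorch snippet applies global magnitude (L1) pruning to all convolutional and fully connected layers. It is illustrative only: the sparsity level is an assumed parameter, and unstructured masking as shown shrinks the effective weight count rather than the architecture itself; producing the smaller student architectures evaluated later would correspond to structured (channel- or layer-level) pruning, but the masking logic is analogous.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def magnitude_prune(model: nn.Module, amount: float = 0.6) -> nn.Module:
    """Zero out the globally smallest-magnitude weights in conv/FC layers."""
    targets = [(m, "weight") for m in model.modules()
               if isinstance(m, (nn.Conv2d, nn.Linear))]
    prune.global_unstructured(targets,
                              pruning_method=prune.L1Unstructured,
                              amount=amount)  # e.g., 0.6 keeps ~40% of weights
    for module, name in targets:
        prune.remove(module, name)  # fold the masks into the weight tensors
    return model
```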
3.2. Knowledge Distillation
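A minimal sketch of the standard teacher–student objective: the student matches temperature-softened teacher probabilities through a KL-divergence term, combined with the ordinary cross-entropy on the hard labels. The temperature and mixing weight below are illustrative assumptions, not the paper's settings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.5):
    """alpha-weighted mix of the softened KL term and hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients are comparable to the CE term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```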
3.3. Quantization
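A minimal sketch of post-training dynamic quantization in PyTorch. Restricting quantization to the fully connected layers is an assumption made here for illustration; it would, however, be consistent with the partial size reductions reported in the tables below (e.g., 18.41 MB to 13.39 MB), since the convolutional weights would then remain in floating point.

```python
import torch
from torch.ao.quantization import quantize_dynamic

def quantize_detector(model: torch.nn.Module) -> torch.nn.Module:
    """Convert FC layers to int8 for a smaller model and faster CPU inference."""
    return quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
```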
3.4. Low-Rank Factorization
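A sketch of truncated-SVD factorization of a fully connected layer: the weight matrix W is approximated by a rank-k product, so one large layer becomes two thin ones (k = 6 in the experiments of Section 5.4.5; the helper below is illustrative, not the authors' implementation).

```python
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int = 6) -> nn.Sequential:
    """Replace one Linear layer with two thin layers via truncated SVD."""
    W = layer.weight.data                              # shape (out, in)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features,
                       bias=layer.bias is not None)
    first.weight.data = Vh[:rank, :]                   # (rank, in)
    second.weight.data = U[:, :rank] * S[:rank]        # (out, rank) = U_k * sigma_k
    if layer.bias is not None:
        second.bias.data = layer.bias.data
    return nn.Sequential(first, second)
```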
4. Transfer Learning in DeepFake Models
4.1. Pruning with Fine-Tuning
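A sketch of the fine-tuning stage that follows pruning, assuming a PyTorch detector and a DataLoader over the new task's labeled images; the optimizer, learning rate, and epoch count are illustrative assumptions rather than the paper's settings.

```python
import torch

def fine_tune(model, train_loader, epochs: int = 3,
              lr: float = 1e-4, device: str = "cuda"):
    """Briefly retrain a pruned detector on data from the target task."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```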
4.2. Pruning with Fine-Tuning and Adapters
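Sections 4.2 and 4.4 attach small trainable adapter modules to the compressed backbone (after the last or the fourth Conv2d layer in the experiments of Section 5), so that transfer learning updates only a few parameters. A minimal bottleneck-adapter sketch follows; the 1×1 projections, reduction factor, and residual connection are illustrative design assumptions.

```python
import torch.nn as nn

class ConvAdapter(nn.Module):
    """Bottleneck adapter: 1x1 down-projection, ReLU, 1x1 up-projection,
    added back to the input through a residual connection."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.down = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.act = nn.ReLU()
        self.up = nn.Conv2d(channels // reduction, channels, kernel_size=1)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))
```

During transfer, the backbone weights would stay frozen and only the adapter (and typically the classifier head) is trained.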
4.3. Knowledge Distillation Using Data from a New Task
4.4. Knowledge Distillation with Adapter Using Data from a New Task
4.5. Quantization with Fine-Tuning
4.6. Low-Rank Factorization with Fine-Tuning
5. Experimental Evaluation
5.1. Datasets
5.2. Experimental Setup
5.3. Evaluation Metrics
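The reported metrics follow the standard definitions in terms of true/false positives (TP, FP) and true/false negatives (TN, FN):

$$\mathrm{Precision}=\frac{TP}{TP+FP},\qquad \mathrm{Recall}=\frac{TP}{TP+FN},$$
$$F_1=\frac{2\cdot\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}},\qquad \mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}.$$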
5.4. Outcomes and Discussion
5.4.1. Experimental Evaluation of Compression Using a DeepFake Detection Model Trained on Multiple Synthetic Image Types and Multiple DeepFake Generators
5.4.2. Experimental Evaluation of Compression and Transfer Learning Between Models Trained and Fine-Tuned on Different DeepFake Datasets
5.4.3. Experimental Evaluation of Compression Using a Dataset Generated by Multiple Types of DeepFake Models
5.4.4. Experimental Evaluation on Transfer Learning Between Models Trained on Different Types of Datasets
5.4.5. Experimental Evaluation on Low-Rank Factorization
5.4.6. Discussion
6. Research Implications
7. Future Work
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| CNNs | Convolutional Neural Networks |
| DNNs | Deep Neural Networks |
| FC | Fully Connected |
| I2G | Inconsistency Image Generator |
| KL | Kullback–Leibler |
| PCL | Pair-wise self-Consistency Learning |
| S-T | Student–Teacher |
| SVD | Singular Value Decomposition |
| ViT | Vision Transformers |
| Model | Architecture | Layers | Parameter Count |
|---|---|---|---|
| Baseline/Teacher Model | VGG-based | 8 Conv layers, 3 FC layers | ∼4.5 million |
| Student Model 40% | VGG-based | 7 Conv layers, 3 FC layers | ∼1.8 million |
| Student Model 30% | VGG-based | 6 Conv layers, 2 FC layers | ∼1.34 million |
| Student Model 20% | VGG-based | 7 Conv layers, 2 FC layers | ∼0.9 million |
| Student Model 10% | VGG-based | 5 depthwise separable Conv layers, 2 FC layers | ∼0.44 million |
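Per the table above, the 10% student replaces standard convolutions with depthwise separable ones, which factor a convolution into a per-channel spatial filter followed by a 1×1 channel mixer, cutting parameters roughly by a factor of the kernel area. A minimal sketch (channel counts and kernel size are illustrative):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 filter per channel, then a 1x1 pointwise channel mixer."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```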
| Method | Metric | Full Model | 40% Parameters | 30% Parameters | 20% Parameters | 10% Parameters |
|---|---|---|---|---|---|---|
| Pruning | Precision | 0.9420 | 0.9070 | 0.8964 | 0.9147 | 0.9272 |
| | Recall | 0.9169 | 0.9402 | 0.9358 | 0.9325 | 0.9138 |
| | F-1 score | 0.9293 | 0.9233 | 0.9157 | 0.9235 | 0.9205 |
| | Accuracy | 0.9265 | 0.9177 | 0.9092 | 0.9186 | 0.9168 |
| | Train time (s) | 859 | 491 | 491 | 490 | 488 |
| Knowledge Distillation | Precision | 0.9420 | 0.9298 | 0.9366 | 0.9275 | 0.9216 |
| | Recall | 0.9169 | 0.9311 | 0.9286 | 0.9347 | 0.9241 |
| | F-1 score | 0.9293 | 0.9304 | 0.9326 | 0.9311 | 0.9228 |
| | Accuracy | 0.9265 | 0.9266 | 0.9293 | 0.9271 | 0.9186 |
| | Train time (s) | 859 | 599 | 586 | 586 | 582 |

Quantization:

| Metric | Original Model | Quantized Model |
|---|---|---|
| Model size (MB) | 18.41 | 13.39 |
| Precision | 0.9420 | 0.9420 |
| Recall | 0.9169 | 0.9166 |
| F-1 score | 0.9293 | 0.9291 |
| Accuracy | 0.9265 | 0.9263 |
| Train time (s) | 859 | |
| CPU inference time on 6833 images (s) | 1979 | 1656 |
| Method | Metric | Full Model | 40% Parameters | 30% Parameters | 20% Parameters | 10% Parameters |
|---|---|---|---|---|---|---|
| Pruning | Precision | 0.9878 | 0.9671 | 0.9296 | 0.8577 | 0.9418 |
| | Recall | 0.9914 | 0.9507 | 0.9141 | 0.8371 | 0.1134 |
| | F-1 score | 0.9896 | 0.9588 | 0.9217 | 0.8473 | 0.2024 |
| | Accuracy | 0.9896 | 0.9592 | 0.9224 | 0.8491 | 0.5532 |
| | Train time (s) | 4382 | | | | |
| Knowledge Distillation | Precision | 0.9878 | 0.9899 | 0.9845 | 0.9928 | 0.9856 |
| | Recall | 0.9914 | 0.9721 | 0.981 | 0.974 | 0.969 |
| | F-1 score | 0.9896 | 0.9809 | 0.9827 | 0.9833 | 0.9772 |
| | Accuracy | 0.9896 | 0.9811 | 0.9828 | 0.9835 | 0.9774 |
| | Train time (s) | 4382 | 1862 | 1825 | 1913 | 3195 |

Quantization:

| Metric | Original Model | Quantized Model |
|---|---|---|
| Model size (MB) | 17.6 | 12.8 |
| Precision | 0.9878 | 0.9875 |
| Recall | 0.9914 | 0.9914 |
| F-1 score | 0.9896 | 0.9894 |
| Accuracy | 0.9896 | 0.9894 |
| CPU inference time on 20,000 images (s) | 5378 | 3918 |
| Method | Metric | 40% Parameters | 30% Parameters | 20% Parameters | 10% Parameters |
|---|---|---|---|---|---|
| Pruning + transfer | Precision | 0.8535 | 0.8477 | 0.8303 | 0.8116 |
| | Recall | 0.8860 | 0.8800 | 0.8687 | 0.8443 |
| | F-1 score | 0.8694 | 0.8635 | 0.8490 | 0.8276 |
| | Accuracy | 0.8660 | 0.8599 | 0.8444 | 0.8229 |
| | Train time (s) | 1881 | 1789 | 1790 | 1795 |
| Knowledge Distillation | Precision | 0.9467 | 0.9537 | 0.9392 | 0.9534 |
| | Recall | 0.9198 | 0.8867 | 0.9146 | 0.7605 |
| | F-1 score | 0.9331 | 0.9190 | 0.9267 | 0.8461 |
| | Accuracy | 0.9336 | 0.9213 | 0.9271 | 0.8607 |
| | Train time (s) | 2715 | 2675 | 2583 | 2559 |
| KD + adapter after last Conv2d layer | Precision | 0.9179 | 0.9329 | 0.9105 | 0.8845 |
| | Recall | 0.9431 | 0.9304 | 0.9439 | 0.8872 |
| | F-1 score | 0.9303 | 0.9317 | 0.9269 | 0.8859 |
| | Accuracy | 0.9289 | 0.9313 | 0.9250 | 0.8849 |
| | Train time (s) | 1332 | 1187 | 1183 | 1188 |
| Pruning + adapter after last Conv2d layer | Precision | 0.8633 | 0.8582 | 0.8540 | 0.8424 |
| | Recall | 0.8801 | 0.8831 | 0.8461 | 0.8102 |
| | F-1 score | 0.8716 | 0.8705 | 0.8500 | 0.8260 |
| | Accuracy | 0.8695 | 0.8676 | 0.8497 | 0.8281 |
| | Train time (s) | 1865 | 1918 | 1912 | 1890 |
| Pruning + adapter after fourth Conv2d layer | Precision | 0.8748 | 0.8720 | 0.8861 | 0.8745 |
| | Recall | 0.8845 | 0.9011 | 0.8617 | 0.8343 |
| | F-1 score | 0.8796 | 0.8863 | 0.8738 | 0.8539 |
| | Accuracy | 0.8781 | 0.8836 | 0.8746 | 0.8563 |
| | Train time (s) | 2415 | 2462 | 2458 | 2415 |

Quantization:

| Metric | Transferred Model | Quantized Model |
|---|---|---|
| Model size (MB) | 17.6 | 12.8 |
| Precision | 0.8526 | 0.8523 |
| Recall | 0.8996 | 0.8998 |
| F-1 score | 0.8755 | 0.8754 |
| Accuracy | 0.8711 | 0.8710 |
| Transfer time (s) | 1959 | |
| CPU inference time on 10,905 images (s) | 2949 | 3036 |
| Method | GAN Type | Metric | Full Model | 40% Parameters | 30% Parameters | 20% Parameters | 10% Parameters |
|---|---|---|---|---|---|---|---|
| Pruning | DeepFake | Precision | 0.4909 | 0.5038 | 0.5116 | 0.5144 | 0.4974 |
| | | Recall | 0.3690 | 0.4536 | 0.6080 | 0.8574 | 0.6605 |
| | | F-1 score | 0.4213 | 0.4774 | 0.5557 | 0.6430 | 0.5675 |
| | | Accuracy | 0.4923 | 0.5026 | 0.5130 | 0.5232 | 0.4958 |
| | progan | Precision | 0.9746 | 0.9802 | 0.9945 | 0.9747 | 0.9705 |
| | | Recall | 0.96 | 0.995 | 0.915 | 0.965 | 0.99 |
| | | F-1 score | 0.9672 | 0.9875 | 0.9531 | 0.9698 | 0.9801 |
| | | Accuracy | 0.9675 | 0.9875 | 0.955 | 0.97 | 0.98 |
| | stargan | Precision | 0.7585 | 0.7656 | 0.9461 | 0.7951 | 0.7804 |
| | | Recall | 0.8909 | 0.9674 | 0.8179 | 0.9804 | 0.9764 |
| | | F-1 score | 0.8194 | 0.8548 | 0.8773 | 0.8781 | 0.8675 |
| | | Accuracy | 0.8036 | 0.8356 | 0.8856 | 0.8639 | 0.8509 |
| | whichfaceisreal | Precision | 0.5860 | 0.6442 | 0.6998 | 0.6289 | 0.6691 |
| | | Recall | 0.504 | 0.393 | 0.359 | 0.473 | 0.441 |
| | | F-1 score | 0.5419 | 0.4881 | 0.4745 | 0.5399 | 0.5316 |
| | | Accuracy | 0.574 | 0.588 | 0.6025 | 0.597 | 0.6115 |
| Knowledge Distillation | DeepFake | Precision | 0.4909 | 0.4909 | 0.5242 | 0.4852 | 0.5475 |
| | | Recall | 0.3690 | 0.3018 | 0.2670 | 0.2803 | 0.7480 |
| | | F-1 score | 0.4213 | 0.3738 | 0.3538 | 0.3554 | 0.6323 |
| | | Accuracy | 0.4923 | 0.4936 | 0.5115 | 0.4906 | 0.5642 |
| | progan | Precision | 0.9746 | 0.9846 | 0.985 | 0.9801 | 0.9756 |
| | | Recall | 0.96 | 0.96 | 0.985 | 0.99 | 1.0 |
| | | F-1 score | 0.9672 | 0.9721 | 0.985 | 0.9850 | 0.9876 |
| | | Accuracy | 0.9675 | 0.9725 | 0.985 | 0.985 | 0.9875 |
| | stargan | Precision | 0.7585 | 0.7259 | 0.7996 | 0.7564 | 0.8919 |
| | | Recall | 0.8909 | 0.8639 | 0.8024 | 0.8344 | 0.9784 |
| | | F-1 score | 0.8194 | 0.7889 | 0.8009 | 0.7935 | 0.9332 |
| | | Accuracy | 0.8036 | 0.7688 | 0.8006 | 0.7828 | 0.9299 |
| | whichfaceisreal | Precision | 0.5860 | 0.5771 | 0.5948 | 0.6058 | 0.6403 |
| | | Recall | 0.504 | 0.479 | 0.458 | 0.521 | 0.593 |
| | | F-1 score | 0.5419 | 0.5234 | 0.5175 | 0.5602 | 0.6157 |
| | | Accuracy | 0.574 | 0.564 | 0.573 | 0.591 | 0.63 |

Train times were identical across test sets: full model 2236 s; pruned models 1326 / 1318 / 1319 / 1308 s for the 40% / 30% / 20% / 10% variants; distilled models 1642 / 1609 / 1609 / 1599 s.

Quantization:

| GAN Type | Model | Size (MB) | Precision | Recall | F-1 Score | Accuracy | CPU Inference Time |
|---|---|---|---|---|---|---|---|
| DeepFake | Original | 18.41 | 0.4909 | 0.3690 | 0.4213 | 0.4923 | 1568 s (5405 images) |
| DeepFake | Quantized | 13.39 | 0.4906 | 0.3690 | 0.4212 | 0.4921 | 1470 s (5405 images) |
| progan | Original | 18.41 | 0.9746 | 0.96 | 0.9672 | 0.9675 | 113 s (400 images) |
| progan | Quantized | 13.39 | 0.9746 | 0.96 | 0.9672 | 0.9675 | 113 s (400 images) |
| stargan | Original | 18.41 | 0.7585 | 0.8909 | 0.8194 | 0.8036 | 1154 s (3998 images) |
| stargan | Quantized | 13.39 | 0.7577 | 0.8904 | 0.8187 | 0.8029 | 1086 s (3998 images) |
| whichfaceisreal | Original | 18.41 | 0.5860 | 0.504 | 0.5419 | 0.574 | 606 s (2000 images) |
| whichfaceisreal | Quantized | 13.39 | 0.5865 | 0.505 | 0.5427 | 0.5745 | 600 s (2000 images) |
| Method | Metric | 40% Parameters | 30% Parameters | 20% Parameters | 10% Parameters |
|---|---|---|---|---|---|
| Pruning + transfer | Precision | 0.8531 | 0.8536 | 0.8477 | 0.8555 |
| | Recall | 0.8486 | 0.8623 | 0.8355 | 0.8282 |
| | F-1 score | 0.8509 | 0.8579 | 0.8416 | 0.8417 |
| | Accuracy | 0.8502 | 0.8562 | 0.8416 | 0.8430 |
| | Train time (s) | 1994 | 1850 | 1845 | 1899 |
| Knowledge Distillation | Precision | 0.8928 | 0.8599 | 0.8878 | 0.8209 |
| | Recall | 0.9073 | 0.9417 | 0.9136 | 0.8841 |
| | F-1 score | 0.9000 | 0.8990 | 0.9005 | 0.8514 |
| | Accuracy | 0.8984 | 0.8934 | 0.8983 | 0.8445 |
| | Train time (s) | 2694 | 2525 | 2498 | 2653 |
| KD + adapter after last Conv2d layer | Precision | 0.9045 | 0.8934 | 0.8819 | 0.8536 |
| | Recall | 0.9200 | 0.9280 | 0.9344 | 0.8834 |
| | F-1 score | 0.9122 | 0.9104 | 0.9074 | 0.8682 |
| | Accuracy | 0.9108 | 0.9080 | 0.9039 | 0.8650 |
| | Train time (s) | 1467 | 1348 | 1373 | 1261 |
| Pruning + adapter after fourth Conv2d layer | Precision | 0.8496 | 0.8739 | 0.8600 | 0.8749 |
| | Recall | 0.9078 | 0.8719 | 0.8827 | 0.8789 |
| | F-1 score | 0.8778 | 0.8729 | 0.8712 | 0.8769 |
| | Accuracy | 0.8727 | 0.8721 | 0.8685 | 0.8757 |
| | Train time (s) | 2409 | 2394 | 2396 | 2375 |

Quantization:

| Metric | Transferred Model | Quantized Model |
|---|---|---|
| Model size (MB) | 17.6 | 12.8 |
| Precision | 0.8486 | 0.8486 |
| Recall | 0.8507 | 0.8508 |
| F-1 score | 0.8449 | 0.8447 |
| Accuracy | 0.8565 | 0.8570 |
| Transfer time (s) | 1958 | |
| CPU inference time on 10,905 images (s) | 3262 | 3148 |
| Dataset | Model | Precision | Recall | F-1 Score | Accuracy |
|---|---|---|---|---|---|
| Synthbuster + Raise | Original (train time 859 s) | 0.9420 | 0.9169 | 0.9293 | 0.9265 |
| Synthbuster + Raise | Rank-6 factorized | 0.8821 | 0.8802 | 0.9474 | 0.8219 |
| “140k Real and Fake Faces” | Original (train time 4382 s) | 0.9878 | 0.9914 | 0.9896 | 0.9896 |
| “140k Real and Fake Faces” | Rank-6 factorized | 1.0 | 0.0725 | 0.1351 | 0.5362 |
| “DeepFake and Real Images” | Original (transfer time 1959 s) | 0.8526 | 0.8996 | 0.8755 | 0.8711 |
| “DeepFake and Real Images” | Rank-6 factorized | 1.0 | 0.0103 | 0.0205 | 0.5016 |
Share and Cite
Karathanasis, A.; Violos, J.; Kompatsiaris, I. A Comparative Analysis of Compression and Transfer Learning Techniques in DeepFake Detection Models. Mathematics 2025, 13, 887. https://doi.org/10.3390/math13050887