DeepFake detection algorithm based on improved vision transformer

Abstract

A DeepFake is a manipulated video made with generative deep learning technologies, such as generative adversarial networks or autoencoders, that anyone can use. As DeepFakes have proliferated, classifiers built on convolutional neural networks (CNNs) have been actively developed to detect them. However, CNNs are prone to overfitting and cannot treat the relations between local regions as a global feature of the image, which leads to misclassification. In this paper, we propose an efficient vision transformer model for DeepFake detection that extracts both local and global features. We combine vector-concatenated CNN features with patch-based positioning so that all positions interact, which helps specify the artifact region. For the distillation token, the logit is trained using binary cross entropy through the sigmoid function. Adding this distillation improves the generalization of the proposed model and therefore its performance. In experiments, the proposed model outperforms the state-of-the-art (SOTA) model by 0.006 AUC and 0.013 F1 score on the DFDC test dataset. Of 2,500 fake videos, the proposed model correctly predicts 2,313 as fake, whereas the SOTA model predicts 2,276 at its best. With the ensemble method, the proposed model outperforms the SOTA model by 0.01 AUC. On the Celeb-DF (v2) dataset, the proposed model achieves 0.993 AUC and 0.978 F1 score.
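
The distillation objective and the fake-video counts in the abstract can be illustrated with a short sketch. It is not the authors' implementation: the function names are hypothetical, the example logit is made up, and the target is assumed to be a teacher model's fake probability, which the abstract does not specify. The only figures taken from the abstract are the 2,500 fake videos and the 2,313 and 2,276 correct predictions.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logit(logit: float, target: float) -> float:
    """Binary cross entropy of sigmoid(logit) against a target in [0, 1].

    Uses the numerically stable form
        max(z, 0) - z * t + log(1 + exp(-|z|)),
    which equals -[t * log(s) + (1 - t) * log(1 - s)] with s = sigmoid(z).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def recall(true_positives: int, positives: int) -> float:
    """Fraction of actual positives (here: fake videos) that were detected."""
    return true_positives / positives


# Hypothetical distillation-token logit and assumed teacher target.
distill_logit = 1.2   # made-up value for illustration
teacher_target = 1.0  # assumption: teacher labels the clip as fake
loss = bce_with_logit(distill_logit, teacher_target)

# Counts reported in the abstract for 2,500 fake videos.
proposed_recall = recall(2313, 2500)
sota_recall = recall(2276, 2500)
```

With the reported counts, the proposed model detects 2,313 / 2,500 = 92.52% of the fake videos and the SOTA model 2,276 / 2,500 = 91.04%, a difference of 37 videos.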

Notes

https://www.kaggle.com/c/deepfake-detection-challenge

Author information

Authors and Affiliations

Sookmyung Women’s University, Seoul, Republic of Korea
Young-Jin Heo, Woon-Ha Yeo & Byung-Gyu Kim

Corresponding author

Correspondence to Byung-Gyu Kim.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Heo, YJ., Yeo, WH. & Kim, BG. DeepFake detection algorithm based on improved vision transformer. Appl Intell 53, 7512–7527 (2023). https://doi.org/10.1007/s10489-022-03867-9

Accepted: 07 June 2022
Published: 26 July 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s10489-022-03867-9
