Abstract
Digital videos have become essential to broadcast news that reaches audiences around the world, so it is important to ensure the reliability of these videos. Unfortunately, digital videos can be manipulated by replacing a person’s face or expressions with those of another person without leaving visible traces. Detecting this facial manipulation is challenging because few digital forensic techniques can verify the originality of video content. In this paper, we propose a novel approach, dubbed FaceMD, that fuses three streams of convolutional neural networks to detect facial manipulation. FaceMD incorporates spatiotemporal information by fusing video frames, motion residuals, and 3D gradients to improve detection accuracy. We combine the three streams using different fusion methods and at different fusion locations in the network to make the best use of this spatiotemporal information and thereby increase detection performance. Experimental results show that FaceMD achieves state-of-the-art accuracy on two different facial manipulation data sets.
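As a rough illustration of the three inputs named in the abstract, the following Python sketch derives motion residuals and 3D gradients from a stack of grayscale frames and averages per-stream scores. It is a minimal sketch under stated assumptions, not the FaceMD implementation: the abstract does not define these quantities, so the temporal-median residual, the central-difference gradient, and the score-averaging `late_fusion` helper are all illustrative choices, and every function name and parameter here is hypothetical.

```python
import numpy as np


def motion_residuals(clip, window=3):
    """Subtract a temporal median from each frame (assumed definition).

    clip: array of shape (T, H, W) holding grayscale frames.
    window: number of neighbouring frames used for the median.
    """
    clip = np.asarray(clip, dtype=np.float64)
    half = window // 2
    out = np.empty_like(clip)
    for t in range(clip.shape[0]):
        lo = max(0, t - half)
        hi = min(clip.shape[0], t + half + 1)
        # Static content cancels out; what remains is frame-to-frame change.
        out[t] = clip[t] - np.median(clip[lo:hi], axis=0)
    return out


def gradients_3d(clip):
    """Gradient magnitude over time, height, and width (assumed definition).

    Uses finite differences via np.gradient; a 3D Sobel operator would be
    another reasonable choice.
    """
    clip = np.asarray(clip, dtype=np.float64)
    gt, gy, gx = np.gradient(clip)
    return np.sqrt(gt ** 2 + gy ** 2 + gx ** 2)


def late_fusion(stream_scores, weights=None):
    """Average per-stream fake probabilities (one hypothetical fusion method)."""
    scores = np.asarray(stream_scores, dtype=np.float64)
    return float(np.average(scores, weights=weights))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((8, 64, 64))      # stand-in for cropped face frames
    residuals = motion_residuals(frames)  # second stream input
    grads = gradients_3d(frames)          # third stream input
    print(frames.shape, residuals.shape, grads.shape)
    # Placeholder scores standing in for the outputs of three trained CNNs.
    print(late_fusion([0.9, 0.6, 0.3]))
```

In this sketch each derived array keeps the shape of the input clip, so all three could be fed to networks with matching input sizes. Averaging the final scores is only one option; combining intermediate feature maps inside the networks is another.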
Acknowledgements
The researcher would like to thank the Deanship of Scientific Research, Qassim University, for funding the publication of this project.
Cite this article
Aloraini, M. FaceMD: convolutional neural network-based spatiotemporal fusion facial manipulation detection. SIViP 17, 247–255 (2023). https://doi.org/10.1007/s11760-022-02227-x