Abstract
In an era when sophisticated synthetic audio technologies frequently compromise digital authenticity, ensuring the integrity of digital media is crucial. This paper addresses catastrophic forgetting and incremental learning in audio deepfake detection. We introduce a novel methodology that combines the discriminative feature extraction of SincNet with the computational efficiency of LightCNN. The approach is further augmented with Feature Distillation and Dynamic Class Rebalancing, which improve the model's adaptability to evolving deepfake threats while maintaining high accuracy on previously encountered data. The models were evaluated on the ASVspoof 2015, ASVspoof 2019, and FoR datasets and show significant improvements in audio deepfake detection with reduced computational overhead. Our results show that the proposed model effectively counters catastrophic forgetting and, through dynamic class rebalancing and feature distillation, exhibits superior adaptability to new deepfake types.
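The abstract names three building blocks without giving their formulas. The pure-Python sketch below illustrates one common form of each and is not the authors' implementation. The band-pass kernel follows the SincNet parametrisation (the difference of two windowed sinc low-pass filters, with only the cutoffs learned). The choice of an MSE feature-distillation term, inverse-frequency class weights, and a weighted sum of the two losses, together with every function name and the `distill_weight` parameter, are hypothetical assumptions made for illustration.

```python
import math
from collections import Counter

def sinc_bandpass(f1_hz, f2_hz, kernel_size=101, sample_rate=16000):
    """Band-pass FIR kernel: difference of two sinc low-pass filters (SincNet style)."""
    assert kernel_size % 2 == 1, "odd length keeps the kernel symmetric"
    f1, f2 = f1_hz / sample_rate, f2_hz / sample_rate  # normalised cutoffs
    half = kernel_size // 2
    kernel = []
    for i in range(kernel_size):
        n = i - half
        if n == 0:
            h = 2 * f2 - 2 * f1  # limit of the expression below as n -> 0
        else:
            h = (math.sin(2 * math.pi * f2 * n) - math.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))  # Hamming
        kernel.append(h * window)
    return kernel

def feature_distillation_loss(student_feats, teacher_feats):
    """Assumed form: MSE between new-model and frozen old-model embeddings."""
    assert len(student_feats) == len(teacher_feats)
    total, count = 0.0, 0
    for s_vec, t_vec in zip(student_feats, teacher_feats):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count

def rebalanced_class_weights(labels):
    """Assumed form: inverse-frequency weights, recomputed for each new batch or task."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """probs[i][c] is the predicted probability of class c for sample i."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / len(labels)

def continual_loss(probs, labels, student_feats, teacher_feats, distill_weight=0.5):
    """Hypothetical combined objective: rebalanced CE plus weighted distillation."""
    weights = rebalanced_class_weights(labels)
    ce = weighted_cross_entropy(probs, labels, weights)
    return ce + distill_weight * feature_distillation_loss(student_feats, teacher_feats)
```

As a usage example, with labels `[0, 0, 0, 1]` (say 0 = bona fide, 1 = spoof), `rebalanced_class_weights` returns about 0.667 for the majority class and 2.0 for the minority class. The minority class therefore contributes more to the loss. The distillation term penalises drift of the new model's features away from those of the model trained on earlier tasks, which is one standard way to limit catastrophic forgetting.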
Acknowledgements
This study has been partially supported by SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union – NextGenerationEU and Sapienza University of Rome project 2022–2024 “EV2” (003 009 22).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wani, T.M., Amerini, I. (2025). Audio Deepfake Detection: A Continual Approach with Feature Distillation and Dynamic Class Rebalancing. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15321. Springer, Cham. https://doi.org/10.1007/978-3-031-78305-0_14
DOI: https://doi.org/10.1007/978-3-031-78305-0_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78304-3
Online ISBN: 978-3-031-78305-0
eBook Packages: Computer Science, Computer Science (R0)