Deep normalization for light SpineNet speaker anti-spoofing systems

Abstract

Despite their strong performance in controlled conditions, current speaker recognition systems remain vulnerable to the diversity of real-world situations, including unpredictable noisy conditions and spoofing attacks. This paper presents an approach that optimizes the deployment of automatic speaker verification (ASV) spoofing countermeasures. A normalization process is proposed that adapts Light SpineNet (LSpineNet)-based countermeasure vectors for use with the probabilistic linear discriminant analysis (PLDA) scoring method. Three normalization techniques are assessed on the logical access evaluation dataset of the ASVspoof 2021 challenge: maximum Gaussianality discriminative normalization flow (MG-DNF), maximum likelihood discriminative normalization flow (ML-DNF), and variational autoencoder (VAE) regularization. This dataset includes diverse transmission artifacts and realistic conditions, which makes it possible to evaluate how well the normalized LSpineNet-based countermeasure embeddings detect spoofing attacks. The results show the effectiveness of the proposed normalization approach within the LSpineNet-based anti-spoofing system. The LSpineNet49-MG-DNF countermeasure embedding achieved the best performance compared with the DNF-based, VAE-based, and current state-of-the-art systems.
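To make the idea of normalizing countermeasure embeddings before a Gaussian back-end more concrete, the following is a minimal sketch on synthetic data, not the method of the paper. It assumes skewed (lognormal) toy embeddings, uses a rank-based per-dimension Gaussianizer as a crude stand-in for a learned normalization flow (it is not MG-DNF, ML-DNF, or VAE regularization), and scores with a two-class shared-covariance Gaussian log-likelihood ratio as a simplified stand-in for PLDA. All names (`QuantileGaussianizer`, `fit_gaussian_backend`, `DIM`, and so on) are hypothetical.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
DIM = 8  # hypothetical embedding size, chosen only for illustration


def excess_kurtosis(x):
    """Per-dimension excess kurtosis; it is 0 for an exactly Gaussian distribution."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return (z ** 4).mean(axis=0) - 3.0


class QuantileGaussianizer:
    """Rank-based per-dimension map to N(0, 1); a crude stand-in for a learned flow."""

    def fit(self, x):
        # Keep the sorted training values of each dimension as an empirical CDF.
        self.sorted_ = np.sort(x, axis=0)
        return self

    def transform(self, x):
        n = self.sorted_.shape[0]
        inv_cdf = np.vectorize(NormalDist().inv_cdf)
        out = np.empty(x.shape, dtype=float)
        for j in range(x.shape[1]):
            rank = np.searchsorted(self.sorted_[:, j], x[:, j], side="right")
            # The offset keeps the probabilities strictly inside (0, 1).
            out[:, j] = inv_cdf((rank + 0.5) / (n + 1.0))
        return out


def fit_gaussian_backend(x_bona, x_spoof):
    """Two-class Gaussian model with a shared covariance (simplified, not full PLDA)."""
    mu_b, mu_s = x_bona.mean(axis=0), x_spoof.mean(axis=0)
    centered = np.vstack([x_bona - mu_b, x_spoof - mu_s])
    cov = centered.T @ centered / len(centered) + 1e-6 * np.eye(centered.shape[1])
    w = np.linalg.solve(cov, mu_b - mu_s)
    b = -0.5 * (mu_b + mu_s) @ w
    return w, b


def llr_scores(x, w, b):
    """Log-likelihood ratio log p(x | bona fide) - log p(x | spoof)."""
    return x @ w + b


def make_embeddings(n, shift):
    # Synthetic, deliberately non-Gaussian "embeddings" (assumption for the demo).
    return np.exp(rng.normal(loc=shift, scale=0.6, size=(n, DIM)))


train_bona, train_spoof = make_embeddings(2000, 0.5), make_embeddings(2000, 0.0)
test_bona, test_spoof = make_embeddings(500, 0.5), make_embeddings(500, 0.0)

pooled = np.vstack([train_bona, train_spoof])
normalizer = QuantileGaussianizer().fit(pooled)

kurt_before = excess_kurtosis(pooled)
kurt_after = excess_kurtosis(normalizer.transform(pooled))

w, b = fit_gaussian_backend(normalizer.transform(train_bona),
                            normalizer.transform(train_spoof))
scores_bona = llr_scores(normalizer.transform(test_bona), w, b)
scores_spoof = llr_scores(normalizer.transform(test_spoof), w, b)

print("max |excess kurtosis| before:", np.abs(kurt_before).max())
print("max |excess kurtosis| after: ", np.abs(kurt_after).max())
print("mean LLR bona fide / spoof:  ", scores_bona.mean(), scores_spoof.mean())
```

The sketch only illustrates why Gaussian-like embeddings suit a Gaussian scoring model. A discriminative normalization flow would be trained on labeled data and would model class-dependent structure, which this class-agnostic rank map does not attempt.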

Data Availability

The datasets and code generated and/or analyzed during the current study are available on reasonable request.

Notes

https://www.asvspoof.org/
The terms “countermeasure embeddings” and “countermeasure vectors” are used interchangeably in this paper, referring to the feature representation generally extracted from the penultimate fully connected layer of DNN backbones.
https://github.com/hyperion-ml/hyperion
https://github.com/Caiyq2019/MG

References

Yamagishi J, Wang X, Todisco M, Sahidullah M, Patino J, Nautsch A, Liu X, Lee KA, Kinnunen T, Evans N, Delgado H (2021) ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection. In: Proc. 2021 Edition of the automatic speaker verification and spoofing countermeasures challenge. pp 47–54
Liu X, Wang X, Sahidullah M, Patino J, Delgado H, Kinnunen T, Todisco M, Yamagishi J, Evans N, Nautsch A, Lee KA (2023) Asvspoof 2021: Towards spoofed and deepfake speech detection in the wild. IEEE/ACM Trans Audio, Speech, and Language Process 31:2507–2522
Khan A, Malik KM, Ryan J, Saravanan M (2023) Battling voice spoofing: a review, comparative analysis, and generalizability evaluation of state-of-the-art voice spoofing counter measures. Artif Intell Rev 56:513–566
Li M, Ahmadiadli Y, Zhang X-P (2024) Audio anti-spoofing detection: A survey. arXiv:2404.13914
Wang X, Yamagishi J (2021) A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection. In: Proc. Interspeech 2021. pp 4259–4263
Tak H, Patino J, Todisco M, Nautsch A, Evans N, Larcher A (2021) End-to-end anti-spoofing with rawnet2. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp 6369–6373
Wang X, Yamagishi J, Todisco M, Delgado H, Nautsch A, Evans N, Sahidullah M, Vestman V, Kinnunen T, Lee KA et al (2020) Asvspoof 2019: A large-scale public database of synthesized, converted and replayed speech. Computer Speech & Language 64:101114
Khan A, Malik KM, Nawaz S (2024) Frame-to-utterance convergence: A spectra-temporal approach for unified spoofing detection. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp 10761–10765
Zhang Y, Li Z, Lu J, Wang W, Zhang P (2024) Synthetic speech detection based on the temporal consistency of speaker features. IEEE Signal Process Lett 31:944–948
Lei Z, Yan H, Liu C, Zhou Y, Ma M (2024) GMM-ResNet2: Ensemble of group ResNet networks for synthetic speech detection. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp 12101–12105
Wen P, Hu K, Yue W, Zhang S, Zhou W, Wang Z (2023) Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms. In: Proc. INTERSPEECH 2023. pp 271–275
Li J, Long Y, Li Y, Xu D (2023) Advanced RawNet2 with Attention-based Channel Masking for Synthetic Speech Detection. In: Proc. INTERSPEECH 2023. pp 2788–2792
Tak H, Jung J-w, Patino J, Kamble M, Todisco M, Evans N (2021) End-to-end spectro-temporal graph attention networks for speaker verification anti-spoofing and speech deepfake detection. In: Proc. 2021 Edition of the automatic speaker verification and spoofing countermeasures challenge. pp 1–8
Ge W, Patino J, Todisco M, Evans N (2021) Raw Differentiable Architecture Search for Speech Deepfake and Spoofing Detection. In: Proc. 2021 Edition of the automatic speaker verification and spoofing countermeasures challenge. pp 22–28
Benhafid Z, Selouani SA, Yakoub MS, Amrouche A (2021) LARIHS ASSERT Reassessment for Logical Access ASVspoof 2021 Challenge. In: Proc. 2021 Edition of the automatic speaker verification and spoofing countermeasures challenge. pp 94–99
Guo Y, Huang H, Chen X, Zhao H, Wang Y (2024) Audio deepfake detection with self-supervised WavLM and multi-fusion attentive classifier. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp 12702–12706
Wang X, Yamagishi J (2022) Investigating Self-Supervised Front Ends for Speech Spoofing Countermeasures. In: Proc. The speaker and language recognition workshop (Odyssey 2022). pp 100–106
Tak H, Todisco M, Wang X, Jung J-w, Yamagishi J, Evans N (2022) Automatic Speaker Verification Spoofing and Deepfake Detection Using Wav2vec 2.0 and Data Augmentation. In: Proc. The speaker and language recognition workshop (Odyssey 2022). pp 112–119
Du X, Lin T-Y, Jin P, Ghiasi G, Tan M, Cui Y, Le QV, Song X (2020) Spinenet: Learning scale-permuted backbone for recognition and localization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 11592–11601
Rybicka M, Villalba J, Żelasko P, Dehak N, Kowalczyk K (2021) Spine2Net: SpineNet with Res2Net and Time-Squeeze-and-Excitation Blocks for Speaker Recognition. In: Proc. Interspeech 2021. pp 496–500
Kingma DP, Welling M (2014) Auto-encoding variational bayes. In: International conference on learning representations
Zhang Y, Li L, Wang D (2019) VAE-Based Regularization for Deep Speaker Embedding. In: Proc. Interspeech 2019. pp 4020–4024
Benhafid Z, Selouani SA, Amrouche A (2023) Light-spinenet variational autoencoder for logical access spoof utterances detection in speaker verification systems. In: 2023 5th International Conference on Bio-engineering for Smart Technologies (BioSMART). pp 1–4
Cai Y, Li L, Abel A, Zhu X, Wang D (2021) Deep normalization for speaker vectors. IEEE/ACM Trans Audio, Speech, Language Process 29:733–744
Cai Y, Li L, Abel A, Zhu X, Wang D (2024) Maximum gaussianality training for deep speaker vector normalization. Pattern Recogn 145:109977
He K, Zhang X, Ren S, Sun J (2015) Deep Residual Learning for Image Recognition. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, vol. 2016-Decem. pp 770–778
Gao S-H, Cheng M-M, Zhao K, Zhang X-Y, Yang M-H, Torr P (2021) Res2Net: A New Multi-scale Backbone Architecture. IEEE Trans Pattern Anal Mach Intell 43(2):652–662
Hu J, Shen L, Sun G (2018) Squeeze-and-Excitation Networks. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. pp 7132–7141
Kataria S, Nidadavolu PS, Villalba J, Chen N, García-Perera P, Dehak N (2020) Feature enhancement with deep feature losses for speaker verification. In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp 7584–7588
Kobyzev I, Prince SJD, Brubaker MA (2021) Normalizing flows: An introduction and review of current methods. IEEE Trans Pattern Anal Mach Intell 43(11):3964–3979
Jakubec M, Jarina R, Lieskovska E, Kasak P (2024) Deep speaker embeddings for speaker verification: Review and experimental comparison. Eng Appl Artif Intell 127:107232
Kenny P, Stafylakis T, Ouellet P, Alam MJ, Dumouchel P (2013) Plda for speaker verification with utterances of arbitrary duration. In: 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, pp 7649–7653
Villalba J, Chen N, Snyder D, Garcia-Romero D, McCree A, Sell G, Borgstrom J, García-Perera LP, Richardson F, Dehak R, Torres-Carrasquillo PA, Dehak N (2020) State-of-the-art speaker recognition with neural network embeddings in nist sre18 and speakers in the wild evaluations. Comput Speech Lang 60:101026
Snyder D, Chen G, Povey D (2015) MUSAN: A Music, Speech, and Noise Corpus. arXiv:1510.08484
Papamakarios G, Pavlakou T, Murray I (2017) Masked autoregressive flow for density estimation. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol. 30
Kinnunen T, Delgado H, Evans N, Lee KA, Vestman V, Nautsch A, Todisco M, Wang X, Sahidullah M, Yamagishi J, Reynolds DA (2020) Tandem assessment of spoofing countermeasures and automatic speaker verification: Fundamentals. IEEE/ACM Trans Audio, Speech, Lang Process 28:2195–2210
Sinha S, Dey S, Saha G (2024) Improving self-supervised learning model for audio spoofing detection with layer-conditioned embedding fusion. Comput Speech Lang 86:101599
Martín-Doñas JM, Álvarez A (2022) The vicomtech audio deepfake detection system based on wav2vec2 for the 2022 add challenge. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp 9241–9245
Dişken G (2024) Complementary regional energy features for spoofed speech detection. Comput Speech Lang 85:101602

Acknowledgements

The authors thank the Digital Research Alliance of Canada for providing the computational resources used to conduct the experiments.

Funding

This work has received funding from the Natural Sciences and Engineering Research Council of Canada under the reference number RGPIN-2018-05221.

Author information

Authors and Affiliations

LAboratoire de Recherche en Interaction Humain-Système (LARIHS), 218 Boul. J.D. Gauthier, Shippagan, E8S 1P6, NB, Canada
Zhor Benhafid & Sid Ahmed Selouani
Laboratoire de Communication Parlée et Traitement du Signal (LCPTS), Bab Ezzouar, 16111, Algiers, Algeria
Zhor Benhafid & Abderrahmane Amrouche

Authors

Zhor Benhafid
Sid Ahmed Selouani
Abderrahmane Amrouche

Corresponding author

Correspondence to Sid Ahmed Selouani.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Benhafid, Z., Selouani, S.A. & Amrouche, A. Deep normalization for light SpineNet speaker anti-spoofing systems. Multimed Tools Appl 83, 80261–80275 (2024). https://doi.org/10.1007/s11042-024-19892-4

Received: 21 January 2024
Revised: 22 June 2024
Accepted: 13 July 2024
Published: 23 July 2024
Issue Date: October 2024
DOI: https://doi.org/10.1007/s11042-024-19892-4
