Abstract
In this study, we examine each component of speaker authentication, analyzing the input, process, and output phases to identify flaws and emerging threats. The investigation is organized around specific research questions aimed at addressing and mitigating the identified risks. By methodically exploring each component of the speaker authentication process, we identify possible issues and recommend proactive methods to protect these systems from unauthorized access. The research questions act as probes that deepen understanding of the underlying difficulties and guide the creation of tailored authentication solutions. The study goes beyond theoretical analysis, offering practical insights and strategic recommendations for improving the security and reliability of speaker authentication systems in sectors such as cybersecurity and forensic analysis. We highlight the interrelated nature of the input, process, and output stages and emphasize the importance of remaining vigilant against emerging security risks. Our goal is to provide the knowledge and tools needed to handle the complexities of speaker authentication in a changing digital world. This work lays a foundation for the development of secure and durable speaker authentication methods.
Copyright information
© 2025 IFIP International Federation for Information Processing
About this paper
Cite this paper
van Rensburg, E.J., Botha, R.A., Haskins, B. (2025). Research Agenda for Speaker Authentication. In: Clarke, N., Furnell, S. (eds) Human Aspects of Information Security and Assurance. HAISA 2024. IFIP Advances in Information and Communication Technology, vol 721. Springer, Cham. https://doi.org/10.1007/978-3-031-72559-3_19
DOI: https://doi.org/10.1007/978-3-031-72559-3_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72558-6
Online ISBN: 978-3-031-72559-3
eBook Packages: Computer Science (R0)