Abstract
Text-independent speaker identification is the task of determining a person's identity from his or her voice independently of the content of the speech signal, that is, regardless of the words uttered by the speaker. This problem is harder than text-dependent speaker recognition, where the speaker must utter a specific word or phrase in order to be recognized. However, text-independent identification is what must be solved when the speaker is to be recognized without his or her cooperation, as is frequently the case in practical situations. Our proposal consists in locating the voiced segments of the speech signal, i.e., the speech produced while the vocal folds are vibrating. Once these segments are identified, we determine the formants, which are the resonance frequencies of the vocal tract. From these formants we build images that we expect to differ from one speaker to another; the way such images are built is original. Each image represents a specific speaker, so the problem of identifying speakers is turned into an image-recognition problem, a task at which convolutional neural networks are known to excel. In our experiments we used a collection of recordings from 21 individuals and achieved an accuracy of 92%, outperforming the best results for text-independent identification published in recent works that used the same collection for testing.
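To make the pipeline described above concrete, the following is a minimal, hypothetical sketch in Python of a formant-image front end of the kind the abstract outlines: detect voiced frames, estimate formants from LPC roots (as in Snell and Milinazzo's approach), and accumulate the (F1, F2) pairs into a fixed-size image that a CNN could then classify. The voiced/unvoiced test, the LPC order, the frequency ranges and the 2-D histogram "image" are all illustrative assumptions; the paper's actual image construction is original and is not reproduced here.

```python
# Hypothetical sketch, NOT the authors' method: voiced-frame detection,
# LPC-based formant estimation, and an (F1, F2) histogram "image".
import numpy as np
from scipy.signal import lfilter

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return [x[i * hop:i * hop + frame_len] for i in range(n)]

def is_voiced(frame, energy_thr=1e-4, zcr_thr=0.25):
    """Crude voiced/unvoiced test: high energy and low zero-crossing rate (assumed thresholds)."""
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return energy > energy_thr and zcr < zcr_thr

def lpc_coeffs(frame, order):
    """LPC coefficients via the Levinson-Durbin recursion on the autocorrelation."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1); a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_new = a.copy()
        a_new[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a_new[i] = k
        a, err = a_new, err * (1.0 - k * k)
    return a

def formants(frame, fs, order=12, fmin=90.0, fmax=4000.0):
    """Formant candidates from the angles of the LPC polynomial roots."""
    a = lpc_coeffs(frame * np.hamming(len(frame)), order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]          # keep one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    return np.sort(freqs[(freqs > fmin) & (freqs < fmax)])

def formant_image(x, fs, frame_len=512, hop=256, bins=64):
    """Accumulate (F1, F2) pairs of voiced frames into a 2-D histogram image."""
    x = lfilter([1.0, -0.97], [1.0], x)        # pre-emphasis
    f1s, f2s = [], []
    for frame in frame_signal(x, frame_len, hop):
        if not is_voiced(frame):
            continue
        f = formants(frame, fs)
        if len(f) >= 2:
            f1s.append(f[0]); f2s.append(f[1])
    img = np.zeros((bins, bins))
    if f1s:
        img, _, _ = np.histogram2d(f1s, f2s, bins=bins,
                                   range=[[200, 1000], [500, 3000]])
        img /= img.max()
    return img  # one image per recording, to be fed to a CNN classifier

# Usage example on a synthetic 16 kHz "vowel" with two resonances.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
print(formant_image(x, fs).shape)              # (64, 64)
```

In such a setup, each normalized 64x64 image would serve as the input to a small convolutional network with one output unit per enrolled speaker; the specific image construction and network architecture used in the paper are described in the full text.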
Cite this paper
Camarena-Ibarrola, A., Reynoso, M., Figueroa, K.: Text-independent speaker identification using formants and convolutional neural networks. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds.) Advances in Soft Computing. MICAI 2021. Lecture Notes in Computer Science, vol. 13068. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-89820-5_9