Abstract
Text-independent speaker identification is the task of determining a person's identity from his or her voice independently of the content of the speech signal, that is, regardless of the words uttered by the speaker. This problem is harder than text-dependent speaker recognition, where the speaker must utter a specific word or phrase in order to be recognized. However, text-independent identification is what must be solved when the speaker is to be recognized without his or her cooperation, as is frequently the case in practical situations. Our proposal consists in locating the voiced segments of the speech signal, i.e., the speech produced while the vocal folds are vibrating. Once these segments are identified, we determine the formants, which are the resonance frequencies of the vocal tract. From these formants we build images that we expect to differ from one speaker to another; the way such images are built is original. Each image represents a specific speaker, so the problem of identifying speakers is turned into an image-recognition problem, a task at which convolutional neural networks are known to excel. In our experiments we used a collection of recordings from 21 individuals and achieved an accuracy of 92%, outperforming the best results for text-independent identification published in recent works that used the same collection for testing.
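To make the pipeline described above concrete, the following is a minimal, hypothetical sketch in Python of a formant-image front end of the kind the abstract outlines: detect voiced frames, estimate formants from LPC roots (as in Snell and Milinazzo's approach), and accumulate the (F1, F2) pairs into a fixed-size image that a CNN could then classify. The voiced/unvoiced test, the LPC order, the frequency ranges and the 2-D histogram "image" are all illustrative assumptions; the paper's actual image construction is original and is not reproduced here.

```python
# Hypothetical sketch, NOT the authors' method: voiced-frame detection,
# LPC-based formant estimation, and an (F1, F2) histogram "image".
import numpy as np
from scipy.signal import lfilter

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return [x[i * hop:i * hop + frame_len] for i in range(n)]

def is_voiced(frame, energy_thr=1e-4, zcr_thr=0.25):
    """Crude voiced/unvoiced test: high energy and low zero-crossing rate (assumed thresholds)."""
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return energy > energy_thr and zcr < zcr_thr

def lpc_coeffs(frame, order):
    """LPC coefficients via the Levinson-Durbin recursion on the autocorrelation."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1); a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_new = a.copy()
        a_new[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a_new[i] = k
        a, err = a_new, err * (1.0 - k * k)
    return a

def formants(frame, fs, order=12, fmin=90.0, fmax=4000.0):
    """Formant candidates from the angles of the LPC polynomial roots."""
    a = lpc_coeffs(frame * np.hamming(len(frame)), order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]          # keep one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    return np.sort(freqs[(freqs > fmin) & (freqs < fmax)])

def formant_image(x, fs, frame_len=512, hop=256, bins=64):
    """Accumulate (F1, F2) pairs of voiced frames into a 2-D histogram image."""
    x = lfilter([1.0, -0.97], [1.0], x)        # pre-emphasis
    f1s, f2s = [], []
    for frame in frame_signal(x, frame_len, hop):
        if not is_voiced(frame):
            continue
        f = formants(frame, fs)
        if len(f) >= 2:
            f1s.append(f[0]); f2s.append(f[1])
    img = np.zeros((bins, bins))
    if f1s:
        img, _, _ = np.histogram2d(f1s, f2s, bins=bins,
                                   range=[[200, 1000], [500, 3000]])
        img /= img.max()
    return img  # one image per recording, to be fed to a CNN classifier

# Usage example on a synthetic 16 kHz "vowel" with two resonances.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
print(formant_image(x, fs).shape)              # (64, 64)
```

In such a setup, each normalized 64x64 image would serve as the input to a small convolutional network with one output unit per enrolled speaker; the specific image construction and network architecture used in the paper are described in the full text.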
Cite this paper
Camarena-Ibarrola, A., Reynoso, M., Figueroa, K.: Text-independent speaker identification using formants and convolutional neural networks. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds.) Advances in Soft Computing. MICAI 2021. Lecture Notes in Computer Science, vol. 13068. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-89820-5_9