Abstract
Systems that identify individuals from recordings of their speech have applications in diverse areas such as telephone shopping, voice mail, and security control. Building such systems is difficult, however, because human voices vary widely, so selecting discriminative features is crucial for recognition. This paper proposes a speaker recognition system based on new spin-image descriptors (SISR). In the proposed system, circular windows (spins) are extracted from the frequency domain of the spectrogram image of the sound, and a run length matrix is built for each spin to serve as the basis for feature extraction. Five different descriptors are generated from the run length matrix within each spin, and the final feature vector is fed to a deep belief network for classification. The proposed SISR system is evaluated on the English Language Speech Database for Speaker Recognition (ELSDSR). The experiments achieved a recognition accuracy of 96.46%, showing that the proposed SISR system outperforms the systems reported in current related work in terms of recognition accuracy.
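The spin and run length matrix steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the horizontal run direction, and the two descriptors shown (the classic short-run and long-run emphasis measures) are assumptions chosen for illustration, and the abstract does not name the five descriptors actually used. The input is assumed to be a spectrogram image already quantized to a small number of integer gray levels.

```python
import numpy as np

def circular_mask(shape, center, radius):
    """Boolean mask that is True inside a disc (one hypothetical 'spin' window)."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2

def run_length_matrix(image, mask, levels):
    """Horizontal gray-level run length matrix restricted to the masked pixels.

    Entry [g, r - 1] counts the runs of gray level g that have length r.
    Pixels outside the mask break a run and are never counted.
    """
    rlm = np.zeros((levels, image.shape[1]), dtype=int)
    for row, row_mask in zip(image.tolist(), mask.tolist()):
        run_level, run_len = None, 0
        for value, inside in zip(row, row_mask):
            if inside and value == run_level:
                run_len += 1          # the current run continues
                continue
            if run_len:               # close the run that just ended
                rlm[run_level, run_len - 1] += 1
            run_level, run_len = (value, 1) if inside else (None, 0)
        if run_len:                   # close a run that reaches the row end
            rlm[run_level, run_len - 1] += 1
    return rlm

def run_length_features(rlm):
    """Two standard run-length descriptors (assumed, not taken from the paper)."""
    total = rlm.sum()
    lengths = np.arange(1, rlm.shape[1] + 1)
    runs_per_length = rlm.sum(axis=0)
    return {
        "short_run_emphasis": float((runs_per_length / lengths ** 2).sum() / total),
        "long_run_emphasis": float((runs_per_length * lengths ** 2).sum() / total),
    }

# Toy 3x3 "spectrogram" with three gray levels and one spin of radius 1.
toy = np.array([[0, 0, 1],
                [1, 1, 1],
                [2, 0, 0]])
spin = circular_mask(toy.shape, center=(1, 1), radius=1)
print(run_length_features(run_length_matrix(toy, spin, levels=3)))
```

In a full pipeline, the descriptors from all spins would be concatenated into the feature vector that is passed to the classifier.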
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Mohammed, S.N., Jabir, A.J., Abbas, Z.A. (2020). Spin-Image Descriptors for Text-Independent Speaker Recognition. In: Saeed, F., Mohammed, F., Gazem, N. (eds) Emerging Trends in Intelligent Computing and Informatics. IRICT 2019. Advances in Intelligent Systems and Computing, vol 1073. Springer, Cham. https://doi.org/10.1007/978-3-030-33582-3_21
DOI: https://doi.org/10.1007/978-3-030-33582-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33581-6
Online ISBN: 978-3-030-33582-3
eBook Packages: Intelligent Technologies and Robotics (R0)