Abstract
In this paper, a new model for voice morphing is proposed. The spectral characteristics of a source speaker's speech are transformed so that the speech sounds as if it had been spoken by a designated target speaker. The proposed model segments the voice signal into phonemes and then transforms the spectral characteristics of each segment using a Linear Prediction model. The spectral features extracted with the Linear Predictive Coding (LPC) technique are aligned using Dynamic Time Warping (DTW). The Generative Topographic Mapping (GTM) method is used to model the LPC features, and the transformation is then performed using a Gaussian Mixture Model (GMM). Finally, the transformed code-books are converted back to prediction coefficients, and the excitation signal is filtered through them to synthesize the speech. A correlation test performed between the source and target signals showed a high correlation. The results indicate that the proposed model is promising for recognizing full sentences as well as individual words.
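The LPC analysis step can be illustrated with a minimal Python sketch of the standard autocorrelation method: each frame is Hamming-windowed, its autocorrelation is computed, and the Levinson-Durbin recursion solves for the coefficients of A(z) = 1 + a1 z^-1 + ... + ap z^-p. The function names, the Hamming window, and this sign convention are assumptions made for illustration; the paper's own frame length, window, and prediction order are not given in the abstract.

```python
import math


def autocorrelation(frame, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p.

    Returns (a, err), where err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update a[1..i-1] from the previous stage, then set a[i] = k
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc(frame, order):
    """LPC coefficients of one frame after applying a Hamming window (assumed choice)."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [x * w for x, w in zip(frame, window)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

For example, levinson_durbin([1.0, 0.5], 1) returns ([1.0, -0.5], 0.75): a first-order predictor with normalised lag-one autocorrelation 0.5.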
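The alignment step can be sketched with a textbook dynamic time warping routine that pairs source frames with target frames by minimising the accumulated Euclidean distance between their feature vectors (for example, LPC vectors). The distance measure, the symmetric step pattern (diagonal, vertical, horizontal), and the helper names are assumptions; the abstract does not specify them.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw(source, target, dist=euclidean):
    """Align two sequences of feature vectors; returns (total_cost, path)."""
    n, m = len(source), len(target)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning source[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from (n, m) to recover the warping path as 0-based frame indices;
    # on ties the diagonal step is preferred because it is listed first
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

The returned path lists (source_frame, target_frame) index pairs, which is the kind of paired data a conversion model can be trained on. For instance, aligning [[0.0], [1.0], [2.0]] with [[0.0], [0.0], [1.0], [2.0]] gives a total cost of 0.0 and the path [(0, 0), (0, 1), (1, 2), (2, 3)].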
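The transformation step is commonly implemented as a joint-density GMM regression, in which the converted feature is a posterior-weighted sum of per-component linear regressions: F(x) = sum_i P(i | x) [mu_i^y + (sigma_i^yx / sigma_i^xx) (x - mu_i^x)]. The sketch below shows only the scalar form of that general technique with hand-set parameters. It is an assumption, not the paper's exact formulation: the paper applies the GMM to GTM-modelled, multidimensional LPC features, and in practice the parameters would be estimated from aligned training data.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def gmm_convert(x, weights, mu_x, mu_y, var_xx, cov_yx):
    """Map a scalar source feature x to a target estimate with a joint-density GMM."""
    # Posterior probability of each mixture component given the source value
    likes = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, mu_x, var_xx)]
    total = sum(likes)
    posteriors = [lk / total for lk in likes]
    # Posterior-weighted sum of per-component linear regressions x -> y
    return sum(
        p * (my + (c / v) * (x - mx))
        for p, mx, my, v, c in zip(posteriors, mu_x, mu_y, var_xx, cov_yx)
    )
```

With a single component, var_xx = 1 and cov_yx = 2, the mapping reduces to the straight line y = 1 + 2x, so gmm_convert(3.0, [1.0], [0.0], [1.0], [1.0], [2.0]) returns 7.0. With several components, the posteriors blend the local regressions smoothly instead of switching abruptly between them.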
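The synthesis step filters an excitation signal through the all-pole filter 1/A(z) built from the converted prediction coefficients, and the evaluation compares signals with a correlation measure. The sketch below assumes a simple impulse-train excitation for voiced speech and uses the Pearson correlation coefficient. Both are illustrative choices, because the abstract specifies neither the excitation model nor the exact form of the correlation test.

```python
import math


def lpc_synthesize(excitation, a):
    """Filter an excitation signal through the all-pole filter 1/A(z).

    a = [a0, a1, ..., ap] with A(z) = a0 + a1 z^-1 + ... + ap z^-p (a0 is normally 1).
    """
    p = len(a) - 1
    out = []
    for n, e in enumerate(excitation):
        # Direct-form recursion: a0*s[n] = e[n] - sum_k a[k]*s[n-k]
        acc = e
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * out[n - k]
        out.append(acc / a[0])
    return out


def pulse_train(length, period):
    """Unit impulse train, a simple stand-in excitation for voiced frames."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, lpc_synthesize([1.0, 0.0, 0.0, 0.0], [1.0, -0.5]) returns [1.0, 0.5, 0.25, 0.125], the first samples of the impulse response of 1/(1 - 0.5 z^-1).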
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Rassam, M.A. et al. (2020). A Voice Morphing Model Based on the Gaussian Mixture Model and Generative Topographic Mapping. In: Saeed, F., Mohammed, F., Gazem, N. (eds) Emerging Trends in Intelligent Computing and Informatics. IRICT 2019. Advances in Intelligent Systems and Computing, vol 1073. Springer, Cham. https://doi.org/10.1007/978-3-030-33582-3_38
DOI: https://doi.org/10.1007/978-3-030-33582-3_38
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33581-6
Online ISBN: 978-3-030-33582-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)