Abstract
In this paper, we present a singing voice synthesis system that converts lyrics into singing voice. Because the timbre of the synthesized songs is monotonous, we also propose a new singing voice morphing algorithm based on the Gaussian Mixture Model (GMM). A Mean Opinion Score (MOS) test shows that the synthesized songs achieve an average MOS above 3.3 before timbre conversion. A professional singer's timbre can be blended in proportionally by adjusting the scale factor k. An ABX test shows that identification accuracy reaches up to 100% for k = 0 and k = 1 and exceeds 64.5% for 0 < k < 1. The experiments also show that the GMM means have a greater impact on a singer's timbre than the mixture weights and covariances.
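The abstract does not give the morphing equations, so the following is only a minimal sketch of one common way to build a GMM-based timbre conversion with a scale factor. It assumes (1) a joint-density GMM trained on time-aligned source/target spectral feature frames, used with the conventional GMM regression from voice conversion, and (2) that k linearly interpolates between the unconverted features (k = 0) and the fully converted features (k = 1). The function names `gmm_convert` and `morph` and all parameter names are hypothetical; this is not necessarily the authors' algorithm.

```python
import numpy as np

def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Conventional GMM regression (an assumed stand-in, not the paper's method).

    x:       (D,) source feature frame
    weights: (M,) mixture weights of the joint GMM
    mu_x:    (M, D) source-side means;  mu_y: (M, D) target-side means
    cov_xx:  (M, D, D) source covariances
    cov_yx:  (M, D, D) target-source cross covariances
    """
    M, D = mu_x.shape
    # Log-likelihood of x under each component's source marginal
    log_lik = np.empty(M)
    for i in range(M):
        diff = x - mu_x[i]
        _, logdet = np.linalg.slogdet(cov_xx[i])
        maha = diff @ np.linalg.solve(cov_xx[i], diff)
        log_lik[i] = np.log(weights[i]) - 0.5 * (D * np.log(2 * np.pi) + logdet + maha)
    # Posterior p(i | x), computed stably by subtracting the max log-likelihood
    post = np.exp(log_lik - log_lik.max())
    post /= post.sum()
    # Posterior-weighted sum of per-component linear regressions
    y = np.zeros(D)
    for i in range(M):
        y += post[i] * (mu_y[i] + cov_yx[i] @ np.linalg.solve(cov_xx[i], x - mu_x[i]))
    return y

def morph(x, k, **gmm):
    """Hypothetical scale-factor blend: k=0 keeps the synthesized timbre,
    k=1 returns the fully converted (target singer) features."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    return (1.0 - k) * x + k * gmm_convert(x, **gmm)
```

Under this assumption, the two endpoints evaluated in the ABX test correspond to the unmodified synthetic voice (k = 0) and the fully converted voice (k = 1), while intermediate k values mix the two. Training the joint GMM (for example with EM on aligned frame pairs) is outside the scope of this sketch.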
This work is partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 60875015 and the Key Project of the Chinese Ministry of Education under Grant No. 208146.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, J., Yang, H., Zhang, W., Cai, L. (2011). A Lyrics to Singing Voice Synthesis System with Variable Timbre. In: Zeng, D. (eds) Applied Informatics and Communication. ICAIC 2011. Communications in Computer and Information Science, vol 225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23220-6_23
DOI: https://doi.org/10.1007/978-3-642-23220-6_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23219-0
Online ISBN: 978-3-642-23220-6
eBook Packages: Computer Science, Computer Science (R0)