Abstract
This study addresses voice disguise and its detection. Voice disguise is considered here as a deliberate action by a speaker who wants to falsify or conceal his identity; voice alteration caused by channel distortion is not addressed in this work. A speaker has a wide range of options for changing his voice to deceive a human listener or an automatic system. A voice can be transformed by electronic scrambling or, more simply, by exploiting intra-speaker variability: modifying the pitch, or modifying the position of articulators such as the lips or tongue, which affects the formant frequencies. The work is divided into three parts: the first classifies the different options available for changing one’s voice; the second reviews the different techniques in the literature; and the third describes the main indicators proposed in the literature for distinguishing a disguised voice from the original voice, and proposes some perspectives based on disordered and emotional speech.
Copyright information
© 2007 Springer Berlin Heidelberg
About this chapter
Cite this chapter
Perrot, P., Aversano, G., Chollet, G. (2007). Voice Disguise and Automatic Detection: Review and Perspectives. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_7
DOI: https://doi.org/10.1007/978-3-540-71505-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71503-0
Online ISBN: 978-3-540-71505-4
eBook Packages: Computer Science (R0)