Voice Disguise and Automatic Detection: Review and Perspectives

Patrick Perrot¹,²,
Guido Aversano¹ &
Gérard Chollet¹

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4391)

Abstract

This study addresses voice disguise and its detection. Voice disguise is understood here as a deliberate action by a speaker who wants to falsify or conceal his identity; voice alteration caused by channel distortion is not covered in this work. A speaker has a wide range of options for changing his voice to deceive a human listener or an automatic system. A voice can be transformed by electronic scrambling or, more simply, by exploiting intra-speaker variability: modifying the pitch, or modifying the position of articulators such as the lips or tongue, which affects the formant frequencies. The work is divided into three parts: the first classifies the different options available for changing one’s voice; the second reviews the different techniques in the literature; and the third describes the main indicators proposed in the literature to distinguish a disguised voice from the original voice, and proposes some perspectives based on disordered and emotional speech.
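
To make the pitch indicator concrete, the following is a minimal sketch, not the authors' method, of how a pitch shift between two recordings could be measured. It assumes synthetic sine tones standing in for voiced speech and a plain autocorrelation peak picker; the function names `estimate_f0` and `sine`, the 8 kHz sample rate, and the 60 to 400 Hz search range are all illustrative assumptions.

```python
import math


def estimate_f0(samples, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    Uses the unnormalised (biased) autocorrelation, which sums fewer terms
    at longer lags and therefore favours the shortest matching period.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        n = len(samples) - lag
        score = sum(samples[i] * samples[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def sine(freq, sample_rate, duration):
    """Synthetic stand-in for a voiced segment (assumption: a pure tone)."""
    n = int(sample_rate * duration)
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


SAMPLE_RATE = 8000  # Hz, illustrative value

# Hypothetical "normal" and "raised-pitch" versions of the same speaker.
normal = sine(125.0, SAMPLE_RATE, 0.1)
raised = sine(200.0, SAMPLE_RATE, 0.1)

f0_normal = estimate_f0(normal, SAMPLE_RATE)
f0_raised = estimate_f0(raised, SAMPLE_RATE)

# Express the difference in semitones, a common scale for pitch changes.
shift_semitones = 12 * math.log2(f0_raised / f0_normal)
print(f"F0 normal: {f0_normal:.1f} Hz, raised: {f0_raised:.1f} Hz, "
      f"shift: {shift_semitones:.2f} semitones")
```

Real speech is not a pure tone, so a practical system would work frame by frame on voiced segments and would need a more robust pitch tracker; the sketch only illustrates why a pitch change is directly measurable.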

Author information

Authors and Affiliations

CNRS-LTCI-Ecole Nationale Supérieure des Télécommunications (ENST), 75013 Paris, France
Patrick Perrot, Guido Aversano & Gérard Chollet
Institut de Recherche Criminelle de la Gendarmerie Nationale (IRCGN), 93110, Rosny sous Bois, France
Patrick Perrot

Editor information

Yannis Stylianou Marcos Faundez-Zanuy Anna Esposito

About this chapter

Cite this chapter

Perrot, P., Aversano, G., Chollet, G. (2007). Voice Disguise and Automatic Detection: Review and Perspectives. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_7

DOI: https://doi.org/10.1007/978-3-540-71505-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71503-0
Online ISBN: 978-3-540-71505-4
eBook Packages: Computer Science, Computer Science (R0)
