Perception of Paralinguistic Traits in Synthesized Voices

Published: 23 August 2017

Abstract

Along with the rise of artificial intelligence and the internet-of-things, synthesized voices are now common in daily life, providing us with guidance, assistance, and even companionship. From formant to concatenative synthesis, the synthesized voice continues to be defined by the same traits we ascribe to ourselves. When the recorded voice is synthesized, does our perception of its new machine embodiment change, and can we consider an alternative, more inclusive form? To begin evaluating the impact of aesthetic design, this study presents a first-step perception test to explore the paralinguistic traits of the synthesized voice. Using a corpus of 13 synthesized voices, constructed from acoustic concatenative speech synthesis, we assessed the responses of 23 listeners from differing cultural backgrounds. To evaluate whether perception shifts from the defined traits, we asked listeners to assign traits of age, gender, accent origin, and human-likeness. Results show a difference in perception for age and human-likeness across voices, and general agreement across listeners for both gender and accent origin. Connections found between age, gender, and human-likeness call for further exploration into a more participatory and inclusive synthesized vocal identity.

Published In

AM '17: Proceedings of the 12th International Audio Mostly Conference on Augmented and Participatory Sound and Music Experiences
August 2017
337 pages
ISBN: 9781450353731
DOI: 10.1145/3123514
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Queen Mary, University of London

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Human-Machine Interaction
  2. Humanisation of Synthesis
  3. Paralinguistic Traits
  4. Personification Debate
  5. Synthesized Voice

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AM '17: Audio Mostly 2017
August 23 - 26, 2017
London, United Kingdom

Acceptance Rates

AM '17 Paper Acceptance Rate: 54 of 77 submissions, 70%
Overall Acceptance Rate: 177 of 275 submissions, 64%

Cited By

  • (2024) The Impact of Perceived Tone, Age, and Gender on Voice Assistant Persuasiveness in the Context of Product Recommendations. Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1-15. DOI: 10.1145/3640794.3665545. Online publication date: 8-Jul-2024
  • (2023) Siri, you've changed! Acoustic properties and racialized judgments of voice assistants. Frontiers in Communication 8. DOI: 10.3389/fcomm.2023.1116955. Online publication date: 26-Apr-2023
  • (2023) Can Voice Assistants Sound Cute? Towards a Model of Kawaii Vocalics. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 1-7. DOI: 10.1145/3544549.3585656. Online publication date: 19-Apr-2023
  • (2022) The Effects of an Embodied Pedagogical Agent’s Synthetic Speech Accent on Learning Outcomes. Proceedings of the 2022 International Conference on Multimodal Interaction, 198-206. DOI: 10.1145/3536221.3556587. Online publication date: 7-Nov-2022
  • (2021) Voice in Human–Agent Interaction. ACM Computing Surveys 54:4, 1-43. DOI: 10.1145/3386867. Online publication date: 3-May-2021
  • (2020) Considerations for a More Ethical Approach to Data in AI: On Data Representation and Infrastructure. Frontiers in Big Data 3. DOI: 10.3389/fdata.2020.00025. Online publication date: 2-Sep-2020
  • (2020) Beyond What is Said. Proceedings of the 2nd Conference on Conversational User Interfaces, 1-3. DOI: 10.1145/3405755.3406145. Online publication date: 22-Jul-2020
  • (2018) An approach to analyze the social acceptance of virtual assistants by elderly people. Proceedings of the 8th International Conference on the Internet of Things, 1-6. DOI: 10.1145/3277593.3277616. Online publication date: 15-Oct-2018
  • (2018) Investigating Concurrent Speech-based Designs for Information Communication. Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion, 1-8. DOI: 10.1145/3243274.3243284. Online publication date: 12-Sep-2018
  • (2018) Intimate Futures. Proceedings of the 2018 Designing Interactive Systems Conference, 869-880. DOI: 10.1145/3196709.3196766. Online publication date: 8-Jun-2018
