Capture-Time Feedback for Recording Scripted Narration

Authors:

Floraine Berthouzoz,

Gautham J. Mysore,

Maneesh Agrawala

UIST '15: Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology

Pages 191 - 199

https://doi.org/10.1145/2807442.2807464

Published: 05 November 2015

Abstract

Well-performed audio narrations are a hallmark of captivating podcasts, explainer videos, radio stories, and movie trailers. To record these narrations, professional voiceover actors follow guidelines that describe how to use low-level vocal components (volume, pitch, timbre, and tempo) to deliver performances that emphasize important words while maintaining variety, flow, and diction. Yet these techniques are not well known outside the professional voiceover community, especially among hobbyist producers looking to create their own narrations. We present Narration Coach, an interface that assists novice users in recording scripted narrations. As a user records her narration, our system synchronizes the takes to her script, provides text feedback about how well she is meeting the expert voiceover guidelines, and resynthesizes her recordings to help her hear how she can speak better.
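The abstract does not say how takes are synchronized to the script. As a rough illustration only, and not necessarily the authors' method, the sketch below aligns the words of a script with the words of a transcribed take using a simple global sequence alignment. It assumes a word-level transcript of the take is already available (for example, from a speech recognizer); the function name `align_words` and the scoring values are hypothetical.

```python
from typing import List, Optional, Tuple

# One aligned position: (script word or None, spoken word or None).
Pair = Tuple[Optional[str], Optional[str]]


def align_words(script: List[str], take: List[str],
                match: int = 1, mismatch: int = -1, gap: int = -1) -> List[Pair]:
    """Globally align script words with the words heard in a take.

    Hypothetical helper: the scoring values are illustrative assumptions.
    """
    n, m = len(script), len(take)

    def sub(i: int, j: int) -> int:
        # Score for pairing script[i-1] with take[j-1], ignoring case.
        same = script[i - 1].lower() == take[j - 1].lower()
        return match if same else mismatch

    # score[i][j] = best score for aligning script[:i] with take[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Walk back from the bottom-right cell to recover the alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((script[i - 1], take[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((script[i - 1], None))  # script word was skipped
            i -= 1
        else:
            pairs.append((None, take[j - 1]))    # extra spoken word
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    script = "we present narration coach".split()
    take = "we present um narration coach".split()
    for script_word, spoken_word in align_words(script, take):
        print(script_word, spoken_word)
```

In this sketch, a pair whose script side is `None` marks an extra spoken word (such as a filler), and a pair whose spoken side is `None` marks a script word the speaker skipped. A capture-time tool could use such pairs to attach per-word feedback to the script.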

Index Terms

Capture-Time Feedback for Recording Scripted Narration
1. Human-centered computing
  1. Human computer interaction (HCI)

Published In

UIST '15: Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology

November 2015

686 pages

ISBN: 9781450337793

DOI: 10.1145/2807442

General Chair: Celine Latulipe (UNC Charlotte, USA)

Program Chairs: Bjoern Hartmann (UC Berkeley, USA), Tovi Grossman (Autodesk Research, Canada)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

Research-article

Conference

UIST '15: The 28th Annual ACM Symposium on User Interface Software and Technology

November 11 - 15, 2015

Charlotte, NC, USA

Acceptance Rates

UIST '15 Paper Acceptance Rate 70 of 297 submissions, 24%

Overall Acceptance Rate 561 of 2,567 submissions, 22%

Cited By

Chiba M, Yamada W, Ochiai K (2024). Shadowed Speech: An Approach for Slowing Speech Rate Using Adaptive Delayed Auditory Feedback. Journal of Information Processing 32, 938-947. Online publication date: 2024.
https://doi.org/10.2197/ipsjjip.32.938

Barnes A, McCoy A, Warnick Q (2023). Developing Supplemental Instructional Videos for Construction Management Education. Buildings 13:10, 2466. Online publication date: 28-Sep-2023.
https://doi.org/10.3390/buildings13102466

Orii L, Ogawa N, Hatada Y, Narumi T (2022). Designing for Speech Practice Systems: How Do User-Controlled Voice Manipulation and Model Speakers Impact Self-Perceptions of Voice? Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 1-14. Online publication date: 29-Apr-2022.
https://dl.acm.org/doi/10.1145/3491102.3502093

Wang X, Ming Y, Wu T, Zeng H, Wang Y, Qu H (2022). DeHumor: Visual Analytics for Decomposing Humor. IEEE Transactions on Visualization and Computer Graphics 28:12, 4609-4623. Online publication date: 1-Dec-2022.
https://doi.org/10.1109/TVCG.2021.3097709

Pavel A, Reyes G, Bigham J, Iqbal S, MacLean K, Chevalier F, Mueller S (2020). Rescribe. Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, 747-759. Online publication date: 20-Oct-2020.
https://dl.acm.org/doi/10.1145/3379337.3415864

Wang X, Zeng H, Wang Y, Wu A, Sun Z, Ma X, Qu H, Bernhaupt R, Mueller F, Verweij D, Andres J, McGrenere J, Cockburn A, Avellino I, Goguey A, Bjørn P, Zhao S, Samson B, Kocielnik R (2020). VoiceCoach: Interactive Evidence-based Training for Voice Modulation Skills in Public Speaking. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1-12. Online publication date: 21-Apr-2020.
https://dl.acm.org/doi/10.1145/3313831.3376726

Seetharaman P, Mysore G, Pardo B, Smaragdis P, Gomes C, Brewster S, Fitzpatrick G, Cox A, Kostakos V (2019). VoiceAssist. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1-6. Online publication date: 2-May-2019.
https://dl.acm.org/doi/10.1145/3290605.3300539

Yuan L, Chen Y, Fu S, Wu A, Qu H (2019). SpeechLens: A Visual Analytics Approach for Exploring Speech Strategies with Textural and Acoustic Features. 2019 IEEE International Conference on Big Data and Smart Computing (BigComp), 1-8. Online publication date: Feb-2019.
https://doi.org/10.1109/BIGCOMP.2019.8679261

Subramonyam H, Li W, Adar E, Dontcheva M, Baudisch P, Schmidt A, Wilson A (2018). TakeToons. Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology, 663-674. Online publication date: 11-Oct-2018.
https://dl.acm.org/doi/10.1145/3242587.3242618

Venkataramani S, Smaragdis P, Mysore G, Gajos K, Mankoff J, Harrison C (2017). AutoDub. Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology, 533-538. Online publication date: 20-Oct-2017.
https://dl.acm.org/doi/10.1145/3126594.3126661
