research-article

Capture-Time Feedback for Recording Scripted Narration

Published: 05 November 2015 Publication History

Abstract

Well-performed audio narrations are a hallmark of captivating podcasts, explainer videos, radio stories, and movie trailers. To record these narrations, professional voiceover actors follow guidelines that describe how to use low-level vocal components (volume, pitch, timbre, and tempo) to deliver performances that emphasize important words while maintaining variety, flow, and diction. Yet, these techniques are not well known outside the professional voiceover community, especially among hobbyist producers looking to create their own narrations. We present Narration Coach, an interface that assists novice users in recording scripted narrations. As a user records her narration, our system synchronizes the takes to her script, provides text feedback about how well she is meeting the expert voiceover guidelines, and resynthesizes her recordings to help her hear how she can speak better.

Supplementary Material

ZIP File (uist2198-file5.zip)
MP4 File (p191.mp4)


    Published In

    UIST '15: Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology
    November 2015
    686 pages
    ISBN:9781450337793
    DOI:10.1145/2807442

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. audio
    2. narration
    3. speech emphasis
    4. voiceover

    Qualifiers

    • Research-article

    Conference

    UIST '15

    Acceptance Rates

    UIST '15 Paper Acceptance Rate 70 of 297 submissions, 24%;
    Overall Acceptance Rate 561 of 2,567 submissions, 22%

    Cited By

    • (2024) Shadowed Speech: An Approach for Slowing Speech Rate Using Adaptive Delayed Auditory Feedback. Journal of Information Processing 32 (938-947). DOI: 10.2197/ipsjjip.32.938. Online publication date: 2024
    • (2023) Developing Supplemental Instructional Videos for Construction Management Education. Buildings 13:10 (2466). DOI: 10.3390/buildings13102466. Online publication date: 28-Sep-2023
    • (2022) Designing for Speech Practice Systems: How Do User-Controlled Voice Manipulation and Model Speakers Impact Self-Perceptions of Voice? Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (1-14). DOI: 10.1145/3491102.3502093. Online publication date: 29-Apr-2022
    • (2022) DeHumor: Visual Analytics for Decomposing Humor. IEEE Transactions on Visualization and Computer Graphics 28:12 (4609-4623). DOI: 10.1109/TVCG.2021.3097709. Online publication date: 1-Dec-2022
    • (2020) Rescribe. Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology (747-759). DOI: 10.1145/3379337.3415864. Online publication date: 20-Oct-2020
    • (2020) VoiceCoach: Interactive Evidence-based Training for Voice Modulation Skills in Public Speaking. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (1-12). DOI: 10.1145/3313831.3376726. Online publication date: 21-Apr-2020
    • (2019) VoiceAssist. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (1-6). DOI: 10.1145/3290605.3300539. Online publication date: 2-May-2019
    • (2019) SpeechLens: A Visual Analytics Approach for Exploring Speech Strategies with Textural and Acoustic Features. 2019 IEEE International Conference on Big Data and Smart Computing (BigComp) (1-8). DOI: 10.1109/BIGCOMP.2019.8679261. Online publication date: Feb-2019
    • (2018) TakeToons. Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology (663-674). DOI: 10.1145/3242587.3242618. Online publication date: 11-Oct-2018
    • (2017) AutoDub. Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology (533-538). DOI: 10.1145/3126594.3126661. Online publication date: 20-Oct-2017
