research-article

Open access

Synthesis-Assisted Video Prototyping From a Document

Authors:

Christian Frueh,

Irfan Essa

UIST '22: Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology

Article No.: 16, Pages 1 - 10

https://doi.org/10.1145/3526113.3545676

Published: 28 October 2022

Abstract

Video productions commonly start with a script, especially for talking-head videos that feature a speaker narrating to the camera. When the source material is a written document, such as a web tutorial, it takes several iterations to refine the content from a text article into spoken dialogue while considering the visual composition of each scene. We propose Doc2Video, a video prototyping approach that converts a document into an interactive script with a preview of synthetic talking-head videos. Our pipeline decomposes a source document into a series of scenes and automatically creates a synthesized video of a virtual instructor for each scene. Designed for a specific domain, programming cookbooks, the pipeline applies visual elements from the source document, such as a keyword, a code snippet, or a screenshot, in suitable layouts. Users edit narration sentences, break or combine sections, and modify visuals to prototype a video in our Editing UI. We evaluated our pipeline with public programming cookbooks. Feedback from professional creators shows that our method provided a reasonable starting point to engage them in interactive scripting for a narrated instructional video.
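The abstract does not give implementation details, so the following is only a minimal sketch of the decomposition step it describes: splitting a document into scenes and attaching visual elements to each. It assumes the cookbook is written in Markdown, and every name in it (`Scene`, `split_into_scenes`, the layout labels) is hypothetical rather than taken from Doc2Video.

```python
import re
from dataclasses import dataclass, field

FENCE = "`" * 3  # Markdown code-fence marker


@dataclass
class Scene:
    """One hypothetical scene: a title, narration, and visual elements."""
    title: str
    narration: list = field(default_factory=list)  # editable sentences
    snippets: list = field(default_factory=list)   # fenced code blocks
    keywords: list = field(default_factory=list)   # inline-code terms

    @property
    def layout(self):
        # Assumed rule of thumb: prefer code, then keywords, else speaker only.
        if self.snippets:
            return "speaker+code"
        if self.keywords:
            return "speaker+keyword"
        return "speaker-only"


def split_into_scenes(markdown):
    """Start a new scene at each heading; collect narration and visuals."""
    scenes, in_code, code_lines = [], False, []

    def current():
        # Content that appears before any heading goes into an intro scene.
        if not scenes:
            scenes.append(Scene(title="Introduction"))
        return scenes[-1]

    for line in markdown.splitlines():
        if line.startswith(FENCE):
            if in_code:  # closing fence: store the finished snippet
                current().snippets.append("\n".join(code_lines))
                code_lines = []
            in_code = not in_code
            continue
        if in_code:
            code_lines.append(line)
            continue
        heading = re.match(r"#{1,6}\s+(.*)", line)
        if heading:
            scenes.append(Scene(title=heading.group(1).strip()))
            continue
        text = line.strip()
        if not text:
            continue
        scene = current()
        scene.keywords.extend(re.findall(r"`([^`]+)`", text))
        plain = text.replace("`", "")
        # Naive sentence split on terminal punctuation.
        scene.narration.extend(
            s for s in re.split(r"(?<=[.!?])\s+", plain) if s
        )
    return scenes


doc = "\n".join([
    "# Add a drawer",
    "Use the `Drawer` widget. It slides in from the edge.",
    FENCE + "dart",
    "Scaffold(drawer: Drawer());",
    FENCE,
    "## Close the drawer",
    "Call the navigator to pop the route.",
])
scenes = split_into_scenes(doc)
for scene in scenes:
    print(scene.title, "|", scene.layout, "|", scene.narration)
```

In this sketch each scene keeps its narration as a list of sentences, so the editing operations named in the abstract (editing sentences, breaking or combining sections) would map onto simple list operations. Speech and talking-head synthesis are out of scope here.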

Index Terms

Synthesis-Assisted Video Prototyping From a Document
1. Human-centered computing
  1. Human computer interaction (HCI)

Published In

UIST '22: Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology

October 2022

1363 pages

ISBN: 9781450393201

DOI: 10.1145/3526113

Editors: Maneesh Agrawala (Stanford University, USA), Jacob O. Wobbrock (University of Washington, USA), Eytan Adar (University of Michigan, USA), Vidya Setlur (Tableau Research, USA)

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

UIST '22

UIST '22: The 35th Annual ACM Symposium on User Interface Software and Technology

October 29 - November 2, 2022

Bend, OR, USA

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%

Bibliometrics

Total Citations: 4
Total Downloads: 1,168
Downloads (Last 12 months): 481
Downloads (Last 6 weeks): 36

Reflects downloads up to 01 Jan 2025

Cited By

Harde L, Jensen L, Krogh J, Plesner A, Sørensen O, Pohl H. (2024). The Generative Fairy Tale of Scary Little Red Riding Hood. Proceedings of the 2024 ACM International Conference on Interactive Media Experiences, 129-144. Online publication date: 7-Jun-2024. https://dl.acm.org/doi/10.1145/3639701.3656303

Dunnell K, Agarwal G, Pataranutaporn P, Lippman A, Maes P. (2024). AI-Generated Media for Exploring Alternate Realities. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1-8. Online publication date: 11-May-2024. https://dl.acm.org/doi/10.1145/3613905.3650861

Shi X, Wang Y, Wang Y, Zhao J. (2024). Piet: Facilitating Color Authoring for Motion Graphics Video. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-17. Online publication date: 11-May-2024. https://dl.acm.org/doi/10.1145/3613904.3642711

Kim T, Latzke M, Bragg J, Zhang A, Chang J. (2023). Papeos: Augmenting Research Papers with Talk Videos. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-19. Online publication date: 29-Oct-2023. https://dl.acm.org/doi/10.1145/3586183.3606770
