UIST '16 Conference Proceedings · research-article
DOI: 10.1145/2984511.2984569

QuickCut: An Interactive Tool for Editing Narrated Video

Published: 16 October 2016

Abstract

We present QuickCut, an interactive video editing tool designed to help authors efficiently edit narrated video. QuickCut takes an audio recording of the narration voiceover and a collection of raw video footage as input. Users then review the raw footage and provide spoken annotations describing the relevant actions and objects in the scene. QuickCut time-aligns a transcript of the annotations with the raw footage and a transcript of the narration to the voiceover. These aligned transcripts enable authors to quickly match story events in the narration with semantically relevant video segments and form alignment constraints between them. Given a set of such constraints, QuickCut applies dynamic programming optimization to choose frame-level cut points between the video segments while maintaining alignments with the narration and adhering to low-level film editing guidelines. We demonstrate QuickCut's effectiveness by using it to generate a variety of short (less than 2 minutes) narrated videos. Each result required between 14 and 52 minutes of user time to edit (i.e. between 8 and 31 minutes for each minute of output video), which is far less than typical authoring times with existing video editing workflows.
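The abstract's frame-level cut-point selection can be pictured with a small dynamic program. The sketch below is a hypothetical, simplified stand-in, not the paper's actual formulation: it assumes each narration segment is already matched to one raw-footage segment, uses a made-up per-frame feature (`feats`, e.g. mean brightness) as a crude proxy for the paper's film-editing guidelines, and penalizes only feature mismatch across adjacent cuts (a rough jump-cut heuristic). All names and weights are illustrative assumptions.

```python
def choose_cuts(narration_durs, feats, fps=30, w_jump=1.0):
    """For narration segment i (duration narration_durs[i] seconds), pick a
    start frame inside its matched raw segment (per-frame features feats[i])
    so that each clip spans the narrated duration and adjacent clips have
    similar features across the cut. Returns (start, end) frame ranges."""
    needs = [max(1, round(d * fps)) for d in narration_durs]
    assert all(len(f) >= n for f, n in zip(feats, needs)), \
        "each raw segment must be long enough to cover its narration"
    # candidate start frames per segment
    cands = [list(range(len(f) - n + 1)) for f, n in zip(feats, needs)]
    # dp[s] = best total jump-cut cost with the current segment starting at s
    dp = {s: 0.0 for s in cands[0]}
    back = []  # backpointers, one dict per segment after the first
    for i in range(1, len(feats)):
        ndp, nback = {}, {}
        for s in cands[i]:
            # compare the incoming frame with the previous clip's last frame
            def trans(p):
                return dp[p] + w_jump * abs(
                    feats[i][s] - feats[i - 1][p + needs[i - 1] - 1])
            best_prev = min(dp, key=trans)
            ndp[s], nback[s] = trans(best_prev), best_prev
        dp, back = ndp, back + [nback]
    # recover the cheapest sequence of start frames
    s = min(dp, key=dp.get)
    starts = [s]
    for nback in reversed(back):
        s = nback[s]
        starts.append(s)
    starts.reverse()
    return [(s, s + n) for s, n in zip(starts, needs)]
```

A real system would fold in richer unary and pairwise costs (alignment constraints, cut-on-action rules), but the Viterbi-style structure — per-segment candidates, a transition cost across each cut, backpointer recovery — is the core idea.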

Supplementary Material

  • Supplemental video: suppl.mov (uist4096-file3.mp4)
  • MP4 File: p497-truong.mp4





    Published In

    UIST '16: Proceedings of the 29th Annual Symposium on User Interface Software and Technology
    October 2016
    908 pages
    ISBN: 9781450341899
    DOI: 10.1145/2984511

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. video annotation
    2. video editing
    3. video segmentation

    Qualifiers

    • Research-article

    Funding Sources

    • David and Helen Gurley Brown Institute for Media Innovation

    Conference

    UIST '16

    Acceptance Rates

    UIST '16 paper acceptance rate: 79 of 384 submissions (21%)
    Overall acceptance rate: 561 of 2,567 submissions (22%)


    Article Metrics

    • Downloads (last 12 months): 115
    • Downloads (last 6 weeks): 17

    Reflects downloads up to 16 Dec 2024

    Cited By

    • (2024) DanceUnisoner: A Parametric, Visual, and Interactive Simulation Interface for Choreographic Composition of Group Dance. IEICE Transactions on Information and Systems E107.D(3), 386-399. DOI: 10.1587/transinf.2023EDP7063
    • (2024) Automatic Text-based Clip Composition for Video News. Proceedings of the 2024 9th International Conference on Multimedia and Image Processing, 106-112. DOI: 10.1145/3665026.3665042
    • (2024) ExpressEdit: Video Editing with Natural Language and Sketching. Companion Proceedings of the 29th International Conference on Intelligent User Interfaces, 50-53. DOI: 10.1145/3640544.3645226
    • (2024) ExpressEdit: Video Editing with Natural Language and Sketching. Proceedings of the 29th International Conference on Intelligent User Interfaces, 515-536. DOI: 10.1145/3640543.3645164
    • (2024) LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing. Proceedings of the 29th International Conference on Intelligent User Interfaces, 699-714. DOI: 10.1145/3640543.3645143
    • (2024) Cinemassist: An Intelligent Interactive System for Real-Time Cinematic Composition Design. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1-7. DOI: 10.1145/3613905.3650898
    • (2024) Elastica: Adaptive Live Augmented Presentations with Elastic Mappings Across Modalities. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-19. DOI: 10.1145/3613904.3642725
    • (2024) ChunkyEdit: Text-first video interview editing via chunking. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-16. DOI: 10.1145/3613904.3642667
    • (2024) Unlocking Creator-AI Synergy: Challenges, Requirements, and Design Opportunities in AI-Powered Short-Form Video Production. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-23. DOI: 10.1145/3613904.3642476
    • (2024) A Video Shot Occlusion Detection Algorithm Based on the Abnormal Fluctuation of Depth Information. IEEE Transactions on Circuits and Systems for Video Technology 34(3), 1627-1640. DOI: 10.1109/TCSVT.2023.3295243
