UIST '16 Conference Proceedings · research-article
DOI: 10.1145/2984511.2984569

QuickCut: An Interactive Tool for Editing Narrated Video

Published: 16 October 2016

Abstract

We present QuickCut, an interactive video editing tool designed to help authors efficiently edit narrated video. QuickCut takes an audio recording of the narration voiceover and a collection of raw video footage as input. Users then review the raw footage and provide spoken annotations describing the relevant actions and objects in the scene. QuickCut time-aligns a transcript of the annotations with the raw footage and a transcript of the narration to the voiceover. These aligned transcripts enable authors to quickly match story events in the narration with semantically relevant video segments and form alignment constraints between them. Given a set of such constraints, QuickCut applies dynamic programming optimization to choose frame-level cut points between the video segments while maintaining alignments with the narration and adhering to low-level film editing guidelines. We demonstrate QuickCut's effectiveness by using it to generate a variety of short (less than 2 minutes) narrated videos. Each result required between 14 and 52 minutes of user time to edit (i.e. between 8 and 31 minutes for each minute of output video), which is far less than typical authoring times with existing video editing workflows.
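The abstract's frame-level cut-point selection can be pictured with a small dynamic program. The sketch below is a hypothetical, simplified stand-in, not the paper's actual formulation: it assumes each narration segment is already matched to one raw-footage segment, uses a made-up per-frame feature (`feats`, e.g. mean brightness) as a crude proxy for the paper's film-editing guidelines, and penalizes only feature mismatch across adjacent cuts (a rough jump-cut heuristic). All names and weights are illustrative assumptions.

```python
def choose_cuts(narration_durs, feats, fps=30, w_jump=1.0):
    """For narration segment i (duration narration_durs[i] seconds), pick a
    start frame inside its matched raw segment (per-frame features feats[i])
    so that each clip spans the narrated duration and adjacent clips have
    similar features across the cut. Returns (start, end) frame ranges."""
    needs = [max(1, round(d * fps)) for d in narration_durs]
    assert all(len(f) >= n for f, n in zip(feats, needs)), \
        "each raw segment must be long enough to cover its narration"
    # candidate start frames per segment
    cands = [list(range(len(f) - n + 1)) for f, n in zip(feats, needs)]
    # dp[s] = best total jump-cut cost with the current segment starting at s
    dp = {s: 0.0 for s in cands[0]}
    back = []  # backpointers, one dict per segment after the first
    for i in range(1, len(feats)):
        ndp, nback = {}, {}
        for s in cands[i]:
            # compare the incoming frame with the previous clip's last frame
            def trans(p):
                return dp[p] + w_jump * abs(
                    feats[i][s] - feats[i - 1][p + needs[i - 1] - 1])
            best_prev = min(dp, key=trans)
            ndp[s], nback[s] = trans(best_prev), best_prev
        dp, back = ndp, back + [nback]
    # recover the cheapest sequence of start frames
    s = min(dp, key=dp.get)
    starts = [s]
    for nback in reversed(back):
        s = nback[s]
        starts.append(s)
    starts.reverse()
    return [(s, s + n) for s, n in zip(starts, needs)]
```

A real system would fold in richer unary and pairwise costs (alignment constraints, cut-on-action rules), but the Viterbi-style structure — per-segment candidates, a transition cost across each cut, backpointer recovery — is the core idea.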

Supplementary Material

  • Supplemental video: suppl.mov (uist4096-file3.mp4)
  • MP4 File: p497-truong.mp4





    Published In

    UIST '16: Proceedings of the 29th Annual Symposium on User Interface Software and Technology
    October 2016
    908 pages
    ISBN: 9781450341899
    DOI: 10.1145/2984511

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. video annotation
    2. video editing
    3. video segmentation

    Qualifiers

    • Research-article

    Funding Sources

    • David and Helen Gurley Brown Institute for Media Innovation

    Conference

    UIST '16

    Acceptance Rates

    UIST '16 paper acceptance rate: 79 of 384 submissions (21%)
    Overall acceptance rate: 561 of 2,567 submissions (22%)


    Article Metrics

    • Downloads (last 12 months): 115
    • Downloads (last 6 weeks): 17

    Reflects downloads up to 16 Dec 2024

    Cited By

    • (2024) DanceUnisoner: A Parametric, Visual, and Interactive Simulation Interface for Choreographic Composition of Group Dance. IEICE Transactions on Information and Systems E107.D(3), 386-399. DOI: 10.1587/transinf.2023EDP7063
    • (2024) Automatic Text-based Clip Composition for Video News. Proceedings of the 2024 9th International Conference on Multimedia and Image Processing, 106-112. DOI: 10.1145/3665026.3665042
    • (2024) ExpressEdit: Video Editing with Natural Language and Sketching. Companion Proceedings of the 29th International Conference on Intelligent User Interfaces, 50-53. DOI: 10.1145/3640544.3645226
    • (2024) ExpressEdit: Video Editing with Natural Language and Sketching. Proceedings of the 29th International Conference on Intelligent User Interfaces, 515-536. DOI: 10.1145/3640543.3645164
    • (2024) LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing. Proceedings of the 29th International Conference on Intelligent User Interfaces, 699-714. DOI: 10.1145/3640543.3645143
    • (2024) Cinemassist: An Intelligent Interactive System for Real-Time Cinematic Composition Design. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1-7. DOI: 10.1145/3613905.3650898
    • (2024) Elastica: Adaptive Live Augmented Presentations with Elastic Mappings Across Modalities. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-19. DOI: 10.1145/3613904.3642725
    • (2024) ChunkyEdit: Text-first video interview editing via chunking. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-16. DOI: 10.1145/3613904.3642667
    • (2024) Unlocking Creator-AI Synergy: Challenges, Requirements, and Design Opportunities in AI-Powered Short-Form Video Production. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-23. DOI: 10.1145/3613904.3642476
    • (2024) A Video Shot Occlusion Detection Algorithm Based on the Abnormal Fluctuation of Depth Information. IEEE Transactions on Circuits and Systems for Video Technology 34(3), 1627-1640. DOI: 10.1109/TCSVT.2023.3295243
