research-article

Hierarchical Summarization for Longform Spoken Dialog

Authors:

Lydia B Chilton

UIST '21: The 34th Annual ACM Symposium on User Interface Software and Technology

Pages 582 - 597

https://doi.org/10.1145/3472749.3474771

Published: 12 October 2021

Abstract

Every day we are surrounded by spoken dialog. This medium delivers rich, diverse streams of information auditorily; however, systematically understanding dialog is often non-trivial. Despite the pervasiveness of spoken dialog, automated speech understanding and high-quality information extraction remain markedly poor, especially when compared to written prose. Furthermore, compared to text, auditory communication poses many additional challenges, such as speaker disfluencies, informal prose styles, and lack of structure. These concerns all demonstrate the need for a distinctly speech-tailored interactive system that helps users understand and navigate the spoken language domain. While individual automatic speech recognition (ASR) and text summarization methods already exist, they are imperfect technologies; neither considers user purpose and intent, nor addresses the complications that spoken language introduces. Consequently, we design a two-stage ASR and text summarization pipeline and propose a set of semantic segmentation and merging algorithms to resolve these speech modeling challenges. Our system enables users to easily browse and navigate content as well as recover from errors in these underlying technologies. Finally, we present an evaluation of the system that highlights user preference for hierarchical summarization as a tool to quickly skim audio and identify content of interest.
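
The general idea of segmenting a transcript and then merging segments into a hierarchy can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify it. The sketch assumes an ASR transcript is already available as a list of sentences, substitutes bag-of-words cosine similarity for a real semantic model, and uses a placeholder summarizer that keeps the first sentence. All function names and the 0.15 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, threshold=0.15):
    """Start a new segment when adjacent sentences share too little vocabulary.

    The threshold is an assumed value chosen for the toy example below.
    """
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments


def summarize(sentences):
    """Placeholder summarizer (assumption): keep the first sentence."""
    return sentences[0]


def build_hierarchy(segments):
    """Merge adjacent segments pairwise until one remains.

    Returns a list of levels, finest first; each level is a list of segments.
    """
    levels = [segments]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        merged = [
            cur[i] + cur[i + 1] if i + 1 < len(cur) else cur[i]
            for i in range(0, len(cur), 2)
        ]
        levels.append(merged)
    return levels


if __name__ == "__main__":
    transcript = [
        "We adopted a dog last spring.",
        "The dog loves long walks.",
        "Our budget review is on Friday.",
        "The budget covers travel costs.",
    ]
    # Print one summary per segment, from the finest level to the root.
    for depth, level in enumerate(build_hierarchy(segment(transcript))):
        print(depth, [summarize(seg) for seg in level])
```

In a real system the similarity function and the summarizer would be replaced by learned models; the sketch only shows how a coarse-to-fine structure lets a reader skim top-level summaries and drill down into a segment of interest.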

Published In

UIST '21: The 34th Annual ACM Symposium on User Interface Software and Technology

October 2021

1357 pages

ISBN: 9781450386357

DOI: 10.1145/3472749

Editors: Jeffrey Nichols (Apple, USA), Ranjitha Kumar (UIUC, USA), Michael Nebeling (University of Michigan, USA)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

UIST '21

October 10 - 14, 2021

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%
