Evidence, my Dear Watson: Abstractive dialogue summarization on learnable relevant utterances

Published: 12 April 2024

Abstract

Abstractive dialogue summarization requires distilling and rephrasing key information from noisy multi-speaker documents. Combining pre-trained language models with input augmentation techniques has recently led to significant research progress. However, existing solutions still struggle to select relevant chat segments, primarily relying on open-domain and unsupervised annotators that are not tailored to the actual needs of the summarization task. In this paper, we propose DearWatson, a task-aware utterance-level annotation framework for improving the effectiveness and interpretability of pre-trained dialogue summarization models. Specifically, we learn relevant utterances in the source document and mark them with special tags, which then act as supporting evidence for the generated summary. Quantitative experiments are conducted on two datasets of real-life messenger conversations. The results show that DearWatson allows model attention to focus on salient tokens, achieving new state-of-the-art results on three evaluation metrics, including semantic and factuality measures. Human evaluation confirms the superiority of our solution in semantic consistency and recall. Finally, extensive ablation studies confirm the importance of each module and explore different annotation strategies as well as parameter-efficient fine-tuning of large generative language models.
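The core mechanism lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration of the two ideas the abstract describes: a learnable utterance selector trained through the straight-through Gumbel-softmax trick, and the wrapping of selected utterances in special evidence tags before they reach the summarizer. The names used here (UtteranceSelector, mark_relevant, the <rel> tag) are our own assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of task-aware utterance selection and tagging;
# names, shapes, and the <rel> tag are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UtteranceSelector(nn.Module):
    """Scores each utterance and draws a differentiable binary
    keep/drop decision via the straight-through Gumbel-softmax."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 2)  # logits for [drop, keep]

    def forward(self, utterance_embeddings: torch.Tensor, tau: float = 1.0):
        # utterance_embeddings: (num_utterances, hidden_size)
        logits = self.scorer(utterance_embeddings)
        # hard=True yields one-hot samples in the forward pass, while
        # gradients flow through the soft relaxation (straight-through).
        samples = F.gumbel_softmax(logits, tau=tau, hard=True)
        return samples[:, 1]  # 1.0 where an utterance is kept

def mark_relevant(utterances, keep_mask, tag="<rel>"):
    # Wrap selected utterances in a special tag so the summarizer's
    # attention can treat them as supporting evidence.
    return [f"{tag} {u} {tag}" if k > 0.5 else u
            for u, k in zip(utterances, keep_mask.tolist())]

# Toy usage with random stand-ins for encoder states.
utterances = ["Amanda: hey, are you free tonight?",
              "Tom: sorry, I have a paper deadline",
              "Amanda: ok, tomorrow then?"]
selector = UtteranceSelector(hidden_size=16)
embeddings = torch.randn(len(utterances), 16)
print(mark_relevant(utterances, selector(embeddings)))
```

Because the hard samples stay differentiable through the soft relaxation, a selector of this kind can be trained jointly with the summarizer from the summarization loss alone, which is what would make the annotation task-aware rather than delegated to an external unsupervised annotator.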

Information

Published In

Neurocomputing, Volume 572, Issue C
Mar 2024
323 pages

Publisher

Elsevier Science Publishers B.V.

Netherlands

Author Tags

  1. Abstractive dialogue summarization
  2. Input augmentation
  3. Text classification
  4. Gumbel-softmax trick
  5. Interpretable natural language processing

Qualifiers

  • Research-article
