Abstract
Multimodal text summarization is a challenging task in natural language processing. Its objective is to combine features from multiple modalities to produce a concise yet informative summary of the input data. In this work, we survey the techniques and methods used for multimodal text summarization and analyze their implications for both research and industry. We also develop a simple yet efficient model for the task, which achieves state-of-the-art performance on the MMSS dataset. In addition, we propose a semantic-based evaluation technique to measure the quality of the generated summaries, and we support its effectiveness with empirical evidence, analysis, and discussion. We aim to make the code and models developed for our system publicly accessible.
Acknowledgements
We would like to thank the Department of Computer Science and Engineering and the Center for Natural Language Processing (CNLP) at the National Institute of Technology Silchar for providing the requisite support and infrastructure to execute this work. The work presented here falls under the Research Project Grant No. IFC/4130/DST-CNRS/2018-19/IT25 (DST-CNRS targeted program).
Cite this article
Khilji, A.F.U.R., Sinha, U., Singh, P. et al. Multimodal text summarization with evaluation approaches. Sādhanā 48, 226 (2023). https://doi.org/10.1007/s12046-023-02284-z
DOI: https://doi.org/10.1007/s12046-023-02284-z