Multimodal text summarization with evaluation approaches

Abdullah Faiz Ur Rahman Khilji¹,
Utkarsh Sinha¹,
Pintu Singh¹,
Adnan Ali¹,
Sahinur Rahman Laskar²,
Pankaj Dadure²,
Riyanka Manna³,
Partha Pakray¹ (ORCID: orcid.org/0000-0003-3834-5154),
Benoit Favre⁴ &
Sivaji Bandyopadhyay¹

Abstract

Multimodal text summarization is a challenging task in natural language processing. Its objective is to combine features from several modalities to produce a concise yet informative summary of the input data. We survey the techniques and methods used for multimodal text summarization and analyze their implications for both research and industry. We also develop a simple yet efficient model for the task, which achieves state-of-the-art performance on the MMSS dataset. In addition, we propose a semantic-based evaluation technique to measure the quality of generated summaries, and we support its effectiveness with empirical evidence, analysis, and discussion. We intend to make the code and models developed for our system publicly accessible.

Acknowledgements

We would like to thank the Department of Computer Science and Engineering and the Center for Natural Language Processing (CNLP) at the National Institute of Technology Silchar for providing the requisite support and infrastructure to execute this work. The work presented here falls under the Research Project Grant No. IFC/4130/DST-CNRS/2018-19/IT25 (DST-CNRS targeted program).

Author information

Sivaji Bandyopadhyay
Present address: Computer Science & Engineering, Jadavpur University, Jadavpur, India

Authors and Affiliations

Department of Computer Science & Engineering, National Institute of Technology, Silchar, India
Abdullah Faiz Ur Rahman Khilji, Utkarsh Sinha, Pintu Singh, Adnan Ali, Partha Pakray & Sivaji Bandyopadhyay
School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
Sahinur Rahman Laskar & Pankaj Dadure
Computer Science & Engineering, School of Computing, Amrita Vishwa Vidyapeetham, Amaravati, India
Riyanka Manna
Laboratoire d’Informatique Fondamentale, CNRS, Aix-Marseille University, Aix-en-Provence, France
Benoit Favre

Corresponding author

Correspondence to Partha Pakray.

About this article

Cite this article

Khilji, A.F.U.R., Sinha, U., Singh, P. et al. Multimodal text summarization with evaluation approaches. Sādhanā 48, 226 (2023). https://doi.org/10.1007/s12046-023-02284-z

Received: 17 July 2022
Revised: 10 May 2023
Accepted: 02 August 2023
Published: 24 October 2023
DOI: https://doi.org/10.1007/s12046-023-02284-z
