Multimodal text summarization with evaluation approaches

Abstract

Multimodal text summarization is a challenging task in natural language processing whose objective is to combine features from several modalities to produce a concise yet informative summary of a given set of inputs. In this work, we survey the techniques and methods used for multimodal text summarization and analyze their implications for both research and industry. We also develop a straightforward yet efficient model that addresses the challenges of the task and achieves state-of-the-art performance on the MMSS dataset. In addition, we propose a semantics-based evaluation technique for measuring the quality of generated summaries, and we substantiate its effectiveness through empirical evidence, analysis, and discussion. We intend to make the code and models developed for our system publicly available.
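
The abstract does not spell out the semantics-based evaluation technique, so the sketch below is only an illustration of the general idea: scoring a candidate summary by the cosine similarity of its sentence embedding to that of the reference. The sentence-transformers package and the all-MiniLM-L6-v2 model used here are assumptions made for the example, not components reported by the authors.

    # Illustrative sketch: embedding-based semantic scoring of a generated summary.
    # Assumes the sentence-transformers package and the all-MiniLM-L6-v2 model;
    # these are example choices, not the evaluation setup described in the paper.
    from sentence_transformers import SentenceTransformer, util

    def semantic_score(candidate: str, reference: str,
                       model_name: str = "all-MiniLM-L6-v2") -> float:
        """Cosine similarity between sentence embeddings of candidate and reference."""
        model = SentenceTransformer(model_name)
        cand_emb, ref_emb = model.encode([candidate, reference], convert_to_tensor=True)
        return util.cos_sim(cand_emb, ref_emb).item()

    # Example: a paraphrase scores close to 1, unrelated text close to 0.
    print(semantic_score("A cat sat on the mat.", "The cat is sitting on a mat."))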

Notes

  1. https://en.wikipedia.org/wiki/Longest_common_subsequence_problem (see the ROUGE-L sketch after these notes)

  2. https://nlp.stanford.edu/projects/snli/

  3. https://images.search.yahoo.com/

  4. https://pypi.org/project/Pillow/

  5. https://commoncrawl.org/2017/06

  6. https://github.com/moses-smt

  7. https://opencv.org/

  8. https://github.com/keras-team/keras-applications/blob/master/keras_applications/resnet50.py

  9. https://github.com/KaimingHe/deep-residual-networks

  10. https://fasttext.cc/docs/en/unsupervised-tutorial.html

  11. https://www.tensorflow.org/tutorials/text/word2vec
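
Note 1 above links to the longest common subsequence (LCS) problem, which underlies the widely used ROUGE-L summary metric. As a self-contained illustration (not the evaluation code used in the paper), the sketch below computes an LCS-based ROUGE-L F-score over whitespace-separated tokens.

    # Illustrative ROUGE-L sketch built on the LCS problem referenced in note 1.
    # Whitespace tokenization only; not the paper's evaluation implementation.
    def lcs_length(a, b):
        """Length of the longest common subsequence of token lists a and b."""
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, x in enumerate(a, 1):
            for j, y in enumerate(b, 1):
                dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
        return dp[len(a)][len(b)]

    def rouge_l_f1(candidate: str, reference: str) -> float:
        """ROUGE-L F1 between a candidate summary and a reference summary."""
        cand, ref = candidate.split(), reference.split()
        lcs = lcs_length(cand, ref)
        if lcs == 0:
            return 0.0
        precision, recall = lcs / len(cand), lcs / len(ref)
        return 2 * precision * recall / (precision + recall)

    print(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"))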

Acknowledgements

We thank the Department of Computer Science and Engineering and the Center for Natural Language Processing (CNLP) at the National Institute of Technology Silchar for providing the support and infrastructure required to carry out this work. This work was supported by Research Project Grant No. IFC/4130/DST-CNRS/2018-19/IT25 under the DST-CNRS targeted program.

Author information

Corresponding author

Correspondence to Partha Pakray.

About this article

Cite this article

Khilji, A.F.U.R., Sinha, U., Singh, P. et al. Multimodal text summarization with evaluation approaches. Sādhanā 48, 226 (2023). https://doi.org/10.1007/s12046-023-02284-z
