BibTeX
@inproceedings{ghosh-etal-2024-sights,
title = "From Sights to Insights: Towards Summarization of Multimodal Clinical Documents",
author = "Ghosh, Akash and
Tomar, Mohit and
Tiwari, Abhisek and
Saha, Sriparna and
Salve, Jatin and
Sinha, Setu",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.acl-long.708/",
doi = "10.18653/v1/2024.acl-long.708",
pages = "13117--13129",
abstract = "The advancement of Artificial Intelligence is pivotal in reshaping healthcare, enhancing diagnostic precision, and facilitating personalized treatment strategies. One major challenge for healthcare professionals is quickly navigating through long clinical documents to provide timely and effective solutions. Doctors often struggle to draw quick conclusions from these extensive documents. To address this issue and save time for healthcare professionals, an effective summarization model is essential. Most current models assume the data is only text-based. However, patients often include images of their medical conditions in clinical documents. To effectively summarize these multimodal documents, we introduce \textbf{ \textit{EDI-Summ}}, an innovative Image-Guided Encoder-Decoder Model. This model uses modality-aware contextual attention on the encoder and an image cross-attention mechanism on the decoder, enhancing the BART base model to create detailed visual-guided summaries. We have tested our model extensively on three multimodal clinical benchmarks involving multimodal question and dialogue summarization tasks. Our analysis demonstrates that \textbf{ \textit{EDI-Summ}} outperforms state-of-the-art large language and vision-aware models in these summarization tasks. \textbf{Disclaimer}: The work includes vivid medical illustrations, depicting the essential aspects of the subject matter."
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ghosh-etal-2024-sights">
<titleInfo>
<title>From Sights to Insights: Towards Summarization of Multimodal Clinical Documents</title>
</titleInfo>
<name type="personal">
<namePart type="given">Akash</namePart>
<namePart type="family">Ghosh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohit</namePart>
<namePart type="family">Tomar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Abhisek</namePart>
<namePart type="family">Tiwari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sriparna</namePart>
<namePart type="family">Saha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jatin</namePart>
<namePart type="family">Salve</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Setu</namePart>
<namePart type="family">Sinha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-08</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Lun-Wei</namePart>
<namePart type="family">Ku</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Andre</namePart>
<namePart type="family">Martins</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vivek</namePart>
<namePart type="family">Srikumar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Bangkok, Thailand</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>The advancement of Artificial Intelligence is pivotal in reshaping healthcare, enhancing diagnostic precision, and facilitating personalized treatment strategies. One major challenge for healthcare professionals is quickly navigating through long clinical documents to provide timely and effective solutions. Doctors often struggle to draw quick conclusions from these extensive documents. To address this issue and save time for healthcare professionals, an effective summarization model is essential. Most current models assume the data is only text-based. However, patients often include images of their medical conditions in clinical documents. To effectively summarize these multimodal documents, we introduce EDI-Summ, an innovative Image-Guided Encoder-Decoder Model. This model uses modality-aware contextual attention on the encoder and an image cross-attention mechanism on the decoder, enhancing the BART base model to create detailed visual-guided summaries. We have tested our model extensively on three multimodal clinical benchmarks involving multimodal question and dialogue summarization tasks. Our analysis demonstrates that EDI-Summ outperforms state-of-the-art large language and vision-aware models in these summarization tasks. Disclaimer: The work includes vivid medical illustrations, depicting the essential aspects of the subject matter.</abstract>
<identifier type="citekey">ghosh-etal-2024-sights</identifier>
<identifier type="doi">10.18653/v1/2024.acl-long.708</identifier>
<location>
<url>https://aclanthology.org/2024.acl-long.708/</url>
</location>
<part>
<date>2024-08</date>
<extent unit="page">
<start>13117</start>
<end>13129</end>
</extent>
</part>
</mods>
</modsCollection>
Endnote
%0 Conference Proceedings
%T From Sights to Insights: Towards Summarization of Multimodal Clinical Documents
%A Ghosh, Akash
%A Tomar, Mohit
%A Tiwari, Abhisek
%A Saha, Sriparna
%A Salve, Jatin
%A Sinha, Setu
%Y Ku, Lun-Wei
%Y Martins, Andre
%Y Srikumar, Vivek
%S Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2024
%8 August
%I Association for Computational Linguistics
%C Bangkok, Thailand
%F ghosh-etal-2024-sights
%X The advancement of Artificial Intelligence is pivotal in reshaping healthcare, enhancing diagnostic precision, and facilitating personalized treatment strategies. One major challenge for healthcare professionals is quickly navigating through long clinical documents to provide timely and effective solutions. Doctors often struggle to draw quick conclusions from these extensive documents. To address this issue and save time for healthcare professionals, an effective summarization model is essential. Most current models assume the data is only text-based. However, patients often include images of their medical conditions in clinical documents. To effectively summarize these multimodal documents, we introduce EDI-Summ, an innovative Image-Guided Encoder-Decoder Model. This model uses modality-aware contextual attention on the encoder and an image cross-attention mechanism on the decoder, enhancing the BART base model to create detailed visual-guided summaries. We have tested our model extensively on three multimodal clinical benchmarks involving multimodal question and dialogue summarization tasks. Our analysis demonstrates that EDI-Summ outperforms state-of-the-art large language and vision-aware models in these summarization tasks. Disclaimer: The work includes vivid medical illustrations, depicting the essential aspects of the subject matter.
%R 10.18653/v1/2024.acl-long.708
%U https://aclanthology.org/2024.acl-long.708/
%U https://doi.org/10.18653/v1/2024.acl-long.708
%P 13117-13129
Markdown (Informal)
[From Sights to Insights: Towards Summarization of Multimodal Clinical Documents](https://aclanthology.org/2024.acl-long.708/) (Ghosh et al., ACL 2024)
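The abstract outlines EDI-Summ's core mechanism: a BART-style encoder-decoder in which the decoder attends not only to the encoded clinical text but also to image features through an additional cross-attention sub-layer. As a rough illustration of that decoder-side idea only, here is a minimal PyTorch sketch; the class name, dimensions, image projection, and wiring are assumptions for exposition (the causal mask is omitted for brevity), not the authors' implementation — consult the paper for EDI-Summ's actual encoder-side modality-aware contextual attention and training details.

```python
import torch
import torch.nn as nn

class ImageCrossAttentionDecoderLayer(nn.Module):
    """Hypothetical decoder block with an extra cross-attention over image features.

    Illustrative only: EDI-Summ's real layers, dimensions, and fusion
    strategy are described in the paper, not reproduced here.
    """

    def __init__(self, d_model: int = 768, n_heads: int = 12, d_image: int = 512):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.text_cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.image_cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Assumed projection mapping visual features into the text embedding space.
        self.image_proj = nn.Linear(d_image, d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])

    def forward(self, tgt, encoder_out, image_feats):
        # 1) Self-attention over the partial summary (causal mask omitted for brevity).
        x = self.norms[0](tgt + self.self_attn(tgt, tgt, tgt)[0])
        # 2) Standard BART-style cross-attention over the encoded clinical text.
        x = self.norms[1](x + self.text_cross_attn(x, encoder_out, encoder_out)[0])
        # 3) Extra cross-attention over projected image regions: the "image-guided" step.
        img = self.image_proj(image_feats)
        x = self.norms[2](x + self.image_cross_attn(x, img, img)[0])
        # 4) Position-wise feed-forward with residual connection.
        return self.norms[3](x + self.ffn(x))

# Toy shapes: batch of 2, 16 summary tokens, 128 source tokens, 49 image regions.
layer = ImageCrossAttentionDecoderLayer()
out = layer(
    torch.randn(2, 16, 768),   # partial summary embeddings
    torch.randn(2, 128, 768),  # encoder output for the clinical text
    torch.randn(2, 49, 512),   # e.g. a 7x7 grid of visual features
)
print(out.shape)  # torch.Size([2, 16, 768])
```

Swapping a layer like this into each decoder block of a pretrained BART base model is one plausible way to realize the decoder half of the abstract's description; the encoder-side modality-aware contextual attention would be a separate modification.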