MultiGBS: A multi-layer graph approach to biomedical summarization

Published: 01 April 2021

Highlights

A multi-layer graph-based biomedical text summarizer is presented as MultiGBS.
MultiGBS covers semantic, word and co-reference relationships.
A MultiRank algorithm works on three different layers to rank all the sentences.
Extensive evaluation by ROUGE and BERTScore shows increased F-measure values.

Abstract

Automatic text summarization methods generate a shorter version of the input text to assist the reader in gaining a quick yet informative gist. Existing text summarization methods generally focus on a single aspect of text when selecting sentences, causing the potential loss of essential information. In this study, we propose a domain-specific method that models a document as a multi-layer graph to enable multiple features of the text to be processed at the same time. The features we used in this paper are word similarity, semantic similarity, and co-reference similarity, which are modelled as three different layers. The unsupervised method selects sentences from the multi-layer graph based on the MultiRank algorithm and the number of concepts. The proposed MultiGBS algorithm employs UMLS and extracts the concepts and relationships using different tools such as SemRep, MetaMap, and OGER. Extensive evaluation by ROUGE and BERTScore shows increased F-measure values.
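The multi-layer idea described above can be illustrated with a minimal sketch. It assumes each layer (word, semantic, co-reference) is already available as a sentence-by-sentence similarity matrix, and it substitutes a plain weighted PageRank over the layer-weighted sum of those matrices for the MultiRank algorithm named in the abstract. The layer weights are fixed and equal here, the concept-count criterion is omitted, and all function names and toy values are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def aggregate_layers(layers: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Collapse per-layer sentence-similarity matrices into one weighted graph."""
    n = len(next(iter(layers.values())))
    combined = [[0.0] * n for _ in range(n)]
    for name, matrix in layers.items():
        for i in range(n):
            for j in range(n):
                if i != j:  # ignore self-similarity
                    combined[i][j] += weights[name] * matrix[i][j]
    return combined


def pagerank(adj: Matrix, damping: float = 0.85, iters: int = 100) -> List[float]:
    """Weighted PageRank by power iteration (a stand-in, not MultiRank)."""
    n = len(adj)
    strength = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            if strength[i] == 0.0:
                # An isolated sentence spreads its score evenly.
                for j in range(n):
                    new[j] += damping * scores[i] / n
            else:
                for j in range(n):
                    new[j] += damping * scores[i] * adj[i][j] / strength[i]
        scores = new
    return scores


def select_sentences(scores: List[float], k: int) -> List[int]:
    """Pick the k best-ranked sentences and return them in document order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Toy similarity values for four sentences (illustrative only).
layers = {
    "word": [[0, .5, .1, 0], [.5, 0, .2, .1], [.1, .2, 0, .3], [0, .1, .3, 0]],
    "semantic": [[0, .6, .2, .1], [.6, 0, .3, .1], [.2, .3, 0, .2], [.1, .1, .2, 0]],
    "coreference": [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]],
}
weights = {name: 1.0 / len(layers) for name in layers}  # assumed equal influence

scores = pagerank(aggregate_layers(layers, weights))
summary = select_sentences(scores, k=2)
```

In this toy input the second sentence is strongly connected in all three layers, so it receives the highest score. A closer reproduction of the paper's approach would let the ranking algorithm determine each layer's influence rather than fixing the weights in advance.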

Cited By

  • (2023) Frequent item-set mining and clustering based ranked biomedical text summarization. The Journal of Supercomputing 79:1 (139-159). https://doi.org/10.1007/s11227-022-04578-1. Online publication date: 1-Jan-2023

Information

Published In

Journal of Biomedical Informatics, Volume 116, Issue C
Apr 2021
224 pages

Publisher

Elsevier Science
San Diego, CA, United States

Author Tags

  1. Automatic text summarization
  2. Text mining
  3. Multi-graph text modeling
  4. Concept-based summarization
  5. Domain-specific summary

Qualifiers

  • Research-article
