DOI: 10.1145/3448734.3450772
short-paper

Extraction of gene-disease association from literature using BioBERT

Published: 17 May 2021

Abstract

With the rapid growth of biomedical literature, a large amount of biomedical text is available to be exploited. This text holds a wealth of knowledge about gene-disease associations, which is important for studies such as drug-target discovery and can even support personalized medical treatment tailored to a patient's genome. BioBERT, a BERT model pre-trained on large-scale biomedical corpora, has been shown to outperform other pre-trained language models on biomedical datasets. To make use of this large amount of text, in this paper we present a practical approach that uses BioBERT to extract gene-disease associations from biomedical text, achieving an overall F-score of 79.98%. We hope this work will inspire researchers in biomedical natural language processing and support applications in related fields that solve problems encountered in research.
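The abstract reports an overall F-score but does not say how it was computed. As a minimal sketch, assuming the usual definition as the harmonic mean of precision and recall over extracted gene-disease pairs, the metric could be calculated as below. The function name `f_score`, the `beta` parameter, and the counts are hypothetical and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw counts (F1 when beta == 1).

    tp: associations predicted and present in the gold annotations
    fp: associations predicted but absent from the gold annotations
    fn: gold associations the model failed to predict
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        # Score is undefined here; report 0.0 by convention.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only (not the paper's results).
example = f_score(tp=80, fp=20, fn=20)
print(f"F-score: {example:.2%}")
```

With these illustrative counts, precision and recall are both 0.8, so the F-score is also 0.8.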


Cited By

  • (2023) Automatic extraction of ranked SNP-phenotype associations from text using a BERT-LSTM-based method. BMC Bioinformatics 24:1. DOI: 10.1186/s12859-023-05236-w. Online publication date: 12-Apr-2023
  • (2022) How can natural language processing help model informed drug development?: a review. JAMIA Open 5:2. DOI: 10.1093/jamiaopen/ooac043. Online publication date: 11-Jun-2022

          Published In

          CONF-CDS 2021: The 2nd International Conference on Computing and Data Science
          January 2021
          1142 pages
          ISBN: 9781450389570
          DOI: 10.1145/3448734

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

          1. BioBERT
          2. Biomedical
          3. Extraction
          4. Gene-Disease Association
          5. Literature

          Qualifiers

          • Short-paper
          • Research
          • Refereed limited

          Conference

          CONF-CDS 2021
