short-paper

Extraction of gene-disease association from literature using BioBERT

Authors:

Mingze Bai

CONF-CDS 2021: The 2nd International Conference on Computing and Data Science

Article No.: 42, Pages 1 - 4

https://doi.org/10.1145/3448734.3450772

Published: 17 May 2021

Abstract

With the rapid growth of biomedical literature, a large amount of biomedical text is available to be exploited. This text contains a wealth of knowledge about associations between genes and diseases, which is important for studies such as drug-target discovery and can even support personalized treatment tailored to a patient's genome. BioBERT, a BERT model pre-trained on large-scale biomedical corpora, has been shown to outperform other pre-trained language models on biomedical datasets. To make use of this large body of text, this paper presents a practical approach that uses BioBERT to extract gene-disease associations from biomedical text, achieving an overall F-score of 79.98%. We hope this work will inspire researchers in biomedical natural language processing and support applications in related fields that solve problems encountered in research.
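As a rough illustration of the task the abstract describes, the sketch below shows two small pieces that sentence-level gene-disease relation extraction commonly involves: replacing the entity mentions with placeholder tokens before classification, and computing the balanced F-score from prediction counts. The placeholder tokens `@GENE$` and `@DISEASE$` follow a convention seen in publicly released relation-extraction data for BioBERT-style models; whether this paper uses them is an assumption, since the abstract does not say. The names `Span`, `span_of`, `mask_entities`, and `f_score`, the example sentence, and the counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def span_of(sentence: str, mention: str) -> Span:
    """Locate the first occurrence of a mention (hypothetical helper)."""
    start = sentence.index(mention)
    return Span(start, start + len(mention))


def mask_entities(sentence: str, gene: Span, disease: Span) -> str:
    """Replace the gene and disease mentions with placeholder tokens."""
    # Replace the right-most span first so the earlier offsets stay valid.
    pairs = sorted(
        [(gene, "@GENE$"), (disease, "@DISEASE$")],
        key=lambda pair: pair[0].start,
        reverse=True,
    )
    for span, token in pairs:
        sentence = sentence[: span.start] + token + sentence[span.end :]
    return sentence


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentence, not taken from the paper's data.
text = "Mutations in BRCA1 are associated with breast cancer."
masked = mask_entities(text, span_of(text, "BRCA1"), span_of(text, "breast cancer"))
print(masked)

# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives.
print(f_score(8, 2, 2))
```

The masked sentence would then be passed to a fine-tuned sequence classifier that predicts whether an association is expressed; that step is omitted here because it depends on model weights and training details the abstract does not specify.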

Cited By

Bokharaeian B, Dehghani M, Diaz A (2023). Automatic extraction of ranked SNP-phenotype associations from text using a BERT-LSTM-based method. BMC Bioinformatics 24:1. Online publication date: 12-Apr-2023.
https://doi.org/10.1186/s12859-023-05236-w
Bhatnagar R, Sardar S, Beheshti M, Podichetty J (2022). How can natural language processing help model informed drug development?: a review. JAMIA Open 5:2. Online publication date: 11-Jun-2022.
https://doi.org/10.1093/jamiaopen/ooac043

Information & Contributors

Information

Published In

CONF-CDS 2021: The 2nd International Conference on Computing and Data Science

January 2021

1142 pages

ISBN:9781450389570

DOI:10.1145/3448734

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Conference

CONF-CDS 2021: The 2nd International Conference on Computing and Data Science

January 28 - 30, 2021

Stanford, CA, USA
