Abstract
Personalized medicine promises to revolutionize healthcare in the coming years. However, significant challenges remain, notably in integrating the vast amount of biomedical knowledge generated in recent years. Here we describe an approach that applies Knowledge Graph Embedding (KGE) methods to a biomedical Knowledge Graph (KG) as a path to reasoning over the wealth of information stored in publicly accessible databases. We use curated databases such as Ensembl, DisGeNET and Gene Ontology as data sources to build a KG containing relationships between genes, diseases and other biological entities, and we explore the potential of KGE methods to derive medically relevant insights from this KG. To showcase the method’s usefulness, we describe two use cases: a) prediction of gene-disease associations and b) clustering of disease embeddings. We show that the top gene-disease associations predicted by this approach can be confirmed in external databases or have already been identified in the literature. An analysis of clusters of diseases, with a focus on Autism Spectrum Disorder (ASD), affords novel insights into the biology of this paradigmatic complex disorder and the overlap of its genetic background with other diseases.
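To make use case a) concrete, the following is a minimal sketch of how a KGE model can rank candidate genes for a disease. The abstract does not state which embedding model or software is used, so a TransE-style translational score is assumed here purely for illustration. The function names, the toy two-dimensional vectors, and the "associated with" relation are all hypothetical and are not taken from the paper.

```python
import numpy as np

def transe_score(head, relation, tail):
    """TransE-style plausibility of (head, relation, tail) triples.

    The score is the negative L2 distance between head + relation and tail,
    so values closer to zero indicate more plausible triples.
    """
    return -np.linalg.norm(head + relation - tail, axis=-1)

def rank_genes_for_disease(gene_emb, relation_emb, disease_emb):
    """Return gene indices ordered from most to least plausible, plus scores."""
    scores = transe_score(gene_emb, relation_emb, disease_emb)
    order = np.argsort(-scores)  # descending score
    return order, scores

# Toy embeddings (hypothetical values; a real model would learn these).
gene_emb = np.array([[0.0, 0.0],   # gene 0
                     [1.0, 0.0],   # gene 1
                     [0.0, 2.0]])  # gene 2
relation_emb = np.array([1.0, 1.0])  # hypothetical "associated with" relation
disease_emb = np.array([1.0, 1.0])   # one disease of interest

order, scores = rank_genes_for_disease(gene_emb, relation_emb, disease_emb)
print(order, scores)
```

In this toy setting, the gene whose embedding, once translated by the relation vector, lands closest to the disease embedding is ranked first. The same learned disease vectors could also be passed to a clustering algorithm for use case b).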
Acknowledgements
The authors would like to acknowledge the support by the UID/MULTI/04046/2019 centre grant from FCT, Portugal (to BioISI), and the MedPerSyst project (POCI-01-0145-FEDER-016428-PAC) “Redes sinapticas e abordagens compreensivas de medicina personalizada em doenças neurocomportamentais ao longo da vida” (SAICTPAC/0010/2015). This work used the European Grid Infrastructure (EGI) with the support of NCG-INGRID-PT/INCD (Portugal). This work was produced with the support of INCD funded by FCT and FEDER under the project 01/SAICT/2016 n° 022153.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Vilela, J. et al. (2021). Biomedical Knowledge Graph Embeddings for Personalized Medicine. In: Marreiros, G., Melo, F.S., Lau, N., Lopes Cardoso, H., Reis, L.P. (eds) Progress in Artificial Intelligence. EPIA 2021. Lecture Notes in Computer Science, vol 12981. Springer, Cham. https://doi.org/10.1007/978-3-030-86230-5_46
DOI: https://doi.org/10.1007/978-3-030-86230-5_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86229-9
Online ISBN: 978-3-030-86230-5
eBook Packages: Computer Science (R0)