Abstract
RNA-binding proteins (RBPs) are a class of proteins with RNA-binding domains that regulate various cellular processes, such as RNA processing, transport, splicing, translation, and stability. They play crucial roles in cell differentiation, development, apoptosis, and inflammation. Although some graph neural network methods achieve high accuracy in RBP prediction, most do not fully consider the characteristics of heterogeneous graphs, which often results in information loss during encoding and decoding. We aimed to develop a graph model that can aggregate heterogeneous information on miRNAs and proteins and efficiently predict RBPs. In this study, we introduce GSASVM, a method that combines GraphSAGE and graph attention network techniques to encode and aggregate the relevant graph information and uses an SVM to decode predictions. We collected data, constructed RBP datasets for humans and mice, and compared our method against seven state-of-the-art methods on these datasets. Our approach outperformed these methods across various evaluation metrics, achieving AUC and PRC values of 98.46% and 97.98%, respectively, on the human dataset and 97.38% and 97.43%, respectively, on the mouse dataset. Additionally, we conducted two specific prediction studies on human proteins and report RBP results from these case analyses. These experiments support the potential of this method as a novel research tool for RBP-related tasks. The model optimizes the aggregation of encoded representations and effectively uses complex graph structures related to RBPs for feature extraction and decoding. The experimental results also indicate that our framework can effectively predict RBP binding sites, potentially facilitating further downstream analysis in biomedical and biotechnology applications.
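To make the encoding step described above more concrete, the following is a minimal sketch of the two aggregation ideas the method combines: a GraphSAGE-style mean aggregator and a graph-attention-style weighted aggregator. It is not the authors' implementation. The toy graph, the node names, the feature values, and the attention vector are all hypothetical, and the learned weight matrices and nonlinearities of real GraphSAGE and GAT layers are omitted for brevity.

```python
import math

# Toy heterogeneous graph (hypothetical data): protein and miRNA nodes
# with 2-dimensional feature vectors and undirected interaction edges.
features = {
    "protein_A": [1.0, 0.0],
    "protein_B": [0.0, 1.0],
    "miRNA_x": [1.0, 1.0],
    "miRNA_y": [3.0, 1.0],
}
neighbors = {
    "protein_A": ["miRNA_x", "miRNA_y"],
    "protein_B": ["miRNA_x"],
    "miRNA_x": ["protein_A", "protein_B"],
    "miRNA_y": ["protein_A"],
}


def sage_mean_layer(node):
    """GraphSAGE-style step: concatenate a node's own features with the
    element-wise mean of its neighbors' features (learned weights omitted)."""
    neigh = [features[n] for n in neighbors[node]]
    mean = [sum(col) / len(neigh) for col in zip(*neigh)]
    return features[node] + mean  # list concatenation


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(node, attn_vector):
    """GAT-style step: score each neighbor with a shared attention vector,
    normalize the scores with a softmax, and return the attention-weighted
    sum of neighbor features together with the attention weights."""
    scores = []
    for n in neighbors[node]:
        pair = features[node] + features[n]  # concatenated [h_v || h_u]
        scores.append(leaky_relu(sum(a * x for a, x in zip(attn_vector, pair))))
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(features[node])
    out = [0.0] * dim
    for alpha, n in zip(alphas, neighbors[node]):
        for i in range(dim):
            out[i] += alpha * features[n][i]
    return out, alphas


# Hypothetical attention vector; in a trained model this would be learned.
sage_embedding = sage_mean_layer("protein_A")
gat_embedding, attention = gat_layer("protein_A", [0.5, 0.5, 0.5, 0.5])
print(sage_embedding)
print(gat_embedding, attention)
```

In a full pipeline of the kind the abstract describes, embeddings such as these would be passed to an SVM classifier for the final prediction; that decoding step is not shown here.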
Data availability
The data used in this work are available at https://github.com/ztcxzwsteam/GSASVM-RBPs.
Author information
Contributions
Tianci Zhang presented the idea and designed the algorithm for the model. Zihao Qi completed the experimental analysis and finished the case study. Shikai Qiao provided the literature survey and wrote the manuscript with Zihao Qi. Jujuan Zhuang guided us in writing and revising the article and provided feasible suggestions. All of the authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
No potential conflicts of interest were reported by the authors.
About this article
Cite this article
Zhang, T., Qi, Z., Qiao, S. et al. GSASVM-RBPs: Predicting miRNA-binding protein sites with aggregated multigraph neural networks and an SVM. Netw Model Anal Health Inform Bioinforma 13, 53 (2024). https://doi.org/10.1007/s13721-024-00486-x
DOI: https://doi.org/10.1007/s13721-024-00486-x