Abstract
Determining the functional roles of proteins is vital for understanding life at the molecular level and has significant biomedical and pharmaceutical implications. With the development of high-throughput techniques, enormous amounts of protein-protein interaction (PPI) data have been collected, providing an important and feasible basis for protein function prediction. Accordingly, many approaches assign biological functions to proteins directly from PPI networks. However, because the topology of real PPI networks is extremely complex, global optimization or clustering on these networks is difficult and time-consuming. In addition, biological functions are often highly correlated, so the functions assigned to a protein are not independent of one another. To address these challenges, we propose a two-stage function annotation method with robust feature selection. First, we transform the network into low-dimensional node representations via manifold learning. Then, we integrate functional correlation into a multi-label linear regression framework and introduce a robust sparse penalty to perform function assignment and representative feature selection simultaneously. For the optimization, we design an efficient algorithm that iteratively solves several subproblems, each with a closed-form solution. Extensive experiments against baseline methods on Saccharomyces cerevisiae data demonstrate the effectiveness of the proposed approach.
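The two-stage idea can be illustrated with a minimal sketch. The abstract does not name the manifold learning method, the exact penalty, or the form of the correlation term, so everything below is an assumption chosen for illustration: Laplacian eigenmaps stand in for the embedding step, an L2,1-norm penalty stands in for the robust sparse penalty, and the functional correlation term is omitted. The function names `laplacian_embedding` and `l21_regression` are hypothetical and do not come from the paper.

```python
import numpy as np


def laplacian_embedding(A, dim):
    """Stage 1 (assumed): embed graph nodes with Laplacian eigenmaps.

    A is a symmetric adjacency matrix of a connected graph; the result has
    one row per node and `dim` columns.
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]    # drop the trivial first eigenvector


def l21_regression(X, Y, lam=0.1, iters=50, eps=1e-8):
    """Stage 2 (assumed): min_W ||XW - Y||_F^2 + lam * ||W||_{2,1}.

    Solved by iteratively reweighted least squares: each iteration is a
    closed-form linear solve followed by a closed-form weight update.
    Rows of W that shrink toward zero mark features that are deselected.
    """
    D = np.eye(X.shape[1])
    for _ in range(iters):
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        row_norms = np.sqrt((W ** 2).sum(axis=1) + eps)
        D = np.diag(1.0 / (2.0 * row_norms))
    return W


# Toy network: two 4-node cliques joined by one edge, with two labels.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0
Y = np.array([[1, 0]] * 4 + [[0, 1]] * 4, dtype=float)

Z = laplacian_embedding(A, dim=2)  # node features
W = l21_regression(Z, Y, lam=0.1)  # feature-to-function weights
scores = Z @ W                     # per-protein function scores
```

In this sketch the row norms of `W` act as feature importances, which is how an L2,1 penalty couples function assignment with feature selection; a faithful implementation would also need the label-correlation term described in the abstract.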
Acknowledgements
This work was supported by the Key Natural Science Project of Anhui Provincial Education Department (KJ2018A0023), the Guangdong Province Science and Technology Plan Projects (2017B010110011), the Anhui Key Research and Development Plan (1804a09020101), the National Basic Research Program (973 Program) of China (2015CB351705) and the National Natural Science Foundation of China (61906002, 61402002, 61876002 and 61860206004).
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sun, D., Sun, H., Wu, H., Liang, H., Ding, Z. (2020). Correlated Protein Function Prediction with Robust Feature Selection. In: Pan, L., Liang, J., Qu, B. (eds) Bio-inspired Computing: Theories and Applications. BIC-TA 2019. Communications in Computer and Information Science, vol 1160. Springer, Singapore. https://doi.org/10.1007/978-981-15-3415-7_1
DOI: https://doi.org/10.1007/978-981-15-3415-7_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3414-0
Online ISBN: 978-981-15-3415-7
eBook Packages: Computer Science, Computer Science (R0)