Abstract
Computational prediction of protein localization is a common way to characterize the functions of newly sequenced proteins. Sequence features such as amino acid (AA) composition are widely used for subcellular localization prediction because of their simplicity, but they suffer from low coverage and low prediction accuracy. We present a physicochemical encoding method that maps protein sequences into feature vectors composed of the locations and lengths of amino acid groups (AAGs) with similar physicochemical properties. This high-level modular representation of protein sequences overcomes a shortcoming of the commonly used AA composition and AA pair composition encodings, namely the loss of order information. Using SVM classifiers, we show that AAG-based features differentiate proteins of different localizations with higher prediction accuracy (up to 20% improvement) than the widely used AA composition and AA pair composition. When the AAG and AA composition encodings are combined, prediction accuracy improves further, indicating a synergistic effect.
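To illustrate the contrast between composition features and group-based features, the following is a minimal sketch and not the encoding used in the paper. The four-class grouping in `GROUPS`, the `min_len` threshold, and the choice to keep only the longest run per group are all assumptions made for illustration; the function names are hypothetical.

```python
from itertools import groupby

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed coarse physicochemical grouping (illustrative only, not from the paper).
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids; residue order is lost."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def group_runs(seq, min_len=2):
    """Return (group, start, length) for each maximal run of same-group residues."""
    runs = []
    pos = 0
    for group, chunk in groupby(seq, key=AA_TO_GROUP.get):
        length = len(list(chunk))
        # Skip unknown residues (group is None) and runs shorter than min_len.
        if group is not None and length >= min_len:
            runs.append((group, pos, length))
        pos += length
    return runs


def aag_features(seq, min_len=2):
    """Per group: relative start and relative length of its longest run (0, 0 if none)."""
    n = len(seq)
    runs = group_runs(seq, min_len)
    features = []
    for name in GROUPS:
        own = [r for r in runs if r[0] == name]
        if own:
            _, start, length = max(own, key=lambda r: r[2])
            features += [start / n, length / n]
        else:
            features += [0.0, 0.0]
    return features


if __name__ == "__main__":
    a, b = "KKLLLL", "LLLLKK"
    # Same residues in a different order: composition cannot tell them apart.
    print(aa_composition(a) == aa_composition(b))
    # The run-based features keep position information, so they differ.
    print(aag_features(a) == aag_features(b))
```

In this sketch the two example sequences have identical composition vectors but different group-run vectors, which is the kind of order information the abstract refers to. Either vector, or their concatenation, could then be passed to an SVM classifier.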
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hu, J., Zhang, F. (2009). Improving Protein Localization Prediction Using Amino Acid Group Based Physichemical Encoding. In: Rajasekaran, S. (eds) Bioinformatics and Computational Biology. BICoB 2009. Lecture Notes in Computer Science, vol 5462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00727-9_24
DOI: https://doi.org/10.1007/978-3-642-00727-9_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00726-2
Online ISBN: 978-3-642-00727-9
eBook Packages: Computer Science (R0)