Abstract
Enzyme function prediction is an important problem in post-genomic bioinformatics, needed for reconstructing the metabolic networks of organisms. Currently there are two general methods for solving the problem: annotation transfer from a similar annotated protein, and machine learning approaches that treat the problem as classification against a fixed taxonomy, such as Gene Ontology or the EC hierarchy. These methods are suitable when the function of the new protein has already been characterized and included in the taxonomy. However, when the function has not been described before, these approaches offer little assistance to the human expert. The goal of this paper is to bring forward structured output learning approaches for the case where the exactly correct function of the enzyme to be annotated may not be contained in the training set. Our approach hinges on a fine-grained representation of enzyme function via so-called reaction kernels, which allow interpolation and extrapolation in the output (reaction) space. A kernel-based structured output prediction model is used to predict enzymatic reactions from sequence motifs. We present several choices for constructing reaction kernels and experiment with them in the remote homology case, where the functions in the test set have not been seen in the training phase.
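The following is a minimal sketch of how a kernel on the output (reaction) space can let a model prefer a reaction it never saw in training. It is not the model from the chapter. It assumes a generic scoring function that sums, over training pairs, the product of an input kernel and an output kernel. The uniform weights `alphas`, the overlap kernel, the motif names and the reaction feature names are all hypothetical and chosen only for illustration.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

Features = FrozenSet[str]
Kernel = Callable[[Features, Features], float]


def overlap_kernel(a: Features, b: Features) -> float:
    """Toy kernel: number of shared features (linear kernel on indicator vectors)."""
    return float(len(a & b))


def score(x_new: Features, y_cand: Features,
          train: Sequence[Tuple[Features, Features]],
          alphas: Sequence[float],
          k_in: Kernel = overlap_kernel,
          k_out: Kernel = overlap_kernel) -> float:
    """Assumed scoring form: sum_i alpha_i * k_in(x_i, x_new) * k_out(y_i, y_cand)."""
    return sum(alpha * k_in(x_i, x_new) * k_out(y_i, y_cand)
               for alpha, (x_i, y_i) in zip(alphas, train))


def predict(x_new: Features, candidates: Sequence[Features],
            train: Sequence[Tuple[Features, Features]],
            alphas: Sequence[float]) -> Features:
    """Return the highest-scoring candidate reaction for the new enzyme."""
    return max(candidates, key=lambda y: score(x_new, y, train, alphas))


# Hypothetical training pairs: (sequence motifs of an enzyme, features of its reaction).
train = [
    (frozenset({"m1", "m2"}), frozenset({"C-O break", "phosphate transfer"})),
    (frozenset({"m3"}), frozenset({"C-N form"})),
    (frozenset({"m2", "m4"}), frozenset({"phosphate transfer", "O-H form"})),
]
alphas = [1.0, 1.0, 1.0]  # assumption: uniform weights instead of learned ones

# New enzyme and candidate reactions; the last candidate is absent from the training set.
x_new = frozenset({"m1", "m2", "m4"})
candidates = [
    frozenset({"C-O break", "phosphate transfer"}),
    frozenset({"C-N form"}),
    frozenset({"C-O break", "phosphate transfer", "O-H form"}),
]

best = predict(x_new, candidates, train, alphas)
print(sorted(best))
```

Because the output kernel compares reactions feature by feature, a candidate that combines parts of several training reactions can outscore every reaction seen in training. A classifier over a fixed label set cannot do this. The unnormalized overlap kernel used here favors larger feature sets, so a normalized kernel would likely be a better choice in practice.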
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Astikainen, K., Holm, L., Pitkänen, E., Szedmak, S., Rousu, J. (2011). Structured Output Prediction of Novel Enzyme Function with Reaction Kernels. In: Fred, A., Filipe, J., Gamboa, H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2010. Communications in Computer and Information Science, vol 127. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18472-7_29
DOI: https://doi.org/10.1007/978-3-642-18472-7_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18471-0
Online ISBN: 978-3-642-18472-7
eBook Packages: Computer Science, Computer Science (R0)