Abstract
Protein threading programs align a probe amino acid sequence against a library of representative folds of known structure to identify structural homology. The fitness of a sequence for a structure is usually evaluated by a scoring function formulated as a threading energy. This paper proposes a model named threading with environment-specific score (TES), which uses artificial neural networks to build a new threading score function. Given a protein structure with a residue-level environment description, the model evaluates the compatibility of each residue in the sequence with its structural environment. The threading score is constructed from log-odds scores of the probabilities predicted by the trained model, which indicate which residue best fits each environment. The TES method is tested on two decoy sets for its ability to discriminate native from decoy three-dimensional protein structures. The results show that its performance is comparable to that of knowledge-based potential energy functions.
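The log-odds construction mentioned in the abstract can be illustrated with a minimal sketch. It assumes the score for a sequence placed on a structure is the sum, over positions, of log(P(residue | environment) / P(residue)), with higher values meaning a better fit; the abstract does not state the exact formula or sign convention, so both are assumptions. The function name `threading_score`, the two-letter toy alphabet, and the hand-written probabilities (which stand in for the outputs of a trained neural network) are hypothetical.

```python
import math


def threading_score(sequence, env_probs, background):
    """Sum of per-position log-odds scores (assumed form, not the paper's exact formula).

    sequence   -- string of residue codes, one per structural position
    env_probs  -- list of dicts, one per position, mapping residue -> P(residue | environment);
                  in the paper's setting these would come from the trained network,
                  here they are hand-written placeholders
    background -- dict mapping residue -> background frequency P(residue)
    """
    if len(sequence) != len(env_probs):
        raise ValueError("sequence and environment list must have the same length")
    return sum(
        math.log(env_probs[i][aa] / background[aa])
        for i, aa in enumerate(sequence)
    )


# Toy data: two positions, two residue types (hypothetical values).
background = {"A": 0.5, "G": 0.5}
env_probs = [
    {"A": 0.8, "G": 0.2},  # position 0 environment favours A
    {"A": 0.2, "G": 0.8},  # position 1 environment favours G
]

well_fitted = threading_score("AG", env_probs, background)
poorly_fitted = threading_score("GA", env_probs, background)
print(well_fitted > poorly_fitted)
```

Under this convention, a sequence whose residues match the preferences of their environments scores above zero, and one that mismatches them scores below zero, which is the property a native-versus-decoy test relies on.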
Cite this article
Jiang, N., XinyuWu, W. & Mitchell, I. Threading with environment-specific score by artificial neural networks. Soft Comput 10, 305–314 (2006). https://doi.org/10.1007/s00500-005-0488-6
DOI: https://doi.org/10.1007/s00500-005-0488-6