Abstract
We present a GEneral Neural Network (GENN) for learning trends from existing data and making predictions of unknown information. The main novelty of GENN is its generality, simplicity of use, and specific handling of windowed input/output. Its main strength is its efficient handling of the input data, which enables learning from large datasets. GENN is built on a two-layered neural network and can use either separate input–output pairs or window-based data, with data structures that represent the input–output pairs efficiently. The program was tested on predicting the accessible surface area of globular proteins, scoring proteins according to their similarity to the native structure, and predicting protein disorder, and it performed remarkably well on these tasks. In this paper we describe the program and its use. As a specific example, we describe the construction of a similarity-to-native protein scoring function built with GENN. The source code and Linux executables for GENN are available from Research and Information Systems at http://mamiris.com and from the Battelle Center for Mathematical Medicine at http://mathmed.org. Bugs and problems with the GENN program should be reported to EF.
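The window-based mode mentioned above can be illustrated with a minimal Python sketch. It is not GENN's own code or file format, neither of which is described in this abstract; the function name `windowed_pairs`, its parameters, and the zero padding at the sequence ends are all assumptions made for illustration. The sketch builds each input vector on demand from a table of per-position feature rows, so that windows are not stored as separate copies.

```python
def windowed_pairs(rows, targets, half_width, pad=0.0):
    """Yield (input_vector, target) for every position in a feature table.

    rows       : list of per-position feature lists (hypothetical layout)
    targets    : one target value per position
    half_width : number of neighbours taken on each side of the centre
    pad        : value used for positions outside the table (assumption)
    """
    n_features = len(rows[0])
    blank = [pad] * n_features
    for center, target in enumerate(targets):
        window = []
        # Concatenate the feature rows of the centre position and its neighbours.
        for pos in range(center - half_width, center + half_width + 1):
            window.extend(rows[pos] if 0 <= pos < len(rows) else blank)
        yield window, target


# Example: three positions, one feature each, one neighbour on each side.
features = [[1.0], [2.0], [3.0]]
labels = [10, 20, 30]
for vector, label in windowed_pairs(features, labels, half_width=1):
    print(vector, label)
```

Because the function is a generator, only the original table is held in memory and each window is assembled when it is requested, which is one simple way to keep memory use low for large datasets.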
Acknowledgements
We gratefully acknowledge the financial support provided by the National Institutes of Health (NIH) through Grants R01GM072014 and R01GM073095 and by the National Science Foundation through Grant NSF MCB 1071785. Both authors would like to thank the organizers of the CASP10 conference in Gaeta, Italy, for inviting them to the conference and providing free registration to EF. EF would also like to thank Yaoqi Zhou and Keith Dunker for hosting him at IUPUI and for general discussions.
Copyright information
© 2015 Springer Science+Business Media New York
About this protocol
Cite this protocol
Faraggi, E., Kloczkowski, A. (2015). GENN: A GEneral Neural Network for Learning Tabulated Data with Examples from Protein Structure Prediction. In: Cartwright, H. (eds) Artificial Neural Networks. Methods in Molecular Biology, vol 1260. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2239-0_10
DOI: https://doi.org/10.1007/978-1-4939-2239-0_10
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2238-3
Online ISBN: 978-1-4939-2239-0
eBook Packages: Springer Protocols