DOI: 10.1007/11510888_33
Article

Prediction of secondary protein structure content from primary sequence alone – a feature selection based approach

Published: 09 July 2005

Abstract

Research in protein structure and function is one of the most important subjects in modern bioinformatics and computational biology. It often uses advanced data mining and machine learning methodologies to perform prediction or pattern recognition tasks. This paper describes a new method for predicting protein secondary structure content based on feature selection and multiple linear regression. The method develops a novel representation of primary protein sequences based on a large set of 495 features. Feature selection was performed using a very large set of nearly 6,000 proteins, and tests performed on standard non-homologous protein sets confirm the high quality of the developed solution. The application of feature selection and the novel representation resulted in a 14-15% error rate reduction compared to results achieved with the standard representation. The prediction tests also show that a small set of 5-25 features is sufficient to achieve accurate prediction of both helix and strand content for non-homologous proteins.
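
The general idea of combining feature selection with multiple linear regression can be illustrated with a minimal sketch. The abstract does not list the 495 features or name the selection procedure, so everything below is an assumption made for illustration: plain amino-acid composition (20 features) stands in for the sequence representation, greedy forward selection stands in for the paper's feature selection step, and all function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Fraction of each standard amino acid in a sequence (assumed stand-in features)."""
    n = len(sequence)
    return [sequence.count(a) / n for a in AMINO_ACIDS]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit(X, y, ridge=1e-9):
    """Multiple linear regression via the normal equations; intercept is coef[0].

    A tiny ridge term keeps the system solvable when features are collinear.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def mean_abs_error(coef, X, y):
    return sum(abs(predict(coef, x) - t) for x, t in zip(X, y)) / len(y)


def take(X, idx):
    """Keep only the feature columns listed in idx."""
    return [[row[j] for j in idx] for row in X]


def forward_select(X_tr, y_tr, X_va, y_va, max_features, min_gain=1e-6):
    """Greedy forward selection (an assumed procedure, not the paper's).

    Repeatedly adds the feature that most lowers validation error and stops
    when the improvement falls below min_gain or max_features is reached.
    """
    selected, best_err = [], float("inf")
    remaining = list(range(len(X_tr[0])))
    while remaining and len(selected) < max_features:
        scored = []
        for j in remaining:
            idx = selected + [j]
            coef = fit(take(X_tr, idx), y_tr)
            scored.append((mean_abs_error(coef, take(X_va, idx), y_va), j))
        err, j = min(scored)
        if best_err - err < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_err = err
    return selected, best_err
```

In use, each row of the feature matrix would be `composition(seq)` for one protein and the target would be that protein's measured helix (or strand) fraction; `forward_select` then returns the indices of the retained features and the validation error of the corresponding regression. Selecting against a held-out validation set, rather than the training set, is a design choice made here so that added features must generalize to be kept.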

Cited By

  • (2013) A Combination of Feature Extraction Methods with an Ensemble of Different Classifiers for Protein Structural Class Prediction Problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10:3, 564-575. DOI: 10.1109/TCBB.2013.65. Online publication date: 1-May-2013
  • (2007) A feature selection algorithm based on graph theory and random forests for protein secondary structure prediction. Proceedings of the 3rd international conference on Bioinformatics research and applications, 590-600. DOI: 10.5555/1759681.1759735. Online publication date: 7-May-2007
  • (2006) Prediction of structural classes for protein sequences and domains: Impact of prediction algorithms, sequence representation and homology, and test procedures on accuracy. Pattern Recognition, 39:12, 2323-2343. DOI: 10.1016/j.patcog.2006.02.014. Online publication date: 1-Dec-2006

Published In

MLDM'05: Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition
July 2005
691 pages
ISBN:3540269231
Editors: Petra Perner, Atsushi Imiya

Publisher

Springer-Verlag

Berlin, Heidelberg
