
Structural SCOP Superfamily Level Classification Using Unsupervised Machine Learning

Published: 01 March 2012

Abstract

One of the major research directions in bioinformatics is assigning superfamily classifications to a given set of proteins. Such a classification reflects structural, evolutionary, and functional relatedness. These relationships are embodied in hierarchical classifications such as the Structural Classification of Proteins (SCOP), which is mostly manually curated. Such a classification is essential for the structural and functional analysis of proteins, yet a large number of proteins remain unclassified. In this study, we propose an unsupervised machine learning approach to classify and assign a given set of proteins to SCOP superfamilies. We construct a database and a similarity matrix using P-values obtained from an all-against-all BLAST run, and train a neural network with the ART2 unsupervised learning algorithm, using the rows of the similarity matrix as input vectors. The trained network classifies the proteins with F-measure values ranging from 0.82 to 0.97. We compare the performance of ART2 with that of spectral clustering, random forest, SVM, and HHpred. ART2 performs better than all of these methods except HHpred. HHpred performs better than ART2, and its sum of errors is smaller than that of the other methods evaluated.
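To make the pipeline in the abstract more concrete, the following is a minimal Python sketch of two of its pieces: turning pairwise BLAST P-values into a similarity matrix whose rows could serve as input vectors, and scoring one predicted cluster against one true superfamily with the F-measure. It does not implement ART2. The function names, the `-log10(P)` scaling, the `floor` value, and the choice to keep the stronger of the two search directions are illustrative assumptions, not details taken from the article.

```python
import math


def similarity_matrix(ids, pvalues, floor=1e-180):
    """Build a symmetric similarity matrix from pairwise BLAST P-values.

    Assumptions (not from the article): similarity is -log10(P) scaled
    to [0, 1], P-values are clipped at `floor`, and pairs with no
    reported hit keep a similarity of 0.
    """
    index = {pid: i for i, pid in enumerate(ids)}
    n = len(ids)
    max_score = -math.log10(floor)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # every protein is maximally similar to itself
    for (a, b), p in pvalues.items():
        score = -math.log10(max(p, floor)) / max_score
        i, j = index[a], index[b]
        # BLAST results can differ by direction; keep the stronger hit.
        sim[i][j] = sim[j][i] = max(sim[i][j], score)
    return sim


def f_measure(predicted, truth):
    """F-measure (harmonic mean of precision and recall) for two sets of ids."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical identifiers and P-values, for illustration only.
    ids = ["d1", "d2", "d3"]
    pvalues = {("d1", "d2"): 1e-90, ("d2", "d1"): 1e-45, ("d1", "d3"): 1.0}
    for row in similarity_matrix(ids, pvalues):
        print(row)  # each row is one candidate input vector
    print(f_measure({"d1", "d2", "d3"}, {"d1", "d2"}))
```

Each row of the resulting matrix describes one protein by its similarity to every other protein in the set, which is the form of input vector the abstract describes. An F-measure of 1.0 would mean a predicted cluster matches a superfamily exactly.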

Cited By

  • (2021) Develop and implement unsupervised learning through hybrid FFPA clustering in large-scale datasets. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 25:1 (277-290). DOI: 10.1007/s00500-020-05140-y. Online publication date: 1-Jan-2021
  • (2015) An improved ART2 neural network. Neurocomputing, 156:C (239-244). DOI: 10.1016/j.neucom.2014.12.055. Online publication date: 25-May-2015
  • (2014) Clustering and group selection of interim product in shipbuilding. Journal of Intelligent Manufacturing, 25:6 (1393-1401). DOI: 10.1007/s10845-013-0737-y. Online publication date: 1-Dec-2014


Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics  Volume 9, Issue 2
March 2012
316 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Author Tags

  1. ART2 neural network
  2. Protein classification
  3. SCOP
  4. Unsupervised learning
