Abstract
Driven by the challenge of integrating large amounts of experimental data, classification has emerged as one of the major tools in computational biology and bioinformatics research. Machine learning methods, especially kernel methods with Support Vector Machines (SVMs), are popular and effective. From the perspective of the kernel matrix, a technique called Eigen-matrix translation has been introduced for protein data classification. The Eigen-matrix translation strategy has a number of attractive properties that deserve further exploration. This paper investigates the role of Eigen-matrix translation in classification. The authors propose that its importance lies in reducing the dimension of the predictor attributes in the data set, which matters most when the number of features is very large. Numerical experiments on real biological data sets show that the proposed framework is effective in improving classification accuracy. It can therefore serve as a novel perspective for future research on dimension reduction problems.
Additional information
This research was supported by the Research Grants Council of Hong Kong under Grant No. 17301214, HKU CERG Grants, the Fundamental Research Funds for the Central Universities, the Research Funds of Renmin University of China, the Hung Hing Ying Physical Research Grant, and the Natural Science Foundation of China under Grant No. 11271144.
This paper was recommended for publication by Editor GAO Xiao-Shan.
About this article
Cite this article
Jiang, H., Qiu, Y., Cheng, X. et al. On Eigen-matrix translation method for classification of biological data. J Syst Sci Complex 28, 1212–1230 (2015). https://doi.org/10.1007/s11424-015-3043-2