Abstract
The growing number and complexity of biological databases have raised the need for modern, powerful data analysis tools and techniques. To meet this need, machine learning has become an everyday tool in bio-laboratories, and its techniques are now applied across a wide spectrum of bioinformatics problems. Machine learning is broadly used to investigate the underlying mechanisms and interactions between biological molecules in many diseases, and it is an essential tool in any biomarker discovery process.
In this chapter, we provide a basic taxonomy of machine learning algorithms and describe the characteristics of the main data preprocessing, supervised classification, and clustering techniques. We also present feature selection, classifier evaluation, and two supervised classification topics that have a deep impact on current bioinformatics. Finally, we point the interested reader to a set of popular web resources, open source software tools, and benchmarking data repositories that are frequently used by the machine learning community.
Copyright information
© 2010 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Inza, I., Calvo, B., Armañanzas, R., Bengoetxea, E., Larrañaga, P., Lozano, J.A. (2010). Machine Learning: An Indispensable Tool in Bioinformatics. In: Matthiesen, R. (eds) Bioinformatics Methods in Clinical Research. Methods in Molecular Biology, vol 593. Humana Press. https://doi.org/10.1007/978-1-60327-194-3_2
DOI: https://doi.org/10.1007/978-1-60327-194-3_2
Publisher Name: Humana Press
Print ISBN: 978-1-60327-193-6
Online ISBN: 978-1-60327-194-3
eBook Packages: Springer Protocols