Abstract
Machine learning methods have been widely used in bioinformatics, mainly for data classification and pattern recognition. Detecting genes in DNA sequences remains an open problem, and identifying the promoter region that lies upstream of a gene is an important aid to detecting the gene itself. This paper applies several machine learning methods to build classifiers that detect promoters in the DNA of Escherichia coli and compares them thoroughly. In general, probabilistic and neural network-based methods achieved the best accuracy rates.
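As a rough illustration of the kind of pipeline the abstract describes, the sketch below shows one common way to turn a DNA string into numeric features for a classifier, and how an accuracy rate can be computed from predictions. The abstract does not specify the encoding or the evaluation code used in the paper, so the one-hot scheme and the function names `one_hot` and `accuracy` are assumptions made purely for illustration.

```python
# Hypothetical sketch: the paper's actual encoding and tooling are not
# described in the abstract; everything here is an illustrative assumption.

NUCLEOTIDES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a flat list of 0/1 features, four per base.

    Bases outside A, C, G, T are encoded as four zeros.
    """
    features = []
    for base in sequence.upper():
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    # Toy data only: 1 = promoter, 0 = non-promoter.
    print(one_hot("AC"))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Any classifier that accepts fixed-length numeric vectors could consume the output of `one_hot`, provided all input sequences have the same length.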
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tavares, L.G., Lopes, H.S., Erig Lima, C.R. (2008). A Comparative Study of Machine Learning Methods for Detecting Promoters in Bacterial DNA Sequences. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. ICIC 2008. Lecture Notes in Computer Science, vol 5227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85984-0_115
DOI: https://doi.org/10.1007/978-3-540-85984-0_115
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85983-3
Online ISBN: 978-3-540-85984-0
eBook Packages: Computer Science (R0)