Abstract
Metagenomic analysis often classifies DNA fragments from diverse microorganisms phylogenetically in order to determine the taxonomic composition of a microbial community. As DNA sequencer throughput has improved in recent years, faster methods for the taxonomic classification of metagenomic reads are required. In this research we focus on naïve Bayes, which can classify organisms quickly with sufficient accuracy, and we have developed an acceptably fast yet more accurate classification method based on an improved naïve Bayes classifier, Weightily Averaged One-Dependence Estimators (WAODE). Additionally, we accelerated WAODE classification by introducing a cutoff on the mutual information content, achieving a 20-fold faster classification speed while maintaining comparable prediction accuracy.
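The idea of WAODE with a mutual information cutoff can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes discrete attributes (how reads are turned into attributes is not stated in the abstract), Laplace smoothing, and a cutoff rule that simply drops every super-parent attribute whose mutual information with the class falls below a threshold. The names `WAODE`, `mi_cutoff`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(values, labels):
    """Empirical mutual information I(A; C) in nats between one attribute and the class."""
    n = len(labels)
    joint = Counter(zip(values, labels))
    count_a = Counter(values)
    count_c = Counter(labels)
    mi = sum(
        (n_ac / n) * math.log(n_ac * n / (count_a[a] * count_c[c]))
        for (a, c), n_ac in joint.items()
    )
    return max(0.0, mi)  # guard against tiny negative rounding errors


class WAODE:
    """Sketch of weightily averaged one-dependence estimators (assumed details)."""

    def __init__(self, mi_cutoff=0.0):
        # Assumption: attributes with I(A_i; C) < mi_cutoff are not used as super-parents.
        self.mi_cutoff = mi_cutoff

    def fit(self, X, y):
        self.n = len(y)
        self.m = len(X[0])
        self.classes = sorted(set(y))
        # Number of distinct values per attribute, used for Laplace smoothing.
        self.card = [len({row[i] for row in X}) for i in range(self.m)]
        # Weight of each one-dependence estimator: I(A_i; C).
        self.weights = [
            mutual_information([row[i] for row in X], y) for i in range(self.m)
        ]
        self.parents = [i for i in range(self.m) if self.weights[i] >= self.mi_cutoff]
        if not self.parents:
            raise ValueError("mi_cutoff removed every super-parent attribute")
        self.pair = Counter()    # counts of (i, a_i, c)
        self.triple = Counter()  # counts of (i, a_i, j, a_j, c)
        for row, c in zip(X, y):
            for i in self.parents:
                self.pair[i, row[i], c] += 1
                for j in range(self.m):
                    if j != i:
                        self.triple[i, row[i], j, row[j], c] += 1
        return self

    def scores(self, x):
        """Unnormalised score per class: sum_i W_i * P(a_i, c) * prod_{j != i} P(a_j | a_i, c)."""
        result = {}
        for c in self.classes:
            total = 0.0
            for i in self.parents:
                n_ic = self.pair[i, x[i], c]
                term = (n_ic + 1) / (self.n + self.card[i] * len(self.classes))
                for j in range(self.m):
                    if j != i:
                        n_ijc = self.triple[i, x[i], j, x[j], c]
                        term *= (n_ijc + 1) / (n_ic + self.card[j])
                total += self.weights[i] * term
            # Dividing by the sum of weights is omitted: it is the same for every class.
            result[c] = total
        return result

    def predict(self, x):
        s = self.scores(x)
        return max(s, key=s.get)


# Toy data: attribute 0 determines the class, attribute 1 is weakly related, attribute 2 is noise.
X = [("a", "x", "0"), ("a", "x", "1"), ("a", "y", "0"), ("a", "x", "1"),
     ("c", "y", "0"), ("c", "y", "1"), ("c", "x", "0"), ("c", "y", "1")]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

full = WAODE(mi_cutoff=0.0).fit(X, y)    # every attribute acts as a super-parent
pruned = WAODE(mi_cutoff=0.2).fit(X, y)  # only informative attributes are kept
```

In this sketch both training and scoring loop over the retained super-parents only, so the work per read shrinks roughly in proportion to the number of attributes removed by the cutoff. Whether this matches the mechanism behind the reported speedup is an assumption.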
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Komatsu, Y., Ishida, T., Akiyama, Y. (2014). Metagenomic Phylogenetic Classification Using Improved Naïve Bayes. In: Huang, DS., Han, K., Gromiha, M. (eds) Intelligent Computing in Bioinformatics. ICIC 2014. Lecture Notes in Computer Science, vol 8590. Springer, Cham. https://doi.org/10.1007/978-3-319-09330-7_32
DOI: https://doi.org/10.1007/978-3-319-09330-7_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09329-1
Online ISBN: 978-3-319-09330-7
eBook Packages: Computer Science (R0)