DOI: 10.1145/2649387.2660824
research-article

Feature subset selection for inferring relative importance of taxonomy

Published: 20 September 2014

Abstract

Examining the bacterial or functional differences between multiple habitats, populations, or phenotypes plays an important role in inferring the roles that taxonomic and functional profiles can take on in microbial ecology. For comparative metagenomics, which commonly uses α- and β-diversity, it is therefore important to have methods or algorithms that can detect the particular subsets of variables that best differentiate the phenotypes in the data. Given today's deluge of genomic data, the importance of efficient methods for carrying out these inferences cannot be overstated. We assume that observations are collected from several different environments (e.g., males vs. females, control vs. stimulus), and that each observation consists of hundreds or thousands of taxonomic or functional features (e.g., from 16S rRNA or whole-genome shotgun sequencing). Our goal in this work is to examine the role, assumptions, and inferences that feature subset selection can provide to microbial ecology and comparative metagenomics. Specifically, we examine feature subset selection algorithms that use embedded and filter approaches to infer taxon importance from human gut microbiome data. We compare several widely adopted machine learning approaches, including greedy algorithms and ℓ1 regularization methods, as well as some software tools provided with QIIME, on data from the American Gut Project and other canonical studies of the human gut microbiome. We find that very few OTUs carry information for predicting host sex from a gut sample, and that OTUs from the phylum Bacteroidetes appear quite frequently among the top-ranked OTUs.
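The abstract contrasts filter approaches (for example, greedy forward selection on an information-theoretic score) with embedded approaches (for example, ℓ1 regularization). The following is a minimal pure-Python sketch of one method from each family, not the authors' implementation: the joint mutual information (JMI) criterion, the proximal-gradient solver, the default values of `lam`, `lr`, and `iters`, and all function names are assumptions chosen for illustration, and the presence/absence table is a hypothetical toy example, not data from the American Gut Project.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in nats, of two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def jmi_select(X, y, k):
    """Filter approach (hypothetical helper): greedy forward selection with JMI."""
    d = len(X[0])
    cols = [[row[j] for row in X] for j in range(d)]
    # Start with the single feature that shares the most information with y.
    selected = [max(range(d), key=lambda j: mutual_information(cols[j], y))]
    while len(selected) < k:
        remaining = [j for j in range(d) if j not in selected]
        # JMI score of candidate j: sum over selected s of I((X_j, X_s); Y).
        def jmi(j):
            return sum(mutual_information(list(zip(cols[j], cols[s])), y)
                       for s in selected)
        selected.append(max(remaining, key=jmi))
    return selected


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def l1_logistic(X, y, lam=0.05, lr=0.5, iters=2000):
    """Embedded approach (hypothetical helper): l1-penalized logistic regression
    fitted by proximal gradient descent; nonzero weights are the selected features."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # center columns
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(Xc, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(min(z, 30.0), -30.0)  # avoid overflow in exp
            r = 1.0 / (1.0 + math.exp(-z)) - yi  # residual p - y
            gb += r / n
            for j in range(d):
                gw[j] += r * xi[j] / n
        # Gradient step on the logistic loss, then the l1 proximal step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, gw)]
        b -= lr * gb  # the intercept is not penalized
    return w, b


# Hypothetical toy table: rows are samples, columns are presence (1) or
# absence (0) of four OTUs; y is a made-up binary phenotype label.
X = [[1, 0, 1, 1], [1, 1, 1, 0], [1, 0, 1, 0], [1, 1, 1, 1],
     [0, 0, 1, 1], [0, 1, 1, 0], [0, 0, 1, 0], [0, 1, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

print(jmi_select(X, y, 2)[0])  # 0: OTU 0 alone determines the label
w, _ = l1_logistic(X, y)
print([j for j, wj in enumerate(w) if wj != 0.0])  # [0]
```

In the filter sketch, each new feature is scored by the label information it carries jointly with the features already chosen; in the embedded sketch, the ℓ1 penalty drives the weights of uninformative OTUs to exactly zero, so the nonzero weights form the selected subset. The columns are centered before fitting so that a constant OTU cannot act as a second, penalized intercept.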

Cited By

  • (2016) Using “Omics” and Integrated Multi-Omics Approaches to Guide Probiotic Selection to Mitigate Chytridiomycosis and Other Emerging Infectious Diseases. Frontiers in Microbiology 7. DOI: 10.3389/fmicb.2016.00068. Online publication date: 2-Feb-2016.

Published In

BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2014
851 pages
ISBN:9781450328944
DOI:10.1145/2649387
  • General Chairs: Pierre Baldi, Wei Wang
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

BCB '14: ACM-BCB '14
September 20-23, 2014
Newport Beach, California

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 14 Jan 2025
