DOI: 10.1145/2808719.2808749
research-article

SAME: a sampling-based multi-locus epistasis algorithm for quantitative genetic trait prediction

Published: 09 September 2015

Abstract

Quantitative genetic trait prediction based on high-density genotyping arrays plays an important role in plant and animal breeding, where it helps in developing breeding strategies. Epistasis, the phenomenon in which SNPs interact with each other, has been studied extensively in genome-wide association studies (GWAS) but has received relatively little attention in quantitative genetic trait prediction. Because the number of possible interactions is in general extremely large, even handling pairwise interactions is very challenging. To our knowledge, there is not yet a solid solution that uses epistasis to improve genetic trait prediction. In this work, we study the multi-locus epistasis problem, in which interactions among more than two SNPs are considered. We first analyze the traditional additive epistasis model and show that it has certain limitations: it prefers epistasis interactions involving more SNPs, and it does not assume a small set of very significant epistasis effects. We then develop an efficient algorithm, SAME, that improves genetic trait prediction with the help of multi-locus epistasis. Our experiments on both simulated and real data show that SAME is both efficient and effective at improving genetic trait prediction, and it achieves very significant improvements on a real plant data set. To our knowledge, this is the first work to achieve such significant improvements with the help of epistasis.
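
To give a sense of why the abstract calls even pairwise interactions challenging, the short sketch below counts how many distinct SNP subsets of a given size exist. It is only an illustration of the combinatorics and is not part of SAME; the function name `interaction_count` and the marker count of 50,000 are assumptions chosen for the example, not values taken from the paper.

```python
from math import comb


def interaction_count(num_snps: int, order: int) -> int:
    """Return the number of distinct SNP subsets of size `order`.

    Each subset is one candidate epistatic interaction, so this is the
    size of the search space an exhaustive method would have to scan.
    """
    return comb(num_snps, order)


if __name__ == "__main__":
    # Hypothetical array size, chosen only for illustration.
    num_snps = 50_000
    for order in (2, 3, 4):
        print(f"{order}-way interactions: {interaction_count(num_snps, order):,}")
```

Under these assumptions the pairwise case alone already exceeds a billion candidates, and each additional SNP in the interaction multiplies the count by several orders of magnitude, which is the motivation for avoiding exhaustive enumeration.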


Published In

BCB '15: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics
September 2015
683 pages
ISBN:9781450338530
DOI:10.1145/2808719
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature selection
  2. linear regression
  3. multi-locus epistasis
  4. quantitative genetic trait prediction
  5. sampling

Acceptance Rates

BCB '15 Paper Acceptance Rate 48 of 141 submissions, 34%;
Overall Acceptance Rate 254 of 885 submissions, 29%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 53
  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0

Reflects downloads up to 23 Dec 2024
