Abstract
Multifactor Dimensionality Reduction (MDR) is a widely used data-mining method for detecting and interpreting epistatic effects that do not display significant main effects. MDR produces a reduced-dimensionality representation of a dataset which classifies multi-locus genotypes into either high- or low-risk groups. The weighted fraction of cases and controls correctly labelled by this classification, the balanced accuracy, is typically used as a metric to select the best or most-fit model. We propose two new metrics for MDR to use in evaluating models, Variance and Fisher, and compare those metrics to two previously used MDR metrics, Balanced Accuracy and Normalized Mutual Information. We find that the proposed metrics consistently outperform the existing metrics across a variety of scenarios.
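As an illustration of the classification step and the balanced accuracy described above, the following is a minimal Python sketch, not the authors' implementation. It assumes the common MDR convention that a multi-locus genotype cell is labelled high-risk when its case-to-control ratio exceeds the overall case-to-control ratio of the dataset, and that ties are labelled low-risk. The function name `mdr_balanced_accuracy` and its parameters are hypothetical. The proposed Variance and Fisher metrics are not defined in this abstract and are therefore not sketched.

```python
from collections import Counter


def mdr_balanced_accuracy(genotypes, labels, threshold=None):
    """Label genotype cells high/low risk and score the labelling.

    genotypes: one tuple per sample, e.g. (snp1, snp2) coded 0/1/2.
    labels:    1 for a case, 0 for a control.
    threshold: case:control ratio above which a cell is high-risk.
               Assumption: defaults to the overall case:control ratio.
    Returns (set of high-risk cells, balanced accuracy).
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    if threshold is None:
        threshold = n_cases / n_controls

    # Count cases and controls falling in each multi-locus genotype cell.
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)

    # Cross-multiplied ratio test avoids dividing by an empty control count.
    # Assumption: ties are treated as low-risk.
    high_risk = {
        g for g in set(cases) | set(controls)
        if cases[g] > threshold * controls[g]
    }

    # Cases in high-risk cells and controls in low-risk cells are correct.
    true_pos = sum(cases[g] for g in high_risk)
    true_neg = sum(n for g, n in controls.items() if g not in high_risk)

    sensitivity = true_pos / n_cases
    specificity = true_neg / n_controls
    return high_risk, (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Toy two-locus data with an XOR-like pattern and some noise.
    genos = [(0, 0), (0, 0), (0, 0), (0, 1), (0, 1), (0, 1), (1, 0), (1, 1)]
    status = [0, 0, 1, 1, 1, 0, 1, 0]
    cells, score = mdr_balanced_accuracy(genos, status)
    print(sorted(cells), score)
```

Averaging sensitivity and specificity, rather than computing raw accuracy, keeps the score from being dominated by the larger class when cases and controls are imbalanced.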
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fisher, J.M., Andrews, P., Kiralis, J., Sinnott-Armstrong, N.A., Moore, J.H. (2013). Cell-Based Metrics Improve the Detection of Gene-Gene Interactions Using Multifactor Dimensionality Reduction. In: Vanneschi, L., Bush, W.S., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2013. Lecture Notes in Computer Science, vol 7833. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37189-9_18
DOI: https://doi.org/10.1007/978-3-642-37189-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37188-2
Online ISBN: 978-3-642-37189-9
eBook Packages: Computer Science (R0)