Abstract
Gene-set analysis seeks to identify enriched gene sets that are strongly associated with the phenotype. In many applications, only a small subset of core genes in each enriched gene set is likely to be associated with the phenotype. Reducing enriched gene sets to their leading-edge subsets of core genes helps biologists understand the biological processes underlying the association of a gene set with the phenotype of interest. We therefore propose a new gene-set analysis that tests the significance of enrichment for multiple gene sets while simultaneously determining the corresponding leading-edge subsets of core genes. In the proposed analysis, we assign a newly defined enrichment score to each gene set and then correct the statistical significance of the score for the multiple testing of many gene sets by controlling the false-discovery rate.
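The abstract does not say which false-discovery-rate procedure is used, nor does it define the enrichment score. As a rough illustration of the multiple-testing step only, the following Python sketch applies the Benjamini-Hochberg step-up rule to one p-value per gene set. The choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, and the example p-values are all assumptions made for illustration; they are not taken from the article.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag which hypotheses are rejected at FDR level alpha.

    Assumption: the Benjamini-Hochberg step-up rule stands in for the
    FDR control mentioned in the abstract; the article may use another
    procedure.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per gene set (not data from the article).
gene_set_p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(gene_set_p_values, alpha=0.05))
```

Because the rule is step-up, a gene set whose own p-value misses its threshold can still be declared significant when a larger p-value further down the ranking meets its threshold.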
Acknowledgments
This research was supported by the Basic Science Research Program (NRF-2010-0009461 and NRF-2011-0016383) through the National Research Foundation of Korea (NRF), funded by the Ministry of Education and the Ministry of Science, ICT & Future Planning.
Cite this article
Yang, T.Y. A GS-CORE algorithm for performing a reduction test on multiple gene sets and their core genes. Comput Stat 30, 29–41 (2015). https://doi.org/10.1007/s00180-014-0519-9