
Linear Separability of Gene Expression Data Sets

Published: 01 April 2010

Abstract

We study simple geometric properties of gene expression data sets in which samples are taken from two distinct classes (e.g., two types of cancer). Specifically, we investigate the problem of linear separability for pairs of genes. If a pair of genes exhibits linear separation with respect to the two classes, then the joint expression level of the two genes is strongly correlated with the class from which the sample was taken. This may indicate an underlying molecular mechanism relating the two genes to the phenomenon (e.g., a specific cancer). We developed and implemented novel, efficient algorithmic tools for finding all pairs of genes that induce a linear separation of the two sample classes. These tools are based on computational-geometric properties and were applied to 10 publicly available cancer data sets. For each data set, we computed the number of actual separating pairs and compared it both to an upper bound on the number expected by chance and to the numbers obtained empirically by shuffling the class labels at random. Seven of the 10 data sets are highly separable. This phenomenon is statistically highly significant: it is very unlikely to occur at random. It is therefore reasonable to expect that it reflects a functional association between the separating genes and the underlying phenotypic classes.
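To make the notion of a separating gene pair concrete, here is a minimal brute-force sketch in Python. It is not the efficient algorithm the abstract refers to; it only illustrates the definition and the label-shuffling comparison. The function names, the toy data, and the 0/1 label encoding are all assumptions made for illustration. The test relies on a standard fact: two finite planar point sets are strictly separable by a line exactly when all difference vectors b - a fit inside one open half-plane. Because angles are compared in floating point, borderline configurations may be misjudged.

```python
import math
import random
from itertools import combinations


def linearly_separable_2d(points_a, points_b):
    """Return True if some line strictly separates the two planar point sets."""
    # A direction w separates the sets iff w . (b - a) > 0 for every a and b,
    # i.e. all difference vectors fit inside one open half-plane.
    angles = []
    for ax, ay in points_a:
        for bx, by in points_b:
            dx, dy = bx - ax, by - ay
            if dx == 0 and dy == 0:
                return False  # a point shared by both classes cannot be separated
            angles.append(math.atan2(dy, dx))
    if not angles:
        return True  # an empty class is trivially separable
    angles.sort()
    # Angular gaps between neighbouring directions, including the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])
    # The vectors fit in an open half-plane iff some gap exceeds pi.
    return max(gaps) > math.pi


def separating_pairs(samples, labels):
    """List all gene index pairs (g, h) whose 2-D projection separates the classes."""
    n_genes = len(samples[0])
    pairs = []
    for g, h in combinations(range(n_genes), 2):
        class_0 = [(s[g], s[h]) for s, y in zip(samples, labels) if y == 0]
        class_1 = [(s[g], s[h]) for s, y in zip(samples, labels) if y == 1]
        if linearly_separable_2d(class_0, class_1):
            pairs.append((g, h))
    return pairs


def shuffled_counts(samples, labels, trials=100, seed=0):
    """Count separating pairs under randomly shuffled labels (empirical baseline)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    counts = []
    for _ in range(trials):
        rng.shuffle(shuffled)
        counts.append(len(separating_pairs(samples, shuffled)))
    return counts


# Hypothetical toy data: 4 samples (rows) x 3 genes (columns), two classes.
samples = [[1, 1, 5], [2, 1, 1], [1, 3, 4], [3, 4, 2]]
labels = [0, 0, 1, 1]
print(separating_pairs(samples, labels))
print(shuffled_counts(samples, labels, trials=10))
```

On real data, the count returned by `separating_pairs` would be compared with the distribution returned by `shuffled_counts`. The brute-force test costs time proportional to the product of the class sizes for every gene pair, so it is practical only for small inputs.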

Cited By

  • (2022) A Classification Strategy for Internet of Things Data Based on the Class Separability Analysis of Time Series Dynamics. ACM Transactions on Internet of Things 3:3 (1-30). DOI: 10.1145/3533049. Online publication date: 13-Jul-2022
  • (2015) Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology. Online publication date: 21-Aug-2015
  • (2011) New gene subset selection approaches based on linear separating genes and gene-pairs. Proceedings of the 6th IAPR international conference on Pattern recognition in bioinformatics (50-62). DOI: 10.5555/2075619.2075626. Online publication date: 2-Nov-2011

        Published In

        IEEE/ACM Transactions on Computational Biology and Bioinformatics  Volume 7, Issue 2
        April 2010
        189 pages

        Publisher

        IEEE Computer Society Press

        Washington, DC, United States

        Author Tags

        1. DNA microarrays
        2. gene expression analysis
        3. diagnosis
        4. linear separation