Article

Two-phase clustering strategy for gene expression data sets

Authors:

Thomas Wächter,

Wolfgang Lehner,

Christian Pilarsky

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing

Pages 145 - 150

https://doi.org/10.1145/1141277.1141311

Published: 23 April 2006

Abstract

In genome research, gene expression analysis has been used for several years. Related microarray experiments are conducted all over the world, and consequently a vast number of microarray data sets are produced. Having access to this variety of repositories, researchers would like to incorporate these data into their analyses to increase the statistical significance of their results. In this paper, we present a new two-phase clustering strategy that combines local clustering results to obtain a global clustering. The advantage of such a technique is that each microarray data set can be normalized and clustered separately. The set of relevant local clustering results is then used to calculate the global clustering result. Furthermore, we present an approach based on technical as well as biological quality measures to determine weighting factors that quantify each local result's proportion within the global result. The better the attested quality of a local result, the stronger its impact on the global result.
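
The abstract does not specify how the local results are combined, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes one common consensus technique, a weighted co-association score: for each pair of genes, the weights of the data sets that place both genes in the same local cluster are summed and divided by the weights of the data sets that measured both genes. The function names, the example weights, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def weighted_coassociation(local_labelings, weights):
    """Phase 1 output -> pairwise agreement scores.

    local_labelings: list of dicts mapping gene id -> local cluster label,
                     one dict per separately clustered data set.
    weights: one non-negative quality weight per data set (hypothetical values).
    Returns {(gene_a, gene_b): weighted fraction of data sets co-clustering them}.
    """
    genes = sorted(set().union(*local_labelings))
    scores = {}
    for a, b in combinations(genes, 2):
        agree = total = 0.0
        for labels, w in zip(local_labelings, weights):
            if a in labels and b in labels:  # both genes measured in this data set
                total += w
                if labels[a] == labels[b]:
                    agree += w
        scores[(a, b)] = agree / total if total > 0 else 0.0
    return scores


def global_clusters(scores, genes, threshold=0.5):
    """Phase 2: link gene pairs whose score exceeds the threshold (union-find)."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for (a, b), s in scores.items():
        if s > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(sorted(c) for c in clusters.values())


# Three local clusterings; label values are only meaningful within one data set.
local = [
    {"g1": 0, "g2": 0, "g3": 1, "g4": 1},  # data set A, high attested quality
    {"g1": 0, "g2": 0, "g3": 0, "g4": 1},  # data set B, low attested quality
    {"g1": 5, "g2": 5, "g3": 7, "g4": 7},  # data set C
]
weights = [0.5, 0.1, 0.4]

scores = weighted_coassociation(local, weights)
result = global_clusters(scores, ["g1", "g2", "g3", "g4"])
print(result)  # [['g1', 'g2'], ['g3', 'g4']]
```

In this toy input, data set B groups g3 with g1 and g2, but its low weight means that disagreement barely moves the pairwise scores, which mirrors the abstract's statement that better-rated local results have a stronger impact on the global result.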

Published In

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing

April 2006

1967 pages

ISBN: 1595931082

DOI: 10.1145/1141277

Conference Chair:
Hisham M. Haddad
Kennesaw State University, Kennesaw, Georgia

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SAC06

Sponsor:

SIGAPP

SAC06: The 2006 ACM Symposium on Applied Computing

April 23 - 27, 2006

Dijon, France

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics

Total Citations: 6
Total Downloads: 286
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 14 Jan 2025

Citations

Cited By

Hahmann M, Habich D, Lehner W (2011). Large-Scale Data Analytics Using Ensemble Clustering. Handbook of Data Intensive Computing, pages 285-321. Online publication date: 11-Nov-2011.
https://doi.org/10.1007/978-1-4614-1415-5_11

Habich D, Lehner W, Richly S, Assmann U (2010). Using Cloud Technologies to Optimize Data-Intensive Service Applications. Proceedings of the 2010 IEEE 3rd International Conference on Cloud Computing, pages 19-26. Online publication date: 5-Jul-2010.
https://dl.acm.org/doi/10.1109/CLOUD.2010.56

Kashef R, Kamel M (2008). Distributed Peer-to-Peer Cooperative Partitional-Divisive Clustering for gene expression datasets. 2008 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pages 143-150. Online publication date: Sep-2008.
https://doi.org/10.1109/CIBCB.2008.4675771

Habich D, Richly S, Preissler S, Grasselt M, Lehner W, Maier A (2008). BPEL DT — Data-Aware Extension for Data-Intensive Service Applications. Emerging Web Services Technology, Volume II, pages 111-128. Online publication date: 2008.
https://doi.org/10.1007/978-3-7643-8864-5_8

Habich D, Richly S, Lehner W, Assmann U, Grasselt M, Maier A, Pilarsky C (2007). Data-aware SOA for Gene Expression Analysis Processes. 2007 IEEE Congress on Services (Services 2007), pages 138-145. Online publication date: Jul-2007.
https://doi.org/10.1109/SERVICES.2007.28

Yajima Y, Sato T (2007). Semi-supervised Kernel Logistic Regression and Its Extension to Active Learning Based on A-Optimality. Proceedings of the Seventh IEEE International Conference on Data Mining Workshops, pages 277-282. Online publication date: 28-Oct-2007.
https://dl.acm.org/doi/10.1109/ICDMW.2007.88
