DOI: 10.1145/1141277.1141311
Article

Two-phase clustering strategy for gene expression data sets

Published: 23 April 2006

Abstract

Gene expression analysis has been used in genome research for several years. Related microarray experiments are conducted all over the world, and consequently a vast number of microarray data sets has been produced. With access to this variety of repositories, researchers would like to incorporate these data into their analyses to increase the statistical significance of their results. In this paper, we present a new two-phase clustering strategy that combines local clustering results to obtain a global clustering. The advantage of this technique is that each microarray data set can be normalized and clustered separately. The set of relevant local clustering results is then used to calculate the global clustering result. Furthermore, we present an approach based on technical as well as biological quality measures to determine weighting factors that quantify each local result's share of the global result. The better the attested quality of a local result, the stronger its impact on the global result.
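
The abstract does not say how the local results are combined, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes a weighted co-association scheme: two genes end up in the same global cluster when the quality-weighted share of local clusterings that group them together exceeds a threshold. The function names, the gene identifiers, the weights, and the threshold are all hypothetical.

```python
from itertools import combinations


def weighted_coassociation(local_results, weights):
    """Map each gene pair to the weighted share of local results that co-cluster it."""
    agree = {}  # summed weight of local results placing the pair in one cluster
    seen = {}   # summed weight of local results that contain both genes
    for labels, w in zip(local_results, weights):
        for a, b in combinations(sorted(labels), 2):
            seen[(a, b)] = seen.get((a, b), 0.0) + w
            if labels[a] == labels[b]:
                agree[(a, b)] = agree.get((a, b), 0.0) + w
    return {pair: agree.get(pair, 0.0) / total
            for pair, total in seen.items() if total > 0}


def global_clustering(local_results, weights, threshold=0.5):
    """Phase 2 (assumed rule): link genes whose weighted co-association exceeds the threshold."""
    genes = sorted({g for labels in local_results for g in labels})
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), score in weighted_coassociation(local_results, weights).items():
        if score > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(sorted(c) for c in clusters.values())


# Phase 1 output (hypothetical): three data sets, each clustered separately.
local_results = [
    {"g1": 0, "g2": 0, "g3": 1, "g4": 1},
    {"g1": 0, "g2": 0, "g3": 0, "g4": 1},
    {"g1": 0, "g2": 1, "g3": 2, "g4": 2},
]
# Hypothetical quality weights: a higher value means a more trusted local result.
weights = [0.6, 0.1, 0.3]

print(global_clustering(local_results, weights))
# prints [['g1', 'g2'], ['g3', 'g4']]
```

Shifting most of the weight to the second local result, for example `weights = [0.1, 0.8, 0.1]`, changes the outcome to `[['g1', 'g2', 'g3'], ['g4']]`, which mirrors the abstract's point that better-rated local results have a stronger impact on the global result.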


Published In

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
April 2006
1967 pages
ISBN: 1595931082
DOI: 10.1145/1141277

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall acceptance rate: 1,650 of 6,669 submissions (25%)

Cited By

  • (2011) Large-Scale Data Analytics Using Ensemble Clustering. Handbook of Data Intensive Computing, pages 285--321. DOI: 10.1007/978-1-4614-1415-5_11. Online publication date: 11-Nov-2011
  • (2010) Using Cloud Technologies to Optimize Data-Intensive Service Applications. Proceedings of the 2010 IEEE 3rd International Conference on Cloud Computing, pages 19--26. DOI: 10.1109/CLOUD.2010.56. Online publication date: 5-Jul-2010
  • (2008) Distributed Peer-to-Peer Cooperative Partitional-Divisive Clustering for gene expression datasets. 2008 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pages 143--150. DOI: 10.1109/CIBCB.2008.4675771. Online publication date: Sep-2008
  • (2008) BPEL DT: Data-Aware Extension for Data-Intensive Service Applications. Emerging Web Services Technology, Volume II, pages 111--128. DOI: 10.1007/978-3-7643-8864-5_8. Online publication date: 2008
  • (2007) Data-aware SOA for Gene Expression Analysis Processes. 2007 IEEE Congress on Services (Services 2007), pages 138--145. DOI: 10.1109/SERVICES.2007.28. Online publication date: Jul-2007
  • (2007) Semi-supervised Kernel Logistic Regression and Its Extension to Active Learning Based on A-Optimality. Proceedings of the Seventh IEEE International Conference on Data Mining Workshops, pages 277--282. DOI: 10.1109/ICDMW.2007.88. Online publication date: 28-Oct-2007
