Article

Non-redundant clustering with conditional ensembles

Authors:

Thomas Hofmann

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

Pages 70 - 77

https://doi.org/10.1145/1081870.1081882

Published: 21 August 2005

Abstract

Data may often contain multiple plausible clusterings. In order to discover a clustering which is useful to the user, constrained clustering techniques have been proposed to guide the search. Typically, these techniques assume background knowledge in the form of explicit information about the desired clustering. In contrast, we consider the setting in which the background knowledge is instead about an undesired clustering. Such knowledge may be obtained from an existing classification or precedent algorithm. The problem is then to find a novel, "orthogonal" clustering in the data. We present a general algorithmic framework which makes use of cluster ensemble methods to solve this problem. One key advantage of this approach is that it takes a base clustering method which is used as a black box, allowing the practitioner to select the most appropriate clustering method for the domain. We present experimental results on synthetic and text data which establish the competitiveness of this framework.
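The abstract does not spell out the procedure, so the following is only a minimal sketch of one way a "conditional" cluster ensemble could be assembled, not the authors' implementation. It assumes that the base clusterer is run separately inside each group of the undesired clustering, that each local solution is extended to all points by nearest-centroid assignment, and that the ensemble is merged through a co-association matrix. The names `kmeans`, `assign`, `conditional_ensemble` and `demo_data` are hypothetical, and the tiny k-means merely stands in for whichever black-box base method suits the domain.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Tiny Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance of every point to its closest already-chosen center.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers


def assign(X, centers):
    """Label each row of X with the index of its nearest center."""
    return np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)


def conditional_ensemble(X, known, k, base=kmeans):
    """Sketch: cluster inside each known group, then merge by co-association.

    `base` is treated as a black box that returns k centers (an assumption
    of this sketch, chosen so local solutions can be extended to all points).
    """
    n = len(X)
    coassoc = np.zeros((n, n))
    groups = np.unique(known)
    for z in groups:
        centers = base(X[known == z], k)   # structure left after fixing `known`
        labels = assign(X, centers)        # extend the local solution globally
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= len(groups)
    # Consensus step: cluster the rows of the co-association matrix.
    return assign(coassoc, kmeans(coassoc, k))


def demo_data(seed=0, per_blob=50):
    """Four blobs: the wide x-split is 'known', the narrow y-split is hidden."""
    rng = np.random.default_rng(seed)
    corners = np.array([[-4, -2], [-4, 2], [4, -2], [4, 2]], dtype=float)
    X = np.vstack([c + 0.4 * rng.standard_normal((per_blob, 2)) for c in corners])
    known = np.repeat([0, 0, 1, 1], per_blob)
    hidden = np.repeat([0, 1, 0, 1], per_blob)
    return X, known, hidden


if __name__ == "__main__":
    X, known, hidden = demo_data()
    found = conditional_ensemble(X, known, k=2)
    agree = np.mean(found == hidden)
    print("agreement with hidden split:", max(agree, 1 - agree))
```

Because the co-association matrix only records whether two points share a cluster, the ensemble members do not need aligned label numbering, and any base method that can label unseen points could replace `kmeans` here.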

Index Terms

Non-redundant clustering with conditional ensembles
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information systems applications
    1. Data mining

Published In

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

August 2005

844 pages

ISBN: 159593135X

DOI: 10.1145/1081870

General Chair: Robert Grossman (University of Illinois at Chicago & Open Data Partners, USA)

Program Chairs: Roberto Bayardo (IBM Almaden Research, USA), Kristin Bennett (RPI, USA)

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD05: The Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 21 - 24, 2005

Chicago, Illinois, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
