DOI: 10.1145/1081870.1081882
Article

Non-redundant clustering with conditional ensembles

Published: 21 August 2005

Abstract

Data often contain multiple plausible clusterings. To discover a clustering that is useful to the user, constrained clustering techniques have been proposed to guide the search. Typically, these techniques assume background knowledge in the form of explicit information about the desired clustering. In contrast, we consider the setting in which the background knowledge is instead about an undesired clustering. Such knowledge may be obtained from an existing classification or precedent algorithm. The problem is then to find a novel, "orthogonal" clustering in the data. We present a general algorithmic framework that uses cluster ensemble methods to solve this problem. One key advantage of this approach is that it treats the base clustering method as a black box, allowing the practitioner to select the most appropriate clustering method for the domain. We present experimental results on synthetic and text data which establish the competitiveness of this framework.
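
The abstract does not spell out the steps of the framework, so the following is only a minimal sketch under one plausible reading: cluster the data separately within each group of the undesired clustering, extend each local solution to all points, and merge the resulting ensemble with a consensus step. Every name here (`nearest`, `kmeans`, `conditional_ensemble`), the choice of k-means as the base method, the farthest-first initialisation, the nearest-centroid extension, and the one-hot consensus are assumptions made for illustration, not the authors' published method. The sketch also assumes the base method returns centroids and that each known group has at least `k` points.

```python
import numpy as np

def nearest(X, centroids):
    """Index of the closest centroid for every row of X."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=20):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    X = np.asarray(X, dtype=float)
    centroids = [X[0]]
    for _ in range(1, k):
        # Squared distance from each point to its closest chosen centroid.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = nearest(X, centroids)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return nearest(X, centroids), centroids

def conditional_ensemble(X, known, k, base=kmeans):
    """Hypothetical sketch: cluster conditionally on `known`, then combine."""
    X = np.asarray(X, dtype=float)
    known = np.asarray(known)
    members = []
    for g in np.unique(known):
        # Base method is run only on the points of one known cluster.
        _, centroids = base(X[known == g], k)
        # Extend the local solution to all points (one-hot encoded labels).
        members.append(np.eye(k)[nearest(X, centroids)])
    # Assumed consensus step: k-means on the stacked one-hot memberships.
    H = np.hstack(members)
    labels, _ = kmeans(H, k)
    return labels

# Toy data: a dominant split along x (the known clustering)
# and a weaker split along y (the structure we hope to recover).
offsets = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1)]
X = np.array([(x + dx, y + dy)
              for x in (0, 10) for y in (0, 1) for dx, dy in offsets])
known = (X[:, 0] > 5).astype(int)
novel = conditional_ensemble(X, known, k=2)
```

On this toy data, plain `kmeans(X, 2)` splits the points along x and so reproduces `known`, whereas the sketch groups them along y. Passing a different `base` callable with the same return shape is how the black-box property mentioned above would be exercised in this sketch.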

Published In

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005
844 pages
ISBN: 159593135X
DOI: 10.1145/1081870

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cluster ensembles
  2. non-redundant clustering

Qualifiers

  • Article

Conference

KDD05

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
