Abstract
Ensemble clustering has attracted much attention for its robustness and effectiveness compared to single clustering. Most co-association matrix-based ensemble clustering methods, a representative family of approaches, take into account only a single type of information contained in the base partitions. This study proposes a new weighted ensemble clustering algorithm that fuses multi-level data information to mine the information in the base partition family more fully. Three levels of data information, namely the partition granularity level, the cluster granularity level and the sample granularity level, are considered simultaneously in the co-association matrix. More specifically, we use knowledge granularity to measure the quality of base partitions and rough membership to quantify the credibility of base clusters. Additionally, the relative similarity of a pair of samples is estimated with respect to different base partitions, taking into account the close relationship between samples and the structure of base clusters. Subsequently, the partition-cluster-sample-granularity weighted co-association (PCSCA) matrix is proposed to address the limitations of the co-association matrix by quantifying the quality of information at multiple levels. Finally, this study introduces the partition-cluster-sample-granularity weighted ensemble clustering (PCSEC) algorithm, which incorporates the PCSCA matrix. The experimental results demonstrate the effectiveness of the proposed method.
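To make the central object of the abstract concrete, the following is a minimal sketch of a co-association matrix with optional per-partition weights. It is not the PCSCA matrix, whose formulas are not given here. The function names, the toy base partitions and the weighting rule (one minus the knowledge granularity) are illustrative assumptions, and the knowledge granularity uses one common definition: the sum of squared cluster sizes divided by the squared number of samples.

```python
from collections import Counter
from typing import List, Optional, Sequence


def knowledge_granularity(labels: Sequence[int]) -> float:
    """One common definition: (1 / n^2) * sum of squared cluster sizes."""
    n = len(labels)
    return sum(size * size for size in Counter(labels).values()) / (n * n)


def co_association(partitions: Sequence[Sequence[int]],
                   weights: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Weighted fraction of base partitions that place samples i and j together."""
    m = len(partitions)
    n = len(partitions[0])
    if weights is None:
        weights = [1.0] * m  # classic, unweighted co-association matrix
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


# Three hypothetical base partitions of four samples (cluster labels per sample).
base_partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
]

# Unweighted matrix: every base partition counts equally.
plain = co_association(base_partitions)

# Placeholder weighting (an assumption, not the paper's rule):
# partitions with smaller knowledge granularity receive larger weights.
toy_weights = [1.0 - knowledge_granularity(p) for p in base_partitions]
weighted = co_association(base_partitions, toy_weights)
```

In the unweighted matrix, samples 0 and 1 are grouped together by two of the three partitions, so their entry is 2/3. With the placeholder weights the same entry becomes 0.7, because the first partition has the smallest granularity and therefore counts for more. A consensus partition is typically obtained by running a standard clustering method, such as hierarchical clustering, on such a matrix.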
Data Availability
The datasets analysed during the current study are available from the corresponding author on reasonable request.
Notes
The synthetic datasets were obtained from https://github.com/milaan9/Clustering-Datasets/tree/master/02.%20Synthetic.
The real datasets were obtained from http://archive.ics.uci.edu/ml/datasets.php.
Funding
The authors would like to thank the editors and anonymous reviewers for their constructive comments. This work is supported by NSFC (No. 12231007), Hunan Provincial Natural Science Foundation of China (No. 2023JJ30113), and Guangdong Basic and Applied Basic Research Foundation (No. 2023A1515012342).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Mingjie Cai, Feng Xu and Qingguo Li. The first draft of the manuscript was written by Zhishan Wu and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethical and Informed Consent for Data Used
The data used in the current study are ethical.
Competing Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, Z., Cai, M., Xu, F. et al. PCS-granularity weighted ensemble clustering via Co-association matrix. Appl Intell 54, 3884–3901 (2024). https://doi.org/10.1007/s10489-024-05368-3
DOI: https://doi.org/10.1007/s10489-024-05368-3