Abstract
In recent years, the growing volume of data in numerous clustering tasks has greatly boosted the existing clustering algorithms in dealing with very large datasets. The K-means has been one of the most popular clustering algorithms because of its simplicity and easiness in application, but its efficiency and effectiveness for large datasets are often unacceptable. In contrast to the K-means algorithm, most existing grid-clustering algorithms have linear time and space complexities and thus can perform well for large datasets. In this paper, we propose a grid-based partitional algorithm to overcome the drawbacks of the K-means clustering algorithm. This new algorithm is based on two major concepts: 1) maximizing the average density of a group of grids instead of minimizing the minimal square error which is applied in the K-means algorithm, and 2) using gridclustering algorithms to thoroughly reformulate the object-driven assigning in the K-means algorithm into a new grid-driven assigning. Consequently, our proposed algorithm obtains an average speed-up about 10–100 times faster and produces better partitions than those by the K-means algorithm. Also, compared with the K-means algorithm, our proposed algorithm has ability to partition any dataset when the number of clusters is unknown. The effectiveness of our proposed algorithm has been demonstrated through successfully clustering datasets with different features in comparison with the other three typical clustering algorithms besides the K-means algorithm.
Similar content being viewed by others
References
MacQueen J B. Some methods for classification and analysis of multivariate observations. In: The 5th Berkeley Symposium on Mathematical and Probability. Berkeley, 1967, 1: 281–297
Jenssen R, Erdogmus D, Hild K, et al. Information cut for clustering using a gradient decent approach. Patt Recogn, 2007, 40: 796–806
Xu R, Wunsch D. Survey of clustering algorithm. IEEE Trans Neur Netw, 2005, 16: 645–678
Guha S, Rastogi R, Shim K. CURE: An efficient clustering algorithm for large databases. Inf Syst, 2001, 26: 35–58
Pedrycz W. Fuzzy clustering with a knowledge-based guidance. Patt Recogn Lett, 2004, 25: 469–480
Huang Z, Ng M K, Rong H. Automated variable weighting in k-means type clustering. IEEE Trans Patt Anal Mach Intell, 2005, 27: 657–668
Jing L, Gao Y, Wu G, et al. Feature weighting k-means algorithm for large-scale documents clustering. In: Proc. 1st China Classification Conf. Beijing, China, 2005, 1: 85–90
Fabrizio S. Machine learning in automated text categorization. ACM Comput Surv, 2002, 34: 1–47
Yu J. General C-means clustering model. IEEE Trans Patt Anal Mach Intell, 2003, 25: 1197–1211
Hinneburg A, Keim D A. An efficient approach to clustering in large multimedia databases with noise. In: Proc Int Conf Knowl Disc Data Mining, New York, 1998. 58–65
Eden W, Ma M, Tommy W S. A new shifting grid clustering algorithm. Patt Recogn, 2004, 37: 503–514
Agrawal R, Gehrke J, Gunopulos D. Automatic subspace clustering of high dimensional data. Data Mining. Knowl Discov, 2005, 11: 5–33
Yue S, Wei M, Wang J. A general grid-based approach to clustering. Patt Recogn Lett, 2008, 29: 1372–1384
Ordonez C, Omiecinski E. Efficient disk-based K-means clustering for relational databases. IEEE Trans Knowl Data Eng, 2004, 16: 909–921
Chiang J, Yin Z. Unsupervised minor prototype detection using an adaptive population partitioning algorithm. Patt Recogn, 2007, 40: 3132–3145
Parizeau M, Lee S W. A fuzzy-syntactic approach to allograph modeling for cursive script recognition. IEEE Trans Patt Anal Mach Intell, 1995, 17: 702–712
Chiang H, Yue S, Yin Z. A new fuzzy cover approach to clustering. IEEE Trans Fuzzy Syst, 2004, 12: 199–208
AlphaMiner2.0: http://bi.hitsz.edu.cn/alphaminer/index.htm
Zalik K. An efficient k-means clustering algorithm. Patt Recogn Lett, 2008, 29: 1385–1391
Krishnapuran R, Keller J M. A possiblistic c-means algorithm. IEEE Trans Fuzzy Syst, 1993, 2: 100–112
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Yue, S., Wang, J., Tao, G. et al. An unsupervised grid-based approach for clustering analysis. Sci. China Inf. Sci. 53, 1345–1357 (2010). https://doi.org/10.1007/s11432-010-3112-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11432-010-3112-z