An unsupervised grid-based approach for clustering analysis

ShiHong Yue¹,
JeenShing Wang²,
Gao Tao¹ &
…
HuaXiang Wang¹

204 Accesses
17 Citations
Explore all metrics

Abstract

In recent years, the growing volume of data in numerous clustering tasks has greatly boosted the existing clustering algorithms in dealing with very large datasets. The K-means has been one of the most popular clustering algorithms because of its simplicity and easiness in application, but its efficiency and effectiveness for large datasets are often unacceptable. In contrast to the K-means algorithm, most existing grid-clustering algorithms have linear time and space complexities and thus can perform well for large datasets. In this paper, we propose a grid-based partitional algorithm to overcome the drawbacks of the K-means clustering algorithm. This new algorithm is based on two major concepts: 1) maximizing the average density of a group of grids instead of minimizing the minimal square error which is applied in the K-means algorithm, and 2) using gridclustering algorithms to thoroughly reformulate the object-driven assigning in the K-means algorithm into a new grid-driven assigning. Consequently, our proposed algorithm obtains an average speed-up about 10–100 times faster and produces better partitions than those by the K-means algorithm. Also, compared with the K-means algorithm, our proposed algorithm has ability to partition any dataset when the number of clusters is unknown. The effectiveness of our proposed algorithm has been demonstrated through successfully clustering datasets with different features in comparison with the other three typical clustering algorithms besides the K-means algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

References

MacQueen J B. Some methods for classification and analysis of multivariate observations. In: The 5th Berkeley Symposium on Mathematical and Probability. Berkeley, 1967, 1: 281–297
MathSciNet Google Scholar
Jenssen R, Erdogmus D, Hild K, et al. Information cut for clustering using a gradient decent approach. Patt Recogn, 2007, 40: 796–806
Article MATH Google Scholar
Xu R, Wunsch D. Survey of clustering algorithm. IEEE Trans Neur Netw, 2005, 16: 645–678
Article Google Scholar
Guha S, Rastogi R, Shim K. CURE: An efficient clustering algorithm for large databases. Inf Syst, 2001, 26: 35–58
Article MATH Google Scholar
Pedrycz W. Fuzzy clustering with a knowledge-based guidance. Patt Recogn Lett, 2004, 25: 469–480
Article Google Scholar
Huang Z, Ng M K, Rong H. Automated variable weighting in k-means type clustering. IEEE Trans Patt Anal Mach Intell, 2005, 27: 657–668
Article Google Scholar
Jing L, Gao Y, Wu G, et al. Feature weighting k-means algorithm for large-scale documents clustering. In: Proc. 1st China Classification Conf. Beijing, China, 2005, 1: 85–90
Google Scholar
Fabrizio S. Machine learning in automated text categorization. ACM Comput Surv, 2002, 34: 1–47
Article Google Scholar
Yu J. General C-means clustering model. IEEE Trans Patt Anal Mach Intell, 2003, 25: 1197–1211
Google Scholar
Hinneburg A, Keim D A. An efficient approach to clustering in large multimedia databases with noise. In: Proc Int Conf Knowl Disc Data Mining, New York, 1998. 58–65
Eden W, Ma M, Tommy W S. A new shifting grid clustering algorithm. Patt Recogn, 2004, 37: 503–514
Article MATH Google Scholar
Agrawal R, Gehrke J, Gunopulos D. Automatic subspace clustering of high dimensional data. Data Mining. Knowl Discov, 2005, 11: 5–33
Article MathSciNet Google Scholar
Yue S, Wei M, Wang J. A general grid-based approach to clustering. Patt Recogn Lett, 2008, 29: 1372–1384
Article Google Scholar
Ordonez C, Omiecinski E. Efficient disk-based K-means clustering for relational databases. IEEE Trans Knowl Data Eng, 2004, 16: 909–921
Article Google Scholar
Chiang J, Yin Z. Unsupervised minor prototype detection using an adaptive population partitioning algorithm. Patt Recogn, 2007, 40: 3132–3145
Article MATH Google Scholar
Parizeau M, Lee S W. A fuzzy-syntactic approach to allograph modeling for cursive script recognition. IEEE Trans Patt Anal Mach Intell, 1995, 17: 702–712
Article Google Scholar
Chiang H, Yue S, Yin Z. A new fuzzy cover approach to clustering. IEEE Trans Fuzzy Syst, 2004, 12: 199–208
Article Google Scholar
AlphaMiner2.0: http://bi.hitsz.edu.cn/alphaminer/index.htm
Zalik K. An efficient k-means clustering algorithm. Patt Recogn Lett, 2008, 29: 1385–1391
Article Google Scholar
Krishnapuran R, Keller J M. A possiblistic c-means algorithm. IEEE Trans Fuzzy Syst, 1993, 2: 100–112
Google Scholar

Download references

Author information

Authors and Affiliations

School of Electrical Engineering and Automation, Tianjin Key Laboratory of Process Measurement and Control, Tianjin University, Tianjin, 300072, China
ShiHong Yue, Gao Tao & HuaXiang Wang
Department of Electrical Engineering, National Cheng Kung University, Tainan, 701, China
JeenShing Wang

Authors

ShiHong Yue
View author publications
You can also search for this author in PubMed Google Scholar
JeenShing Wang
View author publications
You can also search for this author in PubMed Google Scholar
Gao Tao
View author publications
You can also search for this author in PubMed Google Scholar
HuaXiang Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to ShiHong Yue.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Yue, S., Wang, J., Tao, G. et al. An unsupervised grid-based approach for clustering analysis. Sci. China Inf. Sci. 53, 1345–1357 (2010). https://doi.org/10.1007/s11432-010-3112-z

Download citation

Received: 17 February 2009
Accepted: 05 December 2009
Published: 10 July 2010
Issue Date: July 2010
DOI: https://doi.org/10.1007/s11432-010-3112-z

An unsupervised grid-based approach for clustering analysis

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A Fast and Efficient Grid-Based K-means++ Clustering Algorithm for Large-Scale Datasets

Non-hierarchical Clustering for Large Data Without Recalculating Cluster Center

Clustering algorithm based on Hopkins statistics and K-means

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

An unsupervised grid-based approach for clustering analysis

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A Fast and Efficient Grid-Based K-means++ Clustering Algorithm for Large-Scale Datasets

Non-hierarchical Clustering for Large Data Without Recalculating Cluster Center

Clustering algorithm based on Hopkins statistics and K-means

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation