Abstract
The gravitational clustering algorithm is a dynamic clustering model that achieves outstanding performance in uncovering hidden clusters of arbitrary shape, density, and distribution in complex datasets. In principle, it is well suited to mining irregular and imbalanced clusters from large-scale, noisy datasets. In practice, however, its prohibitive time overhead makes it impractical to apply at large scales. This paper therefore develops a parallel gravitational clustering algorithm based on grid partitioning (PGCGP). First, a grid partitioning strategy divides a large-scale dataset into multiple grids that are as evenly sized as possible. Second, a neighbourhood repair strategy works with the gravitational clustering algorithm to accurately mine the clusters within each grid. Finally, a border point alignment strategy decides whether two small clusters located in different grids should be merged, so that the real clusters of the original large dataset can be recovered by merging the results of multiple grids. Extensive experiments on multiple artificial and real-world datasets verify that PGCGP achieves good performance.
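The abstract names the partition and merge stages without giving their details, so the following Python sketch is only an illustration under stated assumptions, not the PGCGP algorithm. It balances grid sizes with recursive median bisection (a k-d tree style split) and merges per-grid clusters with a union-find structure whenever two points from different grids lie within a hypothetical distance threshold `eps`. The names `balanced_partition`, `merge_across_grids`, and `eps` are hypothetical. The per-grid gravitational clustering and neighbourhood repair steps are not modelled: in the usage lines each grid is simply treated as one local cluster.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Label = Tuple[int, int]  # (grid id, local cluster id inside that grid)


def _spread(points: Sequence[Point], idx: List[int], axis: int) -> float:
    """Value range of the selected points along one axis."""
    values = [points[i][axis] for i in idx]
    return max(values) - min(values)


def balanced_partition(points: Sequence[Point], depth: int) -> List[List[int]]:
    """Split point indices into at most 2**depth groups of near-equal size.

    Illustrative stand-in (assumption): recursive median bisection along the
    axis with the largest spread, not PGCGP's actual grid partitioning rule.
    """
    groups = [list(range(len(points)))]
    for _ in range(depth):
        split: List[List[int]] = []
        for idx in groups:
            if len(idx) < 2:  # too small to split further
                split.append(idx)
                continue
            axis = max(range(len(points[idx[0]])),
                       key=lambda d: _spread(points, idx, d))
            ordered = sorted(idx, key=lambda i: points[i][axis])
            half = len(ordered) // 2  # median split keeps halves balanced
            split.extend([ordered[:half], ordered[half:]])
        groups = split
    return groups


def merge_across_grids(points: Sequence[Point], labels: Dict[int, Label],
                       eps: float) -> Dict[int, Label]:
    """Union local clusters from different grids that come within eps.

    Hypothetical merge criterion: any cross-grid point pair at distance
    <= eps links their two clusters. All pairs are checked (O(n^2)); a
    scalable version would compare only points near grid borders.
    """
    parent = {lab: lab for lab in set(labels.values())}

    def find(x: Label) -> Label:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    idx = list(labels)
    for pos, a in enumerate(idx):
        for b in idx[pos + 1:]:
            if labels[a][0] == labels[b][0]:
                continue  # same grid: left to the local clustering step
            if math.dist(points[a], points[b]) <= eps:
                ra, rb = find(labels[a]), find(labels[b])
                if ra != rb:
                    parent[rb] = ra
    return {i: find(labels[i]) for i in idx}


# Usage: two well-separated 1-D groups split across four grids.
pts = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,), (5.3,)]
grids = balanced_partition(pts, depth=2)  # four grids of two points each
labels = {i: (g, 0) for g, grp in enumerate(grids) for i in grp}
final = merge_across_grids(pts, labels, eps=0.15)  # two clusters remain
```

The median split is one simple way to keep grids evenly sized regardless of how the data are distributed; the union-find structure makes the merge order irrelevant, so clusters chained across several grids end up under a single root.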
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Nos. 62103143 and 61702180); the Hunan Provincial Natural Science Foundation of China (Nos. 2020JJ5199 and 2021JJ40214); the National Defense Basic Research Program of China (JCKY2019403D006); the National Key Research and Development Program (No. 2019YFE0105300); the Scientific Research Fund of the Hunan Provincial Education Department (Nos. 20C0786 and 20C0781); and the Hunan Province Science and Technology Project Funds (No. 2018TP1036).
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this article.
Additional information
Availability of data and materials
The MAGIC, SHUTTLE, SKIN, and Poker Hand datasets are public and available from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/). The DS1, DS2, DS3, and DS4 datasets are available from the corresponding author on request.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Fadong Chen, Zhaohua Liu, Mingyang Lv, Tingqin He and Shiwen Zhang contributed equally to this work.
About this article
Cite this article
Chen, L., Chen, F., Liu, Z. et al. Parallel gravitational clustering based on grid partitioning for large-scale data. Appl Intell 53, 2506–2526 (2023). https://doi.org/10.1007/s10489-022-03661-7
DOI: https://doi.org/10.1007/s10489-022-03661-7