Parallel gravitational clustering based on grid partitioning for large-scale data

Lei Chen (ORCID: orcid.org/0000-0002-8000-7872)¹, Fadong Chen¹, Zhaohua Liu¹, Mingyang Lv¹, Tingqin He² & Shiwen Zhang²


Abstract

The gravitational clustering algorithm is a dynamic clustering model that performs well at uncovering the hidden clusters of complex datasets, whatever their shape, density or distribution. It is therefore well suited to mining irregular and unbalanced clusters from large, noisy datasets. However, its prohibitive time cost makes it impractical to apply at large scale. This paper therefore develops a parallel gravitational clustering algorithm based on grid partitioning (PGCGP). First, a grid partitioning strategy divides a large-scale dataset into multiple grids of as even a size as possible. Second, a neighbourhood repair strategy works together with the gravitational clustering algorithm to accurately mine the clusters within a single grid. Finally, a border point alignment strategy decides whether two small clusters located in different grids should be merged, so that the real clusters of the original large dataset are recovered when the grids are combined. Extensive experiments on multiple artificial and real-world datasets verify that PGCGP achieves good performance.
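The abstract does not say how PGCGP balances its grids. As a minimal sketch of the general idea only, assuming 2-D points and a hypothetical `balanced_grid_partition` function (neither is taken from the paper), the Python below cuts the data into `px` vertical strips by x-rank and then cuts each strip into `py` cells by y-rank. Every cell then holds either ⌊n/(px·py)⌋ or ⌈n/(px·py)⌉ points. This rank-based (quantile) partition is a swapped-in technique, not necessarily the authors' strategy: the y cut points may differ between strips, and it ignores the neighbourhood repair and border point alignment steps that handle clusters straddling cell boundaries.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _even_chunks(items: Sequence[Point], k: int) -> List[List[Point]]:
    """Split a sequence into k contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), k)
    chunks: List[List[Point]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def balanced_grid_partition(points: Sequence[Point], px: int, py: int) -> List[List[Point]]:
    """Hypothetical sketch: px strips by x-rank, then py cells per strip by y-rank.

    Returns px * py cells, listed strip by strip (left to right),
    and bottom to top within each strip.
    """
    if px < 1 or py < 1:
        raise ValueError("px and py must be positive")
    by_x = sorted(points, key=lambda p: p[0])      # rank all points along x
    cells: List[List[Point]] = []
    for strip in _even_chunks(by_x, px):
        by_y = sorted(strip, key=lambda p: p[1])   # rank this strip's points along y
        cells.extend(_even_chunks(by_y, py))
    return cells


if __name__ == "__main__":
    pts = [(float(i % 10), float(i // 10)) for i in range(100)]
    cells = balanced_grid_partition(pts, 2, 5)
    print([len(c) for c in cells])  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

Cutting by rank rather than by fixed coordinate intervals keeps the per-cell workload balanced even on skewed data, which is what matters when each cell is clustered by a separate worker; the trade-off is that cells have unequal spatial extent.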



Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 62103143 and 61702180); the Hunan Provincial Natural Science Foundation of China (Nos. 2020JJ5199 and 2021JJ40214); the National Defense Basic Research Program of China (No. JCKY2019403D006); the National Key Research and Development Program (No. 2019YFE0105300); the Scientific Research Fund of Hunan Provincial Education Department (Nos. 20C0786 and 20C0781); and the Hunan Province Science and Technology Project Funds (No. 2018TP1036).

Author information

Authors and Affiliations

School of Information and Electrical Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
Lei Chen, Fadong Chen, Zhaohua Liu & Mingyang Lv
School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
Tingqin He & Shiwen Zhang


Corresponding author

Correspondence to Lei Chen.

Ethics declarations

Conflict of Interests

The authors declare that there is no conflict of interest regarding the publication of the article.

Additional information

Availability of data and materials

The MAGIC, SHUTTLE, SKIN, and Poker Hand datasets are public datasets available from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/). The DS1, DS2, DS3, and DS4 datasets are available from the corresponding author on request.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Fadong Chen, Zhaohua Liu, Mingyang Lv, Tingqin He and Shiwen Zhang contributed equally to this work.


About this article

Cite this article

Chen, L., Chen, F., Liu, Z. et al. Parallel gravitational clustering based on grid partitioning for large-scale data. Appl Intell 53, 2506–2526 (2023). https://doi.org/10.1007/s10489-022-03661-7


Accepted: 17 April 2022
Published: 10 May 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s10489-022-03661-7
