Abstract
Clustering is a widely used data analysis technique that explores the structure of data by grouping objects: similar objects are placed in the same cluster, while dissimilar objects are allocated to different clusters. The similarity or dissimilarity between data objects is measured with a distance function. Clustering algorithms aim to choose an optimal set of centroids to obtain a good partitioning, but clustering accuracy remains a persistent weakness. Meta-heuristic algorithms have been used to address this issue. This research also targets the accuracy issue and presents a new algorithm for effective cluster analysis, inspired by the water flow optimizer (WFO). The performance of the WFO algorithm is validated on well-defined clustering problems using the sum of squared error (SSE), accuracy rate (AR) and detection rate (DR). The results indicate that the WFO algorithm achieves better clustering results in terms of SSE, AR and DR than other algorithms of the same class. The performance is also validated using the Friedman statistical test followed by a post hoc test, and the results indicate that the proposed WFO algorithm obtains better statistical results than the other clustering algorithms.
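To make the SSE criterion concrete, the following is a minimal sketch of how a candidate set of centroids could be scored, assuming Euclidean distance as the distance function. The function names `nearest_centroid` and `sse` and the toy data are illustrative assumptions, not taken from the paper, and the sketch shows only the objective that a meta-heuristic such as WFO would minimise, not the WFO search itself.

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point`.

    Assumption: closeness is measured with Euclidean distance.
    """
    return min(range(len(centroids)),
               key=lambda k: math.dist(point, centroids[k]))


def sse(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        k = nearest_centroid(p, centroids)
        total += math.dist(p, centroids[k]) ** 2
    return total


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good_centroids = [(0, 1), (10, 1)]   # one centroid per group
poor_centroids = [(0, 0), (0, 2)]    # both centroids in the same group

print(sse(points, good_centroids))   # lower SSE: tighter partitioning
print(sse(points, poor_centroids))   # higher SSE: right-hand group is far from any centroid
```

A lower SSE indicates more compact clusters, so an optimizer comparing the two candidate sets above would prefer `good_centroids`.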
Abbreviations
- AAA: Artificial algae algorithm
- ABC: Artificial bee colony
- ACO: Ant colony optimization
- AR: Accuracy rate
- BB-BC: Big bang–big crunch
- BH: Black hole
- CS: Cuckoo search
- CSS: Charge system search
- DR: Detection rate
- ECA: Evolutionary center algorithm
- EFO: Electromagnetic field optimization
- FA: Firefly algorithm
- FA-SOM: Firefly algorithm-self organizing maps
- FCM: Fuzzy C-means
- FKM: Fuzzy k-modes
- FPCOM: Fuzzy C-means and Fuzzy C-ordered means
- GA: Genetic algorithm
- GA-PFKM: Genetic algorithm-possibilistic Fuzzy k-modes
- GWO: Grey wolf optimization
- IGWO: Improved grey wolf optimization
- KM: K-means
- MOA: Magnetic optimization algorithm
- MoCS: Modified cuckoo search
- PSO: Particle swarm optimization
- PSO-FCM: Particle swarm optimization-Fuzzy C-means
- PSO-PFKM: Particle swarm optimization-possibilistic Fuzzy k-modes
- ROA: Remora optimization algorithm
- RSA: Reptile search algorithm
- SCA: Sine cosine algorithm
- SCA-PFKM: Sine cosine algorithm-possibilistic Fuzzy k-modes
- SMSHO: Selfish herd optimization algorithm and simplex method
- SOM: Self-organizing maps
- SSE: Sum of squared error
- UCI: University of California Irvine
- VNS: Variable neighbourhood strategy
- WFO: Water flow optimizer
- WWO: Water wave optimization
Author information
Contributions
RCS and TK designed the research. RCS, TK and PT analyzed and implemented the algorithm. RCS, JP and SS handled data collection and pre-processing. RCS, TK, JP and SS carried out the result analysis. RCS, TK and PT wrote and managed the article.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Sahoo, R.C., Kumar, T., Tanwar, P. et al. An efficient meta-heuristic algorithm based on water flow optimizer for data clustering. J Supercomput 80, 10301–10326 (2024). https://doi.org/10.1007/s11227-023-05822-y
DOI: https://doi.org/10.1007/s11227-023-05822-y