An efficient meta-heuristic algorithm based on water flow optimizer for data clustering

The Journal of Supercomputing

Abstract

Clustering is a popular data analysis technique that explores the structure of data through cluster analysis. Similar data objects are placed in the same cluster, while dissimilar objects are allocated to other clusters; the similarity or dissimilarity between data objects is determined by a distance function. Clustering algorithms aim to choose the optimal set of centroids to obtain a better partitioning, but clustering accuracy remains a persistent concern. Meta-heuristic algorithms have been used to address this issue. This research also targets the accuracy problem and presents a new algorithm for effective cluster analysis, inspired by the water flow optimizer (WFO). The performance of the WFO algorithm is validated on well-defined clustering problems using the sum of squared error (SSE), accuracy rate (AR) and detection rate (DR). The results indicate that the WFO algorithm achieves better SSE, AR and DR values than algorithms of the same class. The performance is further validated using the Friedman statistical test followed by a post hoc test, and the proposed WFO obtains better statistical results than the other clustering algorithms.
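The abstract describes clustering as choosing a set of centroids, measuring similarity with a distance function and scoring the resulting partition by its SSE. The sketch below shows that objective under two assumptions not stated in the text: the distance is Euclidean, and a meta-heuristic candidate solution is encoded as one flat vector of k × d centroid coordinates (a common convention). The helper names (`decode_centroids`, `assign_clusters`, `sse`, `fitness`) are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decode_centroids(solution: Vector, k: int, n_features: int) -> List[List[float]]:
    """Split a flat candidate vector of length k * n_features into k centroids."""
    if len(solution) != k * n_features:
        raise ValueError("solution length must equal k * n_features")
    return [list(solution[j * n_features:(j + 1) * n_features]) for j in range(k)]


def assign_clusters(data: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Index of the nearest centroid for every data object."""
    labels = []
    for point in data:
        distances = [squared_distance(point, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def sse(data: Sequence[Vector], centroids: Sequence[Vector]) -> float:
    """Sum of squared distances from each object to its nearest centroid."""
    labels = assign_clusters(data, centroids)
    return sum(squared_distance(p, centroids[j]) for p, j in zip(data, labels))


def fitness(solution: Vector, data: Sequence[Vector], k: int) -> float:
    """Objective a meta-heuristic would minimise for one candidate solution."""
    return sse(data, decode_centroids(solution, k, len(data[0])))


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    candidate = [0.0, 0.5, 10.0, 10.5]  # two 2-D centroids, flattened
    print(assign_clusters(data, decode_centroids(candidate, 2, 2)))  # [0, 0, 1, 1]
    print(fitness(candidate, data, k=2))  # 1.0
```

A population-based optimiser, such as the WFO-inspired algorithm, would call a function like `fitness` for each candidate and keep the lowest value; the search operators themselves are deliberately left out of this sketch.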
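The accuracy rate (AR) is reported, but its formula is not given on this page. One common definition, assumed in the sketch below, maps each cluster to the majority ground-truth class of its members and counts the fraction of objects that belong to their cluster's majority class. The function name `accuracy_rate` is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def accuracy_rate(cluster_labels: Sequence[int], true_labels: Sequence[Hashable]) -> float:
    """Fraction of objects whose true class equals the majority class of their cluster."""
    if len(cluster_labels) != len(true_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    members: Dict[int, List[Hashable]] = defaultdict(list)
    for cluster, truth in zip(cluster_labels, true_labels):
        members[cluster].append(truth)
    # For each cluster, count the members that share its most frequent true class.
    correct = sum(Counter(classes).most_common(1)[0][1] for classes in members.values())
    return correct / len(true_labels)


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 1]
    truth = ["a", "a", "b", "b", "b", "b"]
    print(accuracy_rate(clusters, truth))  # 5 of 6 objects match, about 0.833
```

Because cluster indices are arbitrary, the measure is invariant to relabelling. Majority mapping can, however, assign two clusters to the same class; a stricter variant uses a one-to-one (Hungarian) matching between clusters and classes.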
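The comparison is validated with the Friedman test. The sketch below computes the Friedman chi-square statistic from a results table (rows are datasets, columns are algorithms), ranking lower values such as SSE as better and giving tied values their average rank. It uses the textbook formula without a tie correction, does not implement the post hoc test, and the function names and example values are hypothetical rather than results from the paper.

```python
from typing import List, Sequence


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank values ascending (1 = lowest, e.g. best SSE); ties share their mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and row[order[end + 1]] == row[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end -> ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(results: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square for N datasets (rows) x k algorithms (columns).

    chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4), where R_j is the
    mean rank of algorithm j across all datasets (no tie correction applied).
    """
    if not results or any(len(row) != len(results[0]) for row in results):
        raise ValueError("results must be a non-empty rectangular table")
    n, k = len(results), len(results[0])
    rank_rows = [average_ranks(row) for row in results]
    mean_ranks = [sum(ranks[j] for ranks in rank_rows) / n for j in range(k)]
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)


if __name__ == "__main__":
    # Hypothetical SSE values: 3 datasets x 3 algorithms (lower is better).
    table = [
        [10.0, 12.0, 15.0],
        [20.0, 25.0, 30.0],
        [5.0, 6.0, 9.0],
    ]
    print(friedman_statistic(table))  # 6.0
```

The statistic is compared against a chi-square distribution with k − 1 degrees of freedom (for k = 3 at α = 0.05 the critical value is about 5.991); only a significant result justifies running a post hoc test between pairs of algorithms.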

Fig. 1
Algorithm 1
Fig. 2

Abbreviations

AAA: Artificial algae algorithm
ABC: Artificial bee colony
ACO: Ant colony optimization
AR: Accuracy rate
BB-BC: Big bang–big crunch
BH: Black hole
CS: Cuckoo search
CSS: Charge system search
DR: Detection rate
ECA: Evolutionary center algorithm
EFO: Electromagnetic field optimization
FA: Firefly algorithm
FA-SOM: Firefly algorithm-self-organizing maps
FCM: Fuzzy C-means
FKM: Fuzzy k-modes
FPCOM: Fuzzy possibilistic C-ordered means
GA: Genetic algorithm
GA-PFKM: Genetic algorithm-possibilistic fuzzy k-modes
GWO: Grey wolf optimization
IGWO: Improved grey wolf optimization
KM: K-means
MOA: Magnetic optimization algorithm
MoCS: Modified cuckoo search
PSO: Particle swarm optimization
PSO-FCM: Particle swarm optimization-fuzzy C-means
PSO-PFKM: Particle swarm optimization-possibilistic fuzzy k-modes
ROA: Remora optimization algorithm
RSA: Reptile search algorithm
SCA: Sine cosine algorithm
SCA-PFKM: Sine cosine algorithm-possibilistic fuzzy k-modes
SMSHO: Selfish herd optimization algorithm and simplex method
SOM: Self-organizing maps
SSE: Sum of squared error
UCI: University of California, Irvine
VNS: Variable neighbourhood search
WFO: Water flow optimizer
WWO: Water wave optimization

References

  1. Allahyari M, Pouriyeh S, Assefi M, Safaei S, Trippe ED, Gutierrez JB, Kochut K (2017) A brief survey of text mining: classification, clustering and extraction techniques. arXiv preprint.

  2. Rehm F, Klawonn F, Kruse R (2007) A novel approach to noise clustering for outlier detection. Soft Comput 11(5):489–494

  3. Baraldi A, Blonda P (1999) A survey of fuzzy clustering algorithms for pattern recognition. I. IEEE Trans Syst Man Cybern Part B (Cybern) 29(6):778–785

  4. Orhan U, Hekim M, Ozer M (2011) EEG signals classification using the K-means clustering and a multilayer perceptron neural network model. Expert Syst Appl 38(10):13475–13481

  5. Kanwal S, Asghar S (2021) Speech emotion recognition using clustering based GA-optimized feature set. IEEE Access 9:125830–125842

  6. Djenouri Y, Belhadi A, Belkebir R (2018) Bees swarm optimization guided by data mining techniques for document information retrieval. Expert Syst Appl 94:126–136

  7. Steinbach M, Karypis G, Kumar V (2000) A comparison of document clustering techniques. Technical report, Department of computer science and engineering, University of Minnesota

  8. Garces E, Munoz A, Lopez-Moreno J, Gutierrez D (2012) Intrinsic images by clustering. Computer Graphics Forum. Blackwell Publishing Ltd, Oxford, pp 1415–1424

  9. Kaur A, Kumar Y (2022) A new metaheuristic algorithm based on water wave optimization for data clustering. Evol Intel 15(1):759–783

  10. Kumar Y, Kaur A (2022) Variants of bat algorithm for solving partitional clustering problems. Eng Comput 38(3):1973–1999

  11. Kaur A, Kumar Y (2022) Neighborhood search based improved bat algorithm for data clustering. Appl Intell 52(9):10541–10575

  12. Kaur A, Kumar Y (2022) A multi-objective vibrating particle system algorithm for data clustering. Pattern Anal Appl 25(1):209–239

  13. Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A, Er MJ, Ding W, Lin CT (2017) A review of clustering techniques and developments. Neurocomputing 267:664–681

  14. Özbakır L, Turna F (2017) Clustering performance comparison of new generation meta-heuristic algorithms. Knowl-Based Syst 130:1–16

  15. Han J, Pei J, Tong H (2022) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann, ISBN-10: 1-55860-901-6

  16. Xu R, Wunsch DC II (2011) BARTMAP: a viable structure for biclustering. Neural Netw 24(7):709–716

  17. Jiang B, Pei J, Tao Y, Lin X (2011) Clustering uncertain data based on probability distribution similarity. IEEE Trans Knowl Data Eng 25(4):751–763

  18. Mukhopadhyay A, Maulik U, Bandyopadhyay S (2015) A survey of multiobjective evolutionary clustering. ACM Comput Surv (CSUR) 47(4):1–46

  19. Sevillano X, Alías F (2014) A one-shot domain-independent robust multimedia clustering methodology based on hybrid multimodal fusion. Multimed Tools Appl 73(3):1507–1543

  20. Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31(8):651–666

  21. Liang Z, Chen P (2021) An automatic clustering algorithm based on the density-peak framework and Chameleon method. Pattern Recogn Lett 150:40–48

  22. Hao Y, Gwa B, Jga B et al (2020) Self-paced learning for K-means clustering algorithm. Pattern Recogn Lett 132:69–75

  23. Singh H, Kumar Y (2019) Cellular automata based model for e-healthcare data analysis. Int J Inf Syst Model Design (IJISMD) 10(3):1–18

  24. Kumar Y, Sahoo G (2018) Hybridization of magnetic charge system search method for efficient data clustering. Malays J Comput Sci 31(2):108–129

  25. Kumar Y, Gupta S, Kumar D, Sahoo G (2016) A clustering approach based on charged particles. Optimization Algorithms-Methods and Applications, InTech. https://doi.org/10.5772/61426

  26. Karaboga D, Ozturk C (2011) A novel clustering approach: artificial Bee Colony (ABC) algorithm. Appl Soft Comput 11(1):652–657

  27. Alam S, Dobbie G, Koh YS, Riddle P, Rehman SU (2014) Research on particle swarm optimization based clustering: a systematic review of literature and techniques. Swarm Evol Comput 17:1–13

  28. Luo K (2021) Water flow optimizer: a nature-inspired evolutionary algorithm for global optimization. IEEE Trans Cybern 52(8):7753–7764

  29. Matos Macêdo FJ, da Rocha Neto AR (2022) A binary water flow optimizer applied to feature selection. International conference on intelligent data engineering and automated learning. Springer, Cham, pp 94–103

  30. Verma H, Verma D, Tiwari PK (2021) A population based hybrid FCM-PSO algorithm for clustering analysis and segmentation of brain image. Expert Syst Appl 167:114121

  31. Al-Behadili HNK (2022) Improved firefly algorithm with variable neighborhood search for data clustering. Baghdad Sci J 19(2):0409–0409

  32. Xia H, Liu L (2022) Basketball big data and visual management system under metaheuristic clustering. Mobile Inf Syst 2022:14

  33. Besharatnia F, Talebpour A, Aliakbary S (2022) An improved grey wolves optimization algorithm for dynamic community detection and data clustering. Appl Artif Intell 36(1):2012000

  34. Singh H, Kumar Y (2022) An enhanced version of cat swarm optimization algorithm for cluster analysis. Int J Appl Metaheur Comput (IJAMC) 13(1):1–25

  35. Kuo RJ, Zheng YR, Nguyen TPQ (2021) Metaheuristic-based possibilistic fuzzy k-modes algorithms for categorical data clustering. Inf Sci 557:1–15

  36. Kushwaha N, Pant M, Sharma S (2022) Electromagnetic optimization-based clustering algorithm. Expert Syst 39(7):e12491

  37. Mohanty PP, Nayak SK (2022) A modified cuckoo search algorithm for data clustering. Int J Appl Metaheur Comput (IJAMC) 13(1):1–32

  38. Hashemi SE, Tavana M, Bakhshi M (2022) A new particle swarm optimization algorithm for optimizing big data clustering. SN Comput Sci 3(4):1–16

  39. Kuo RJ, Lin JY, Nguyen TPQ (2021) An application of sine cosine algorithm-based fuzzy possibilistic c-ordered means algorithm to cluster analysis. Soft Comput 25(5):3469–3484

  40. Hassan BA, Rashid TA (2021) A multidisciplinary ensemble algorithm for clustering heterogeneous datasets. Neural Comput Appl 33(17):10987–11010

  41. Zhu Q, Tang X, Elahi A (2022) Automatic clustering based on dynamic parameters harmony search optimization algorithm. Pattern Anal Appl 25(4):693–709

  42. Duan Y, Liu C, Li S, Guo X, Yang C (2022) Gradient-based elephant herding optimization for cluster analysis. Appl Intell 52(10):11606–11637

  43. Rashidi R, Khamforoosh K, Sheikhahmadi A (2022) Proposing improved meta-heuristic algorithms for clustering and separating users in the recommender systems. Electron Commer Res 22(2):623–648

  44. Zhao R, Wang Y, Xiao G, Liu C, Hu P, Li H (2021) A selfish herd optimization algorithm based on the simplex method for clustering analysis. J Supercomput 77(8):8840–8910

  45. Turkoglu B, Uymaz SA, Kaya E (2022) Clustering analysis through artificial algae algorithm. Int J Mach Learn Cybern 13(4):1179–1196

  46. Mohammadi M, Mobarakeh MI (2022) An integrated clustering algorithm based on firefly algorithm and self-organized neural network. Prog Artif Intell 11(3):207–217

  47. Almotairi KH, Abualigah L (2022) Hybrid reptile search algorithm and remora optimization algorithm for optimization tasks and data clustering. Symmetry 14(3):458

  48. Mohan P, Subramani N, Alotaibi Y, Alghamdi S, Khalaf OI, Ulaganathan S (2022) Improved metaheuristics-based clustering with multihop routing protocol for underwater wireless sensor networks. Sensors 22(4):1618

  49. Taib H, Bahreininejad A (2021) Data clustering using hybrid water cycle algorithm and a local pattern search method. Adv Eng Softw 153:102961

  50. Moghadam P, Ahmadi A (2023) A novel two-stage bio-inspired method using red deer algorithm for data clustering. Evolut Intell. https://doi.org/10.1007/s12065-023-00864-w

  51. Hashemi SE, Gholian-Jouybari F, Hajiaghaei-Keshteli M (2023) A fuzzy C-means algorithm for optimizing data clustering. Expert Syst Appl 227:120377

Author information

Contributions

RCS and TK designed the research. RCS, TK and PT analyzed and implemented the algorithm. RCS, JP and SS handled data collection and pre-processing. RCS, TK, JP and SS performed the result analysis. RCS, TK and PT wrote and managed the article.

Corresponding author

Correspondence to Ramesh Chandra Sahoo.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sahoo, R.C., Kumar, T., Tanwar, P. et al. An efficient meta-heuristic algorithm based on water flow optimizer for data clustering. J Supercomput 80, 10301–10326 (2024). https://doi.org/10.1007/s11227-023-05822-y

  • DOI: https://doi.org/10.1007/s11227-023-05822-y
