A deep learning object detection method to improve cluster analysis of two-dimensional data

Raphaël Couturier ORCID: orcid.org/0000-0003-1490-9592¹,
Pablo Gregori²,
Hassan Noura¹,
Ola Salman³ &
…
Abderrahmane Sider⁴

Abstract

Clustering is an unsupervised machine learning method that groups data samples into clusters of similar objects. It serves as a support tool in numerous applications, such as bank customer profiling, document retrieval, image segmentation, and e-commerce recommendation engines. The effectiveness of several clustering techniques is sensitive to their initialization parameters, and different solutions have been proposed in the literature to overcome this limitation. However, these solutions consume large amounts of memory when dealing with big data. In this paper, we propose applying a recent object detection deep learning model (YOLO-v5) to assist the initialization of classical clustering techniques and to improve their effectiveness on two-dimensional datasets, increasing accuracy while dramatically reducing the memory and time consumption of classical clustering methods.
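The abstract does not spell out how detections feed the clustering step, so the following is only a minimal sketch of one plausible reading: the dataset is rendered as an image, the detector returns one bounding box per apparent cluster, and the box centres seed a classical k-means run. The function names, the box format (x_min, y_min, x_max, y_max in pixels), and the hard-coded boxes standing in for detector output are all assumptions made for illustration, not the authors' implementation.

```python
from math import dist


def boxes_to_centroids(boxes, img_size, x_range, y_range):
    """Map pixel-space boxes to data-space centres (hypothetical helper).

    boxes: iterable of (x_min, y_min, x_max, y_max) in pixels, assumed format.
    img_size: (width, height) of the rendered image in pixels.
    x_range, y_range: (low, high) data limits shown along each image axis.
    """
    width, height = img_size
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    centroids = []
    for x_min, y_min, x_max, y_max in boxes:
        # Box centre as a fraction of the image width and height.
        cx = (x_min + x_max) / 2 / width
        cy = (y_min + y_max) / 2 / height
        # Image rows grow downwards, so the vertical axis is flipped.
        centroids.append((x_lo + cx * (x_hi - x_lo), y_hi - cy * (y_hi - y_lo)))
    return centroids


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, init_centroids, n_iter=20):
    """Plain Lloyd iterations started from the supplied centroids."""
    centroids = list(init_centroids)
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group;
        # an empty group keeps its previous centroid.
        centroids = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Toy data: two small blobs inside the square [0, 10] x [0, 10].
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (8.0, 8.0), (8.2, 7.8), (7.8, 8.2)]

# Stand-in for detector output on a 100 x 100 rendering of the data (assumed values).
detected_boxes = [(5, 85, 15, 95), (75, 15, 85, 25)]

seeds = boxes_to_centroids(detected_boxes, (100, 100), (0.0, 10.0), (0.0, 10.0))
centroids, labels = kmeans(points, seeds)
```

Under these assumptions the number of boxes also fixes the number of clusters, so no separate search over k is needed, and the iterations start close to the cluster centres instead of from random positions.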

Data availability

The code to generate data is available here: https://github.com/rcouturier/data4clustering

Acknowledgements

This work was partially funded by project ANER 2022 AGRO-IA-LIMENTAIRE and the EIPHI Graduate School (contract ANR-17-EURE-0002). The Mesocentre of Franche-Comté provided the computing facilities. This work was also partially sponsored by the General Directorate for Scientific Research and Technological Development, Ministry of Higher Education and Scientific Research (DGRSDT), Algeria.

Author information

Authors and Affiliations

FEMTO-ST Institute, CNRS, University of Franche-Comte (UFC), Besançon, France
Raphaël Couturier & Hassan Noura
Instituto de Matemáticas y Aplicaciones de Castellón, Departamento de Matemáticas, Universitat Jaume I de Castellón, E-12071, Castellón, Spain
Pablo Gregori
Electrical and Computer Engineering Department, American University of Beirut, Beirut, Lebanon
Ola Salman
LIMED Laboratory, Faculty of Exact Sciences, University of Bejaia, 06000, Bejaia, Algeria
Abderrahmane Sider

Corresponding author

Correspondence to Raphaël Couturier.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Couturier, R., Gregori, P., Noura, H. et al. A deep learning object detection method to improve cluster analysis of two-dimensional data. Multimed Tools Appl 83, 71171–71187 (2024). https://doi.org/10.1007/s11042-024-18148-5

Received: 17 January 2023
Revised: 31 December 2023
Accepted: 03 January 2024
Published: 07 February 2024
Issue Date: August 2024
DOI: https://doi.org/10.1007/s11042-024-18148-5
