DOI: 10.1145/3673038.3673057
research-article
Open access

Parallel Iterative Mistake Minimization (IMM) clustering algorithm for shared-memory systems

Published: 12 August 2024

Abstract

This paper addresses the problem of deriving explanations in the form of compact decision trees for cluster assignments made by the well-known K-means method. It introduces two versions of the Iterative Mistake Minimization (IMM) algorithm, both parallelized using the OpenMP standard. The first version mirrors the reference implementation by parallelizing the outer loop that iterates over the training set’s features. The second version employs OpenMP nested parallelism to additionally parallelize the process of finding the optimal cut for a given feature. The algorithms were tested on nine synthetic datasets using single 48-core nodes of a compute cluster. The results indicate that the approach utilizing nested parallelism significantly surpasses the other two versions in performance. Depending on the dimension of the feature space, it is 1.3 to 15 times faster than our re-implementation of the reference version. Its parallel efficiency relative to the single-threaded variant ranges from 54% to 78%.
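
The two parallelization levels described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Cut` and `best_cut`, the row-major data layout, and the brute-force mistake counting are all assumptions made for illustration. The sketch assumes that a cut has the form "x[feature] <= threshold", that a mistake is a point which the cut separates from its assigned cluster center, and that a valid threshold lies between the smallest and largest center coordinate of the feature so that the cut separates at least two centers. The outer loop runs over features and the inner loop over candidate thresholds of one feature.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

using Matrix = std::vector<std::vector<double>>;

// Hypothetical result type: the axis-aligned cut "x[feature] <= threshold".
struct Cut {
    std::size_t feature;
    double threshold;
    std::size_t mistakes;
};

// Brute-force search for the cut with the fewest mistakes.
// points: n x d, centers: k x d, labels[j]: index of the center of point j.
Cut best_cut(const Matrix& points, const Matrix& centers,
             const std::vector<std::size_t>& labels) {
    assert(points.size() == labels.size());
    assert(centers.size() >= 2 && !centers[0].empty());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(points.size());
    const std::ptrdiff_t d = static_cast<std::ptrdiff_t>(centers[0].size());
    const std::size_t none = std::numeric_limits<std::size_t>::max();
#ifdef _OPENMP
    omp_set_max_active_levels(2);  // let the inner parallel region fork too
#endif
    std::vector<Cut> best(centers[0].size(), Cut{0, 0.0, none});

    // Level 1: one iteration per feature.
    #pragma omp parallel for
    for (std::ptrdiff_t fi = 0; fi < d; ++fi) {
        const std::size_t f = static_cast<std::size_t>(fi);
        double lo = centers[0][f], hi = centers[0][f];
        for (const auto& c : centers) {
            if (c[f] < lo) lo = c[f];
            if (c[f] > hi) hi = c[f];
        }
        // Candidate thresholds: coordinates in [lo, hi), so that at least
        // one center falls on each side of the cut.
        std::vector<double> cand;
        for (const auto& c : centers)
            if (c[f] < hi) cand.push_back(c[f]);
        for (const auto& p : points)
            if (p[f] >= lo && p[f] < hi) cand.push_back(p[f]);

        std::vector<std::size_t> err(cand.size(), 0);
        const std::ptrdiff_t m = static_cast<std::ptrdiff_t>(cand.size());

        // Level 2 (nested): one iteration per candidate threshold.
        #pragma omp parallel for
        for (std::ptrdiff_t t = 0; t < m; ++t) {
            std::size_t count = 0;
            for (std::ptrdiff_t j = 0; j < n; ++j) {
                const bool point_left = points[j][f] <= cand[t];
                const bool center_left = centers[labels[j]][f] <= cand[t];
                if (point_left != center_left) ++count;  // separated: mistake
            }
            err[t] = count;
        }
        // Serial argmin keeps the result deterministic.
        for (std::ptrdiff_t t = 0; t < m; ++t)
            if (err[t] < best[f].mistakes) best[f] = Cut{f, cand[t], err[t]};
    }

    // Serial reduction over the per-feature winners.
    Cut result = best[0];
    for (const Cut& c : best)
        if (c.mistakes < result.mistakes) result = c;
    return result;
}
```

With GCC or Clang the sketch is compiled with `-fopenmp`; without that flag the pragmas are ignored and the same code runs serially. Each candidate costs O(n) here, which is far from optimal. A sort of the feature values followed by a sweep would avoid the repeated counting, but that changes only the inner work, not the two-level loop structure shown above.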

Published In

ICPP '24: Proceedings of the 53rd International Conference on Parallel Processing
August 2024
1279 pages
ISBN: 9798400717932
DOI: 10.1145/3673038
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. K-means
  2. OpenMP
  3. explainable clustering
  4. shared-memory

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Bialystok University of Technology

Conference

ICPP '24

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
