Abstract
In this paper, we consider the Bregman k-means problem (BKM) which is a variant of the classical k-means problem. For an n-point set \({\mathcal {S}}\) and \(k \le n\) with respect to \(\mu \)-similar Bregman divergence, the BKM problem aims first to find a center subset \(C \subseteq {\mathcal {S}}\) with \( \mid C \mid = k\) and then separate \({\mathcal {S}}\) into k clusters according to C, such that the sum of \(\mu \)-similar Bregman divergence from each point in \({\mathcal {S}}\) to its nearest center is minimized. We propose a \(\mu \)-similar BregMeans++ algorithm by employing the local search scheme, and prove that the algorithm deserves a constant approximation guarantee. Moreover, we extend our algorithm to solve a variant of BKM called noisy \(\mu \)-similar Bregman k-means++ (noisy \(\mu \)-BKM++) which is BKM in the noisy scenario. For the same instance and purpose as BKM, we consider the case of sampling a point with an imprecise probability by a factor between \(1-\varepsilon _1\) and \(1+ \varepsilon _2\) for \(\varepsilon _1 \in [0,1)\) and \(\varepsilon _2 \ge 0\), and obtain an approximation ratio of \(O(\log ^2 k)\) in expectation.
Similar content being viewed by others
References
Ackermann M Blömer J (2009) Coresets and approximate clustering for Bregman divergences. In: Proceedings of SODA, pp 1088–1097
Ackermann M, Blömer J (2010) Bregman clustering for separable instances. In: Proceedings of SWAT, pp 212–223
Ackermann M, Blömer J, Sohler C (2010) Clustering for metric and non-metric distance measures. ACM Trans Algorithms 6(4):1–59
Arthur D, Vassilvitskii S (2007) \(k\)-means++: the advantages of careful seeding. In: Proceedings of SODA, pp 1027–1035
Banerjee A, Guo X, Wang H (2005) On the optimality of conditional expectation as a Bregman predictor. IEEE Trans Inf Theory 51(7):2664–2669
Banerjee A, Merugu S, Dhillon I, Ghosh J (2005) Clustering with Bregman divergences. J Mach Learn Res 6:1705–1749
Belhassena A, Wang H (2019) Trajectory big data processing based on frequent activity. Tsinghua Sci Technol 24(3):317–332
Bhattacharya A, Eube J, Heiko Röglin H, Schmidt MN (2020) Greedy and Not So Greedy \(k\)-means++. In: Proceedings of ESA, pp 18:1-18:21
Bregman L (1967) The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comput Math Math Phys 7:200–217
Censor Y, Lent A (1981) An iterative rowaction method for interval convex programming. J Optim Theory Appl 34(3):321–353
Choo D, Grunau C, Portmann J, Rozhon V (2020) \(k\)-means++: few more steps yield constant approximation In: Proceedings of the 37th International Conference on Machine Learning (ICML), pp 1909–1917
Censor Y, Zenios S (1997) Parallel optimization: theory, algorithms, and applications. Oxford University Press, Oxford
Feldman D, Schmidt M, Sohler C (2020) Turning big data into tiny data: constant-size coresets for \(k\)-means, PCA and projective clustering. SIAM J Comput 49(3):601–657
Jain A, Dubes R (1988) Algorithms for clustering data. Prentice Hall, New Jersey
Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surveys 31:264–323
Kanungo T, Mount D, Netanyahu N, Piatko C, Silverman R, Wu A (2002) A local search approximation algorithm for \(k\)-means clustering. In: Proceedings of SoCG, pp 10–18
Kumar A, Sabharwal Y, Sen S. A simple linear time \((1+\varepsilon )\)-approximation algorithm for \(k\)-means clustering in any dimensions. In: Proceedings of FOCS, pp 454–462
Lattanzi S, Sohler C (2019) A better \(k\)-means++ algorithm via local search. In: Proceedings of ICML, pp 3662–3671
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137
Wang N, Guo G, Wang B, Wang C (2020) Traffic clustering algorithm of urban data brain based on a hybrid-augmented architecture of quantum annealing and brain-inspired cognitive computing. Tsinghua Sci Technol 25(6):813–825
Acknowledgements
The first two authors are supported by National Natural Science Foundation of China (No. 11871081) and Beijing Natural Science Foundation Project No. Z200002. The third author is supported by National Natural Science Foundation of China (No. 61772005) and Outstanding Youth Innovation Team Project for Universities of Shandong Province (No. 2020KJN008). The fourth author is supported by National Natural Science Foundation of China (No.11701150).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A preliminary version of this paper appeared in Proceedings of the 26th International Computing and Combinatorics Conference, pp. 532–541, 2020.
Rights and permissions
About this article
Cite this article
Tian, X., Xu, D., Guo, L. et al. Improved local search algorithms for Bregman k-means and its variants. J Comb Optim 44, 2533–2550 (2022). https://doi.org/10.1007/s10878-021-00771-9
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10878-021-00771-9