Improved PTAS for the constrained k-means problem

Qilong Feng¹,
Jiaxin Hu¹,
Neng Huang¹ &
…
Jianxin Wang¹

652 Accesses
Explore all metrics

Abstract

The k-means problem has been paid lots of attention in many fields, and each cluster of the k-means problem always satisfies locality property. In this paper, we study the constrained k-means problem, where the clusters do not satisfy locality property and can be an arbitrary partition of the set of points. Ding and Xu presented a unified framework with running time \(O(2^{poly (k/\epsilon )} (\log n)^{k+1} nd)\) by applying uniform sampling and simplex lemma techniques such that a collection of size \(O(2^{poly (k/\epsilon )} (\log n)^{k+1})\) of candidate sets containing approximate centers is obtained. Then, the collection is enumerated to get the one that can induce a \((1+\epsilon )\)-approximation solution for the constrained k-means problem. By applying \(D^2\)-sampling technique, Bhattacharya, Jaiswal, and Kumar presented an algorithm with running time \(O(2^{{\tilde{O}}(k/\epsilon )}nd)\), which is bounded by \(O(2^k( \frac{2123ek}{\epsilon ^3})^{64k/\epsilon }knd)\). The algorithm outputs a collection of size \(O(2^k( \frac{2123ek}{\epsilon ^3})^{64k/\epsilon })\) of candidate sets containing approximate centers. In this paper, we present an algorithm with running time \(O((\frac{1891ek}{\epsilon ^2})^{8k/\epsilon }nd)\) such that a collection of size \(O((\frac{1891ek}{\epsilon ^2})^{8k/\epsilon }n)\) of candidate sets containing approximate centers can be obtained.

This is a preview of subscription content, log in via an institution to check access.

Access this article

We’re sorry, something doesn't seem to be working properly.

Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.

Fig. 2

Faster Algorithms for the Constrained k-means Problem

Article 06 November 2017

Exact Linear-Time Algorithm for Parameterized K-Means Problem with Optimized Number of Clusters in the 1D Case

A Unified Framework for Clustering Constrained Data Without Locality Property

Article 20 August 2019

References

Aggarwal G, Panigrahy R, Feder T, Thomas D, Kenthapadi K, Khuller S, Zhu A (2010) Achieving anonymity via clustering. ACM Trans Algorithms 6(3):49:1–49:19
Article MathSciNet MATH Google Scholar
Ahmadian S, Norouzi-Fard A, Svensson O, Ward J (2017) Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. In: Proceedings of 58th annual symposium on foundations of computer science, FOCS, California, USA, pp 61–72
Aloise D, Deshpande A, Hansen P, Popat P (2009) NP-hardness of euclidean sum-of-squares clustering. Mach Learn 75(2):245–248
Article MATH Google Scholar
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of 18th annual ACM-SIAM symposium on discrete algorithms, SODA, Louisiana, USA, pp 1027–1035
Awasthi P, Charikar M, Krishnaswamy R, Sinop AK (2015) The hardness of approximation of Euclidean k-means. In: Proceedings of 31st annual international symposium on computational geometry, SoCG, Eindhoven, The Netherlands, pp 754–767
Badoiu M, Har-Peled S, Indyk P (2002) Approximate clustering via core-sets. In: Proceedings of 34th annual ACM symposium on theory of computing, STOC, Québec, Canada, pp 250–257
Betzler N, Guo J, Niedermeier R (2010) Parameterized computational complexity of dodgson and young elections. Inf Comput 208(2):165–177
Article MathSciNet MATH Google Scholar
Bhattacharya A, Jaiswal R, Kumar A (2016) Faster algorithms for the constrained k-means problem. In: Proceedings of 33rd annual symposium on theoretical aspects of computer science, STACS, Orléans, France, pp 16:1–16:13
Bhattacharya A, Jaiswal R, Kumar A (2018) Faster algorithms for the constrained k-means problem. Theory Comput Syst 62(1):93–115
Article MathSciNet MATH Google Scholar
Cabello S, Giannopoulos P, Knauer C, Marx D, Rote G (2011) Geometric clustering: fixed-parameter tractability and lower bounds with respect to the dimension. ACM Trans Algorithms 7(4):43:1–43:27
Cohen-Addad V (2018) A fast approximation scheme for low-dimensional k-means. In: Proceedings 29th annual ACM-SIAM symposium on discrete algorithms, SODA, Louisiana, USA, pp 430–440
Cohen-Addad V, Klein PN, Mathieu C (2016) Local search yields approximation schemes for k-means and k-median in euclidean and minor-free metrics. In: Proceedings of 57th annual symposium on foundations of computer science, FOCS, New Jersey, USA, pp 353–364
Cygan M, Hajiaghayi M, Khuller S (2012) LP rounding for k-centers with non-uniform hard capacities. In: Proceedings of 53rd annual symposium on foundations of computer science, FOCS, New Jersey, USA, pp 273–282
Cygan M, Fomin FV, Kowalik L, Lokshtanov D, Marx D, Pilipczuk M, Pilipczuk M, Saurabh S (2015) Parameterized algorithms. Springer
De la Vega WF, Karpinski M, Kenyon C, Rabani Y (2003) Approximation schemes for clustering problems. In: Proceedings of 35th annual ACM symposium on theory of computing, STOC, California, USA, pp 50–58
Ding H, Xu J (2011) Solving the chromatic cone clustering problem via minimum spanning sphere. In: Proceedings of 38th annual international colloquium on automata, languages and programming, ICALP, Zurich, Switzerland, pp 773–784
Ding H, Xu J (2015) A unified framework for clustering constrained data without locality property. In: Proceedings of 26th annual ACM-SIAM symposium on discrete algorithms, SODA, California, USA, pp 1471–1490
Downey RG, Fellows MR (1999) Parameterized complexity. Monographs in computer science, Springer
Book MATH Google Scholar
Ene A, Har-Peled S, Raichel B (2013) Fast clustering with lower bounds: no customer too far, no shop too small. CoRR arXiv:1304.7318
Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) (1996) Advances in knowledge discovery and data mining. AAAI Press, Quebec
Google Scholar
Feldman D, Monemizadeh M, Sohler C (2007) A PTAS for k-means clustering based on weak coresets. In: Proceedings of 23rd annual ACM symposium on computational geometry, SoCG, Gyeongju, South Korea, pp 11–18
Feldman D, Schmidt M, Sohler C (2013) Turning big data into tiny data: constant-size coresets for k-means, PCA and projective clustering. In: Proceedings of 24th annual ACM-SIAM symposium on discrete algorithms, SODA, Louisiana, USA, pp 1434–1453
Feldmann AE (2015) Fixed parameter approximations for \(k\)-center problems in low highway dimension graphs. In: Proceedings of 42th annual international colloquium on automata, languages and programming, ICALP, Kyoto, Japan, pp 588–600
Feng Q, Huang N, Jiang X, Wang J (2018a) Dealing with several parameterized problems by random methods. Theor Comput Sci 734:94–104
Article MathSciNet MATH Google Scholar
Feng Q, Li S, Zhou Z, Wang J (2018b) Parameterized algorithms for edge biclique and related problems. Theor Comput Sci 734:105–118
Article MathSciNet MATH Google Scholar
Friggstad Z, Rezapour M, Salavatipour MR (2016) Local search yields a PTAS for k-means in doubling metrics. In: Proceedings of 57th annual symposium on foundations of computer science, FOCS, New Jersey, USA, pp 365–374
Gao J, Ping Q, Wang J (2018) Resisting re-identification mining on social graph data. World Wide Web. https://doi.org/10.1007/s11280-017-0524-3
Guo L, Shen H, Zhu W (2017) Efficient approximation algorithms for multi-antennae largest weight data retrieval. IEEE Trans Mob Comput 16(12):3320–3333
Article Google Scholar
Hajiaghayi MT, Hu W, Li J, Li S, Saha B (2016) A constant factor approximation algorithm for fault-tolerant k-median. ACM Trans Algorithms 12(3):36:1–36:19
Article MathSciNet MATH Google Scholar
Har-Peled S, Raichel B (2015) Net and prune: a linear time algorithm for Euclidean distance problems. J ACM 62(6):44:1–44:35
Article MathSciNet MATH Google Scholar
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30
Article MathSciNet MATH Google Scholar
Inaba M, Katoh N, Imai H (1994) Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). In: Proceedings of 10th annual symposium on computational geometry, SoCG, New York, USA, pp 332–339
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
Article Google Scholar
Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37
Article Google Scholar
Jaiswal R, Kumar M, Yadav P (2015) Improved analysis of D\({}^{2}\)-sampling based PTAS for k-means and other clustering problems. Inf Process Lett 115(2):100–103
Article MathSciNet MATH Google Scholar
Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2004) A local search approximation algorithm for k-means clustering. Comput Geom 28(2–3):89–112
Article MathSciNet MATH Google Scholar
Khuller S, Sussmann YJ (2000) The capacitated K-center problem. SIAM J Discrete Math 13(3):403–418
Article MathSciNet MATH Google Scholar
Khuller S, Pless R, Sussmann YJ (2000) Fault tolerant K-center problems. Theor Comput Sci 242(1–2):237–245
Article MathSciNet MATH Google Scholar
Kumar A, Sabharwal Y, Sen S (2010) Linear-time approximation schemes for clustering problems in any dimensions. J ACM 57(2):5:1–5:32
Article MathSciNet MATH Google Scholar
Kumar N, Raichel B (2013) Fault tolerant clustering revisited. In: Proceedings of 25th annual Canadian conference on computational geometry, CCCG, Ontario, Canada
Li J, Yi K, Zhang Q (2010) Clustering with diversity. In: Proceedings of 37th annual international colloquium on automata, languages and programming, ICALP, Bordeaux, France, pp 188–200
Li W, Feng Q, Chen J, Hu S (2017) Improved kernel results for some FPT problems based on simple observations. Theor Comput Sci 657:20–27
Lin M, Feng Q, Chen J, Li W (2017) Partition on trees with supply and demand: Kernelization and algorithms. Theor Comput Sci 657:11–19
Article MathSciNet MATH Google Scholar
Lin M, Feng Q, Wang J, Chen J, Fu B, Li W (2018) An improved FPT algorithm for almost forest deletion problem. Inf Process Lett 136:30–36
Article MathSciNet MATH Google Scholar
Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–136
Article MathSciNet MATH Google Scholar
Mahajan M, Nimbhorkar P, Varadarajan KR (2012) The planar k-means problem is NP-hard. Theor Comput Sci 442:13–21
Article MathSciNet MATH Google Scholar
Ostrovsky R, Rabani Y, Schulman LJ, Swamy C (2012) The effectiveness of lloyd-type methods for the k-means problem. J ACM 59(6):28:1–28:22
Article MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

School of Information Science and Engineering, Central South University, Changsha, 410083, Hunan, People’s Republic of China
Qilong Feng, Jiaxin Hu, Neng Huang & Jianxin Wang

Authors

Qilong Feng
View author publications
You can also search for this author in PubMed Google Scholar
Jiaxin Hu
View author publications
You can also search for this author in PubMed Google Scholar
Neng Huang
View author publications
You can also search for this author in PubMed Google Scholar
Jianxin Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jianxin Wang.

Additional information

This work is supported by the National Natural Science Foundation of China under Grants (61472449, 61420106009, 61672536, 71631008).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Feng, Q., Hu, J., Huang, N. et al. Improved PTAS for the constrained k-means problem. J Comb Optim 37, 1091–1110 (2019). https://doi.org/10.1007/s10878-018-0340-4

Download citation

Published: 23 August 2018
Issue Date: 01 May 2019
DOI: https://doi.org/10.1007/s10878-018-0340-4

Abstract

Access this article

Similar content being viewed by others

Faster Algorithms for the Constrained k-means Problem

Exact Linear-Time Algorithm for Parameterized K-Means Problem with Optimized Number of Clusters in the 1D Case

A Unified Framework for Clustering Constrained Data Without Locality Property

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Improved PTAS for the constrained k-means problem

Abstract

Access this article

Similar content being viewed by others

Faster Algorithms for the Constrained k-means Problem

Exact Linear-Time Algorithm for Parameterized K-Means Problem with Optimized Number of Clusters in the 1D Case

A Unified Framework for Clustering Constrained Data Without Locality Property

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation