Abstract
The k-means problem has been paid lots of attention in many fields, and each cluster of the k-means problem always satisfies locality property. In this paper, we study the constrained k-means problem, where the clusters do not satisfy locality property and can be an arbitrary partition of the set of points. Ding and Xu presented a unified framework with running time \(O(2^{poly (k/\epsilon )} (\log n)^{k+1} nd)\) by applying uniform sampling and simplex lemma techniques such that a collection of size \(O(2^{poly (k/\epsilon )} (\log n)^{k+1})\) of candidate sets containing approximate centers is obtained. Then, the collection is enumerated to get the one that can induce a \((1+\epsilon )\)-approximation solution for the constrained k-means problem. By applying \(D^2\)-sampling technique, Bhattacharya, Jaiswal, and Kumar presented an algorithm with running time \(O(2^{{\tilde{O}}(k/\epsilon )}nd)\), which is bounded by \(O(2^k( \frac{2123ek}{\epsilon ^3})^{64k/\epsilon }knd)\). The algorithm outputs a collection of size \(O(2^k( \frac{2123ek}{\epsilon ^3})^{64k/\epsilon })\) of candidate sets containing approximate centers. In this paper, we present an algorithm with running time \(O((\frac{1891ek}{\epsilon ^2})^{8k/\epsilon }nd)\) such that a collection of size \(O((\frac{1891ek}{\epsilon ^2})^{8k/\epsilon }n)\) of candidate sets containing approximate centers can be obtained.
Access this article
We’re sorry, something doesn't seem to be working properly.
Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.
Similar content being viewed by others
References
Aggarwal G, Panigrahy R, Feder T, Thomas D, Kenthapadi K, Khuller S, Zhu A (2010) Achieving anonymity via clustering. ACM Trans Algorithms 6(3):49:1–49:19
Ahmadian S, Norouzi-Fard A, Svensson O, Ward J (2017) Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. In: Proceedings of 58th annual symposium on foundations of computer science, FOCS, California, USA, pp 61–72
Aloise D, Deshpande A, Hansen P, Popat P (2009) NP-hardness of euclidean sum-of-squares clustering. Mach Learn 75(2):245–248
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of 18th annual ACM-SIAM symposium on discrete algorithms, SODA, Louisiana, USA, pp 1027–1035
Awasthi P, Charikar M, Krishnaswamy R, Sinop AK (2015) The hardness of approximation of Euclidean k-means. In: Proceedings of 31st annual international symposium on computational geometry, SoCG, Eindhoven, The Netherlands, pp 754–767
Badoiu M, Har-Peled S, Indyk P (2002) Approximate clustering via core-sets. In: Proceedings of 34th annual ACM symposium on theory of computing, STOC, Québec, Canada, pp 250–257
Betzler N, Guo J, Niedermeier R (2010) Parameterized computational complexity of dodgson and young elections. Inf Comput 208(2):165–177
Bhattacharya A, Jaiswal R, Kumar A (2016) Faster algorithms for the constrained k-means problem. In: Proceedings of 33rd annual symposium on theoretical aspects of computer science, STACS, Orléans, France, pp 16:1–16:13
Bhattacharya A, Jaiswal R, Kumar A (2018) Faster algorithms for the constrained k-means problem. Theory Comput Syst 62(1):93–115
Cabello S, Giannopoulos P, Knauer C, Marx D, Rote G (2011) Geometric clustering: fixed-parameter tractability and lower bounds with respect to the dimension. ACM Trans Algorithms 7(4):43:1–43:27
Cohen-Addad V (2018) A fast approximation scheme for low-dimensional k-means. In: Proceedings 29th annual ACM-SIAM symposium on discrete algorithms, SODA, Louisiana, USA, pp 430–440
Cohen-Addad V, Klein PN, Mathieu C (2016) Local search yields approximation schemes for k-means and k-median in euclidean and minor-free metrics. In: Proceedings of 57th annual symposium on foundations of computer science, FOCS, New Jersey, USA, pp 353–364
Cygan M, Hajiaghayi M, Khuller S (2012) LP rounding for k-centers with non-uniform hard capacities. In: Proceedings of 53rd annual symposium on foundations of computer science, FOCS, New Jersey, USA, pp 273–282
Cygan M, Fomin FV, Kowalik L, Lokshtanov D, Marx D, Pilipczuk M, Pilipczuk M, Saurabh S (2015) Parameterized algorithms. Springer
De la Vega WF, Karpinski M, Kenyon C, Rabani Y (2003) Approximation schemes for clustering problems. In: Proceedings of 35th annual ACM symposium on theory of computing, STOC, California, USA, pp 50–58
Ding H, Xu J (2011) Solving the chromatic cone clustering problem via minimum spanning sphere. In: Proceedings of 38th annual international colloquium on automata, languages and programming, ICALP, Zurich, Switzerland, pp 773–784
Ding H, Xu J (2015) A unified framework for clustering constrained data without locality property. In: Proceedings of 26th annual ACM-SIAM symposium on discrete algorithms, SODA, California, USA, pp 1471–1490
Downey RG, Fellows MR (1999) Parameterized complexity. Monographs in computer science, Springer
Ene A, Har-Peled S, Raichel B (2013) Fast clustering with lower bounds: no customer too far, no shop too small. CoRR arXiv:1304.7318
Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) (1996) Advances in knowledge discovery and data mining. AAAI Press, Quebec
Feldman D, Monemizadeh M, Sohler C (2007) A PTAS for k-means clustering based on weak coresets. In: Proceedings of 23rd annual ACM symposium on computational geometry, SoCG, Gyeongju, South Korea, pp 11–18
Feldman D, Schmidt M, Sohler C (2013) Turning big data into tiny data: constant-size coresets for k-means, PCA and projective clustering. In: Proceedings of 24th annual ACM-SIAM symposium on discrete algorithms, SODA, Louisiana, USA, pp 1434–1453
Feldmann AE (2015) Fixed parameter approximations for \(k\)-center problems in low highway dimension graphs. In: Proceedings of 42th annual international colloquium on automata, languages and programming, ICALP, Kyoto, Japan, pp 588–600
Feng Q, Huang N, Jiang X, Wang J (2018a) Dealing with several parameterized problems by random methods. Theor Comput Sci 734:94–104
Feng Q, Li S, Zhou Z, Wang J (2018b) Parameterized algorithms for edge biclique and related problems. Theor Comput Sci 734:105–118
Friggstad Z, Rezapour M, Salavatipour MR (2016) Local search yields a PTAS for k-means in doubling metrics. In: Proceedings of 57th annual symposium on foundations of computer science, FOCS, New Jersey, USA, pp 365–374
Gao J, Ping Q, Wang J (2018) Resisting re-identification mining on social graph data. World Wide Web. https://doi.org/10.1007/s11280-017-0524-3
Guo L, Shen H, Zhu W (2017) Efficient approximation algorithms for multi-antennae largest weight data retrieval. IEEE Trans Mob Comput 16(12):3320–3333
Hajiaghayi MT, Hu W, Li J, Li S, Saha B (2016) A constant factor approximation algorithm for fault-tolerant k-median. ACM Trans Algorithms 12(3):36:1–36:19
Har-Peled S, Raichel B (2015) Net and prune: a linear time algorithm for Euclidean distance problems. J ACM 62(6):44:1–44:35
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30
Inaba M, Katoh N, Imai H (1994) Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). In: Proceedings of 10th annual symposium on computational geometry, SoCG, New York, USA, pp 332–339
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37
Jaiswal R, Kumar M, Yadav P (2015) Improved analysis of D\({}^{2}\)-sampling based PTAS for k-means and other clustering problems. Inf Process Lett 115(2):100–103
Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2004) A local search approximation algorithm for k-means clustering. Comput Geom 28(2–3):89–112
Khuller S, Sussmann YJ (2000) The capacitated K-center problem. SIAM J Discrete Math 13(3):403–418
Khuller S, Pless R, Sussmann YJ (2000) Fault tolerant K-center problems. Theor Comput Sci 242(1–2):237–245
Kumar A, Sabharwal Y, Sen S (2010) Linear-time approximation schemes for clustering problems in any dimensions. J ACM 57(2):5:1–5:32
Kumar N, Raichel B (2013) Fault tolerant clustering revisited. In: Proceedings of 25th annual Canadian conference on computational geometry, CCCG, Ontario, Canada
Li J, Yi K, Zhang Q (2010) Clustering with diversity. In: Proceedings of 37th annual international colloquium on automata, languages and programming, ICALP, Bordeaux, France, pp 188–200
Li W, Feng Q, Chen J, Hu S (2017) Improved kernel results for some FPT problems based on simple observations. Theor Comput Sci 657:20–27
Lin M, Feng Q, Chen J, Li W (2017) Partition on trees with supply and demand: Kernelization and algorithms. Theor Comput Sci 657:11–19
Lin M, Feng Q, Wang J, Chen J, Fu B, Li W (2018) An improved FPT algorithm for almost forest deletion problem. Inf Process Lett 136:30–36
Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–136
Mahajan M, Nimbhorkar P, Varadarajan KR (2012) The planar k-means problem is NP-hard. Theor Comput Sci 442:13–21
Ostrovsky R, Rabani Y, Schulman LJ, Swamy C (2012) The effectiveness of lloyd-type methods for the k-means problem. J ACM 59(6):28:1–28:22
Author information
Authors and Affiliations
Corresponding author
Additional information
This work is supported by the National Natural Science Foundation of China under Grants (61472449, 61420106009, 61672536, 71631008).
Rights and permissions
About this article
Cite this article
Feng, Q., Hu, J., Huang, N. et al. Improved PTAS for the constrained k-means problem. J Comb Optim 37, 1091–1110 (2019). https://doi.org/10.1007/s10878-018-0340-4
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10878-018-0340-4