Search results for keyword: k-median clustering

research-article

Open Access

Settling Time vs. Accuracy Tradeoffs for Clustering Big Data

Proceedings of the ACM on Management of Data (PACMMOD), Volume 2, Issue 3, Article No.: 173, Pages 1–25, https://doi.org/10.1145/3654976

We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods take longer than reading the dataset, the fastest approach is to quickly compress the ...

research-article

Lossy Kernelization of Same-Size Clustering

Theory of Computing Systems (TOCSYS), Volume 67, Issue 4, Pages 785–824, https://doi.org/10.1007/s00224-023-10129-9

Abstract

In this work, we study the k-median clustering problem with an additional equal-size constraint on the clusters from the perspective of parameterized preprocessing. Our main result is the first lossy (2-approximate) polynomial kernel for this ...
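To make the equal-size constraint concrete, here is a minimal sketch, assuming fixed centers, unsquared Euclidean distance as the k-median cost, and n divisible by k. It only illustrates what the constraint does to an assignment; it is not the paper's kernelization, and the function name `equal_size_kmedian_assignment` is hypothetical.

```python
import itertools
import math

Point = tuple[float, ...]


def equal_size_kmedian_assignment(points: list[Point], centers: list[Point]) -> tuple[float, list[int]]:
    """Cheapest assignment of points to fixed centers under an equal-size constraint.

    Every center must receive exactly len(points) // len(centers) points, and the
    cost is the k-median objective (sum of unsquared Euclidean distances).
    Brute force over all balanced labelings: exponential time, tiny inputs only.
    """
    n, k = len(points), len(centers)
    if k == 0 or n % k != 0:
        raise ValueError("equal-size clustering needs k > 0 and n divisible by k")
    size = n // k
    # Each center index appears `size` times; every distinct ordering of these
    # slots is one balanced labeling of the points.
    slots = [j for j in range(k) for _ in range(size)]
    best_cost, best_labels = math.inf, []
    for labels in set(itertools.permutations(slots)):
        cost = sum(math.dist(p, centers[j]) for p, j in zip(points, labels))
        if cost < best_cost:
            best_cost, best_labels = cost, list(labels)
    return best_cost, best_labels


# Unconstrained, all four points would pick the center at the origin; the
# constraint forces the two farthest-right points onto the distant center.
cost, labels = equal_size_kmedian_assignment(
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)], [(0.0, 0.0), (100.0, 0.0)]
)
```

Brute force keeps the sketch obviously correct; for realistic sizes the same assignment step is usually posed as a min-cost matching instead.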

rapid-communication

On the k-means/median cost function

Information Processing Letters (IPRL), Volume 177, Issue C, https://doi.org/10.1016/j.ipl.2022.106252

Highlights

We try to understand how the optimal k-means cost behaves as a function of k (the number of centers/clusters).
We show that D² sampling is a useful method for designing pseudo-approximation algorithms and movement-based coresets for ...
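For readers unfamiliar with D² sampling, the sketch below shows the standard rule (the k-means++ seeding step): after a uniform first pick, each new center is drawn with probability proportional to its squared distance to the nearest center chosen so far. Pseudo-approximation analyses typically sample more than k centers, so the count is a separate parameter `m` here. The function name and its fallback for fully covered inputs are illustrative assumptions, not the paper's procedure.

```python
import math
import random
from typing import Optional

Point = tuple[float, ...]


def d2_sample_centers(points: list[Point], m: int, rng: Optional[random.Random] = None) -> list[Point]:
    """Pick m centers by D^2 sampling (the k-means++ seeding rule)."""
    if m < 1 or not points:
        raise ValueError("need m >= 1 and a non-empty point set")
    if rng is None:
        rng = random.Random()
    # First center: uniform over the input points.
    centers = [rng.choice(points)]
    while len(centers) < m:
        # Weight of each point = squared distance to its nearest chosen center.
        weights = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point already coincides with a center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

Points already chosen as centers get weight zero, so with two distinct points and m = 2 the second draw is always the other point.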

Abstract

In this work, we study the k-means cost function. Given a dataset $X \subseteq \mathbb{R}^d$ and an integer $k$, the goal of the Euclidean k-means problem is to find a set of $k$ centers $C \subseteq \mathbb{R}^d$ such that $\Phi(C, X) \equiv \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^2$ is minimized. Let ...
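The cost $\Phi(C, X)$ translates directly into code. This is a minimal sketch with a hypothetical function name; it evaluates the cost of a given center set and does not search for optimal centers.

```python
import math

Point = tuple[float, ...]


def kmeans_cost(centers: list[Point], points: list[Point]) -> float:
    """Phi(C, X): sum over x in X of min over c in C of ||x - c||^2."""
    if not centers:
        raise ValueError("C must contain at least one center")
    return sum(min(math.dist(x, c) ** 2 for c in centers) for x in points)


X = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
phi_one = kmeans_cost([(4.0, 0.0)], X)               # center at the mean: 16 + 4 + 36 = 56
phi_two = kmeans_cost([(1.0, 0.0), (10.0, 0.0)], X)  # 1 + 1 + 0 = 2
```

Because each point takes a minimum over centers, adding a center can never increase $\Phi$, so the optimal cost is non-increasing in k.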

Article

Lossy Kernelization of Same-Size Clustering

Computer Science – Theory and Applications, Pages 96–114, https://doi.org/10.1007/978-3-031-09574-0_7

Abstract

In this work, we study the k-median clustering problem with an additional equal-size constraint on the clusters, from the perspective of parameterized preprocessing. Our main result is the first lossy (2-approximate) polynomial kernel for this ...

research-article

Scenario reduction revisited: fundamental limits and guarantees

Mathematical Programming: Series A and B (MPRG), Volume 191, Issue 1, Pages 207–242, https://doi.org/10.1007/s10107-018-1269-1

Abstract

The goal of scenario reduction is to approximate a given discrete distribution with another discrete distribution that has fewer atoms. We distinguish continuous scenario reduction, where the new atoms may be chosen freely, and discrete scenario ...
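The snippet does not say how approximation quality is measured. Assuming the type-1 Wasserstein distance, a fixed set of new atoms already determines the best reduced distribution: each original atom sends its mass to the nearest new atom. What remains is choosing the new atoms, which is a probability-weighted k-median problem. The function name below is hypothetical.

```python
import math

Point = tuple[float, ...]


def reduce_to_support(atoms: list[Point], probs: list[float], new_atoms: list[Point]) -> tuple[list[float], float]:
    """Collapse a discrete distribution onto a given smaller set of atoms.

    Each original atom sends all of its probability mass to its nearest new atom.
    Returns (new_probs, cost) with cost = sum_i p_i * min_j ||x_i - y_j||, the
    type-1 Wasserstein distance to the best distribution supported on new_atoms.
    """
    if len(atoms) != len(probs) or not new_atoms:
        raise ValueError("need one probability per atom and at least one new atom")
    new_probs = [0.0] * len(new_atoms)
    cost = 0.0
    for x, p in zip(atoms, probs):
        # Index of the new atom closest to x.
        j = min(range(len(new_atoms)), key=lambda i: math.dist(x, new_atoms[i]))
        new_probs[j] += p
        cost += p * math.dist(x, new_atoms[j])
    return new_probs, cost


new_probs, w1 = reduce_to_support([(0.0,), (1.0,), (10.0,)], [0.25, 0.25, 0.5], [(0.0,), (10.0,)])
```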

research-article

Subsampled Exponential Mechanism: Differential Privacy in Large Output Spaces

AISec '15: Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, Pages 25–33, https://doi.org/10.1145/2808769.2808776

In the last several years, differential privacy has become the leading framework for private data analysis. It bounds how much the output distribution of a randomized function can change when one record of a database is modified. This ...
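The title refers to the exponential mechanism without defining it. The standard mechanism selects a candidate output r with probability proportional to exp(ε·u(r) / (2Δu)), where u is a utility score and Δu bounds how much u can change when one record changes. The sketch below implements only that standard mechanism, not the paper's subsampled variant for large output spaces. All names are hypothetical. In a clustering setting, a candidate could be a set of centers scored by its negative cost.

```python
import math
import random
from typing import Callable, Optional, Sequence, TypeVar

R = TypeVar("R")


def exponential_mechanism(
    candidates: Sequence[R],
    utility: Callable[[R], float],
    epsilon: float,
    sensitivity: float,
    rng: Optional[random.Random] = None,
) -> R:
    """Sample r with probability proportional to exp(epsilon * u(r) / (2 * sensitivity))."""
    if not candidates or epsilon <= 0 or sensitivity <= 0:
        raise ValueError("need candidates, epsilon > 0 and sensitivity > 0")
    if rng is None:
        rng = random.Random()
    scores = [utility(r) for r in candidates]
    top = max(scores)
    # Shifting every score by the maximum rescales all weights by the same
    # factor, so probabilities are unchanged but exp() cannot overflow.
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Larger ε concentrates the probability on high-utility outputs (weaker privacy); smaller ε flattens it toward uniform.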

article

Probabilistic k-Median Clustering in Data Streams

Theory of Computing Systems (TOCSYS), Volume 56, Issue 1, Pages 251–290, https://doi.org/10.1007/s00224-014-9539-7

The focus of our work is introducing and constructing probabilistic coresets. A probabilistic coreset can contain probabilistic points, and the number of these points should be polylogarithmic in the input size. However, the overall storage size is also ...

research-article

Clustering for metric and nonmetric distance measures

ACM Transactions on Algorithms (TALG), Volume 6, Issue 4, Article No.: 59, Pages 1–26, https://doi.org/10.1145/1824777.1824779

We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure $D$. Given a finite set $P$ of size $n$, our goal is to find a set $C$ of size $k$ such that the sum of errors $D(P, C) = \sum_{p \in P} \min_{c \in C} D(p, c)$ is minimized. The ...
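The objective $D(P, C)$ only requires a function D(p, c); nothing in it assumes symmetry or the triangle inequality. The sketch below makes that explicit by taking D as a parameter and plugging in two nonmetric examples, squared Euclidean distance and Kullback-Leibler divergence. These choices and all names are illustrative assumptions; the sketch evaluates the cost of a given C and does not find an optimal one.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def clustering_cost(P: Sequence[T], C: Sequence[T], D: Callable[[T, T], float]) -> float:
    """D(P, C) = sum over p in P of min over c in C of D(p, c), for any dissimilarity D."""
    if not C:
        raise ValueError("C must be non-empty")
    return sum(min(D(p, c) for c in C) for p in P)


def squared_euclidean(p: Sequence[float], c: Sequence[float]) -> float:
    # Not a metric: it violates the triangle inequality (0 -> 2 costs 4 > 1 + 1).
    return math.dist(p, c) ** 2


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q) between discrete distributions; asymmetric, so the order
    # (data point first, center second) matters. Assumes q > 0 wherever p > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```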

article

A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering

Shai Ben-David

Machine Learning (MALE), Volume 66, Issue 2-3, Pages 243–257, https://doi.org/10.1007/s10994-006-0587-3

We consider a framework of sample-based clustering. In this setting, the input to a clustering algorithm is a sample generated i.i.d. by some unknown, arbitrary distribution. Based on such a sample, the algorithm has to output a clustering of the full ...
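One simple way to turn an i.i.d. sample into a clustering of the full domain is to pick centers from the sample and then assign any domain point to its nearest center. The sketch below does this by brute force over k-subsets of the sample under the k-median cost. It illustrates the setting only and is not the paper's algorithm. The function name and the two-Gaussian source distribution in the usage lines are hypothetical.

```python
import itertools
import math
import random
from typing import Callable

Point = tuple[float, ...]


def cluster_from_sample(sample: list[Point], k: int) -> tuple[list[Point], Callable[[Point], int]]:
    """Choose k centers among the sample points, minimizing k-median cost on the sample.

    Returns the centers and a function that labels any point of the domain by its
    nearest center, which induces a clustering of the full space.
    """
    if not 1 <= k <= len(sample):
        raise ValueError("need 1 <= k <= len(sample)")
    best = min(
        itertools.combinations(sample, k),
        key=lambda C: sum(min(math.dist(x, c) for c in C) for x in sample),
    )

    def assign(x: Point) -> int:
        return min(range(k), key=lambda j: math.dist(x, best[j]))

    return list(best), assign


rng = random.Random(0)
# i.i.d. draws from a hypothetical mixture of two 1-D Gaussians centered at 0 and 10.
sample = [(rng.gauss(rng.choice([0.0, 10.0]), 1.0),) for _ in range(20)]
centers, assign = cluster_from_sample(sample, 2)
```

The running time depends on the sample size, not on the size of the underlying data, which is the point of the sample-based setting.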
