[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3357713.3384337acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
research-article
Open access

Efficiently learning structured distributions from untrusted batches

Published: 22 June 2020 Publication History

Abstract

We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume m users, all of whom have samples from some underlying distribution over 1, …, n. Each user sends a batch of k i.i.d. samples from this distribution; however an є-fraction of users are untrustworthy and can send adversarially chosen responses. The goal of the algorithm is to learn in total variation distance. When k = 1 this is the standard robust univariate density estimation setting and it is well-understood that (є) error is unavoidable. Suprisingly, Qiao and Valiant gave an estimator which improves upon this rate when k is large. Unfortunately, their algorithms run in time which is exponential in either n or k.
We first give a sequence of polynomial time algorithms whose estimation error approaches the information-theoretically optimal bound for this problem. Our approach is based on recent algorithms derived from the sum-of-squares hierarchy, in the context of high-dimensional robust estimation. We show that algorithms for learning from untrusted batches can also be cast in this framework, but by working with a more complicated set of test functions.
It turns out that this abstraction is quite powerful, and can be generalized to incorporate additional problem specific constraints. Our second and main result is to show that this technology can be leveraged to build in prior knowledge about the shape of the distribution. Crucially, this allows us to reduce the sample complexity of learning from untrusted batches to polylogarithmic in n for most natural classes of distributions, which is important in many applications. To do so, we demonstrate that these sum-of-squares algorithms for robust mean estimation can be made to handle complex combinatorial constraints (e.g. those arising from VC theory), which may be of independent technical interest.

References

[1]
J. Acharya, I. Diakonikolas, C. Hegde, J. Li, and L. Schmidt. 2015. Fast and Near-Optimal Algorithms for Approximating Distributions by Histograms. In PODS.
[2]
Jayadev Acharya, Ilias Diakonikolas, Jerry Li, and Ludwig Schmidt. 2017. Sample-optimal density estimation in nearly-linear time. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. 1278–1289.
[3]
D. Achlioptas and F. McSherry. 2005. On Spectral Learning of Mixtures of Distributions. In Proceedings of the Eighteenth Annual Conference on Learning Theory (COLT). 458–469.
[4]
Frank J Anscombe. 1960. Rejection of outliers. Technometrics, 2, 2 (1960), 123–146.
[5]
S. Arora and R. Kannan. 2001. Learning mixtures of arbitrary Gaussians. In Proceedings of the 33rd Symposium on Theory of Computing. 247–257.
[6]
F. Balabdaoui, K. Rufibach, and J. A. Wellner. 2009. Limit Distribution Theory for Maximum Likelihood Estimation of a Log-Concave Density. The Annals of Statistics, 37, 3 (2009), pp. 1299–1331. issn:00905364
[7]
F. Balabdaoui and J. A. Wellner. 2007. Estimation of a k-Monotone Density: Limit Distribution Theory and the Spline Connection. The Annals of Statistics, 35, 6 (2007), pp. 2536–2564. issn:00905364
[8]
Sivaraman Balakrishnan, Simon S Du, Jerry Li, and Aarti Singh. 2017. Computationally efficient robust sparse estimation in high dimensions. In Conference on Learning Theory. 169–212.
[9]
Richard E Barlow, David J Bartholomew, James M Bremner, and H Daniel Brunk. 1972. Statistical inference under order restrictions: The theory and application of isotonic regression. Wiley New York.
[10]
L. Birgé. 1987. Estimating a density under order restrictions: Nonasymptotic minimax risk. Annals of Statistics, 15, 3 (1987), 995–1012.
[11]
Hugh D Brunk. 1955. Maximum likelihood estimates of monotone parameters. The Annals of Mathematical Statistics, 607–616.
[12]
H. D. Brunk. 1958. On the Estimation of Parameters Restricted by Inequalities. The Annals of Mathematical Statistics, 29, 2 (1958), pp. 437–454. issn:00034851
[13]
K.S. Chan and H. Tong. 2004. Testing for multimodality with dependent data. Biometrika, 91, 1 (2004), 113–123.
[14]
S. Chan, I. Diakonikolas, R. Servedio, and X. Sun. 2014. Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms. In NIPS. 1844–1852.
[15]
Siu-On Chan, Ilias Diakonikolas, Rocco A Servedio, and Xiaorui Sun. 2013. Learning mixtures of structured distributions over discrete domains. In Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms. 1380–1394.
[16]
Siu-On Chan, Ilias Diakonikolas, Rocco A Servedio, and Xiaorui Sun. 2014. Efficient density estimation via piecewise polynomial approximation. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing. 604–613.
[17]
Moses Charikar, Jacob Steinhardt, and Gregory Valiant. 2017. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. 47–60.
[18]
Sitan Chen, Jerry Li, and Ankur Moitra. 2020. Learning Structured Distributions From Untrusted Batches: Faster and Simpler. arXiv preprint arXiv:2002.10435.
[19]
Sitan Chen and Ankur Moitra. 2019. Beyond the low-degree algorithm: mixtures of subcubes and their applications. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing. 869–880.
[20]
Richard Cole and Tim Roughgarden. 2014. The sample complexity of revenue maximization. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing. 243–252.
[21]
S. Dasgupta. 1999. Learning mixtures of Gaussians. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science. 634–644.
[22]
S. Dasgupta and L. Schulman. 2000. A two-round variant of EM for Gaussian mixtures. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence. 143–151.
[23]
Constantinos Daskalakis, Anindya De, Gautam Kamath, and Christos Tzamos. 2016. A size-free CLT for poisson multinomials and its applications. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing. 1074–1086.
[24]
C. Daskalakis, I. Diakonikolas, R. O’Donnell, R.A. Servedio, and L. Tan. 2013. Learning Sums of Independent Integer Random Variables. In FOCS. 217–226.
[25]
C. Daskalakis, I. Diakonikolas, and R.A. Servedio. 2012. Learning k-modal distributions via testing. In SODA. 1371–1385.
[26]
C. Daskalakis, I. Diakonikolas, and R.A. Servedio. 2012. Learning Poisson Binomial Distributions. In STOC. 709–728.
[27]
Luc Devroye and Gabor Lugosi. 2001. Combinatorial Methods in Density Estimation. Springer Science & Business Media.
[28]
Ilias Diakonikolas. 2016. Learning Structured Distributions. Handbook of Big Data, 267 (2016).
[29]
Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. 2019. Robust Estimators in High-Dimensions Without the Computational Intractability. SIAM J. Comput., 48, 2 (2019), 742–864.
[30]
Ilias Diakonikolas, Gautam Kamath, Daniel M Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. 2017. Being robust (in high dimensions) can be practical. In Proceedings of the 34th International Conference on Machine Learning-Volume 70. 999–1008.
[31]
Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. 2016. The fourier transform of poisson multinomial distributions and its algorithmic applications. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing. 1060–1073.
[32]
Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. 2016. Optimal learning via the fourier transform for sums of independent integer random variables. In Conference on Learning Theory. 831–849.
[33]
Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. 2016. Properly learning poisson binomial distributions in almost polynomial time. In Conference on Learning Theory. 850–878.
[34]
Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. 2018. List-decodable robust mean estimation and learning mixtures of spherical gaussians. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing. 1047–1060.
[35]
D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard. 1995. Wavelet shrinkage: asymptopia. Journal of the Royal Statistical Society, Ser. B, 371–394.
[36]
D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard. 1996. Density estimation by wavelet thresholding. Ann. Statist., 24, 2 (1996), 508–539.
[37]
L. Dumbgen and K. Rufibach. 2009. Maximum likelihood estimation of a log-concave density and its distribution function: Basic properties and uniform consistency. Bernoulli, 15, 1 (2009), 40–68.
[38]
J. Feldman, R. O’Donnell, and R. Servedio. 2005. Learning mixtures of product distributions over discrete domains. In FOCS 2005. 501–510.
[39]
A.-L. Fougères. 1997. Estimation de densités unimodales. Canadian Journal of Statistics, 25 (1997), 375–387.
[40]
F. Gao and J. A. Wellner. 2009. On the rate of convergence of the maximum likelihood estimator of a k-monotone density. Science in China Series A: Mathematics, 52 (2009), 1525–1538.
[41]
U. Grenander. 1956. On the theory of mortality measurement. Skand. Aktuarietidskr., 39 (1956), 125–153.
[42]
P. Groeneboom. 1985. Estimating a monotone density. In Proc. of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer. 539–555.
[43]
D. L. Hanson and G. Pledger. 1976. Consistency in Concave Regression. The Annals of Statistics, 4, 6 (1976), pp. 1038–1050. issn:00905364
[44]
Clifford Hildreth. 1954. Point estimates of ordinates of concave functions. J. Amer. Statist. Assoc., 49, 267 (1954), 598–619.
[45]
Samuel B Hopkins and Jerry Li. 2018. Mixture models, robustness, and sum of squares proofs. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing. 1021–1034.
[46]
Zhiyi Huang, Yishay Mansour, and Tim Roughgarden. 2018. Making the most of your samples. SIAM J. Comput., 47, 3 (2018), 651–674.
[47]
Peter J Huber. 1992. Robust estimation of a location parameter. In Breakthroughs in statistics. Springer, 492–518.
[48]
Ayush Jain and Alon Orlitsky. 2019. Robust Learning of Discrete Distributions from Batches. arXiv preprint arXiv:1911.08532.
[49]
Ayush Jain and Alon Orlitsky. 2020. A General Method for Robust Learning from Batches. arXiv preprint arXiv:2002.11099.
[50]
H. K. Jankowski and J. A. Wellner. 2009. Estimation of a discrete monotone density. Electronic Journal of Statistics, 3 (2009), 1567–1605.
[51]
A. T. Kalai, A. Moitra, and G. Valiant. 2010. Efficiently learning mixtures of two Gaussians. In STOC. 553–562.
[52]
Sushrut Karmalkar, Pravesh Kothari, and Adam Klivans. 2019. List-Decodable Linear Regression. arXiv preprint arXiv:1905.05679.
[53]
G. Kerkyacharian, D. Picard, and K. Tribouley. 1996. Lp Adaptive Density Estimation. Bernoulli, 2, 3 (1996), pp. 229–247.
[54]
Adam Klivans, Pravesh K Kothari, and Raghu Meka. 2018. Efficient Algorithms for Outlier-Robust Regression. In Conference On Learning Theory. 1420–1430.
[55]
R. Koenker and I. Mizera. 2010. Quasi-concave density estimation. Ann. Statist., 38, 5 (2010), 2998–3027. issn:0090-5364 https://doi.org/10.1214/10-AOS814
[56]
Jakub Konečnỳ, H Brendan McMahan, Felix X Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. 2016. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492.
[57]
Pravesh K Kothari, Jacob Steinhardt, and David Steurer. 2018. Robust moment estimation and improved clustering via sum of squares. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing. 1035–1046.
[58]
Kevin A Lai, Anup B Rao, and Santosh Vempala. 2016. Agnostic estimation of mean and covariance. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS). 665–674.
[59]
Rafał Latał a. 1997. Estimation of moments of sums of independent real random variables. The Annals of Probability, 25, 3 (1997), 1502–1513.
[60]
Reut Levi, Dana Ron, and Ronitt Rubinfeld. 2013. Testing properties of collections of distributions. Theory of Computing, 9, 1 (2013), 295–347.
[61]
Jerry Li and Ludwig Schmidt. 2017. Robust and proper learning for mixtures of gaussians via systems of polynomial inequalities. In Conference on Learning Theory. 1302–1382.
[62]
Jerry Zheng Li. 2018. Principled approaches to robust machine learning and beyond. Ph.D. Dissertation. Massachusetts Institute of Technology.
[63]
H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2017. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS). arxiv:1602.05629
[64]
A. Moitra and G. Valiant. 2010. Settling the polynomial learnability of mixtures of Gaussians. In FOCS. 93–102.
[65]
Carl M O’Brien. 2016. Nonparametric Estimation under Shape Constraints: Estimators, Algorithms and Asymptotics. International Statistical Review, 84, 2 (2016), 318–319.
[66]
Mingda Qiao and Gregory Valiant. 2017. Learning discrete distributions from untrusted batches. arXiv preprint arXiv:1711.08113.
[67]
Prasad Raghavendra, Tselil Schramm, and David Steurer. 2018. High-dimensional estimation via sum-of-squares proofs. arXiv preprint arXiv:1807.11419.
[68]
Prasad Raghavendra and Morris Yau. 2019. List Decodable Learning via Sum of Squares. arXiv preprint arXiv:1905.04660.
[69]
B.L.S. Prakasa Rao. 1969. Estimation of a unimodal density. Sankhya Ser. A, 31 (1969), 23–36.
[70]
R. A. Redner and H. F. Walker. 1984. Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev., 26 (1984), 195–202.
[71]
Jacob Steinhardt. 2018. Robust Learning: Information Theory and Algorithms. Ph.D. Dissertation. Stanford University.
[72]
Jacob Steinhardt, Moses Charikar, and Gregory Valiant. 2018. Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018).
[73]
C. J. Stone. 1994. The Use of Polynomial Splines and Their Tensor Products in Multivariate Function Estimation. The Annals of Statistics, 22, 1 (1994), pp. 118–171.
[74]
C. J. Stone, M. H. Hansen, C. Kooperberg, and Y. K. Truong. 1997. Polynomial splines and their tensor products in extended linear modeling: 1994 Wald memorial lecture. Ann. Statist., 25, 4 (1997), 1371–1470.
[75]
Kevin Tian, Weihao Kong, and Gregory Valiant. 2017. Learning populations of parameters. In Advances in Neural Information Processing Systems. 5778–5787.
[76]
Aleksandr Filippovich Timan. 2014. Theory of approximation of functions of a real variable. 34, Elsevier.
[77]
John W Tukey. 1960. A survey of sampling from contaminated distributions. Contributions to probability and statistics, 448–485.
[78]
John W Tukey. 1975. Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver, 1975. 2, 523–531.
[79]
Vladimir Vapnik and Alexey Chervonenkis. 1974. Theory of pattern recognition.
[80]
S. Vempala and G. Wang. 2002. A Spectral Algorithm for learning mixtures of distributions. In Proceedings of the 43rd Annual Symposium on Foundations of Computer Science. 113–122.
[81]
G. Walther. 2009. Inference and Modeling with Log-concave Distributions. Statist. Sci., 24, 3 (2009), 319–327.
[82]
E.J. Wegman. 1970. Maximum likelihood estimation of a unimodal density. I. and II. Ann. Math. Statist., 41 (1970), 457–471, 2169–2174.
[83]
Edward J Wegman. 1970. Maximum likelihood estimation of a unimodal density function. The Annals of Mathematical Statistics, 41, 2 (1970), 457–471.
[84]
E. J. Wegman and I. W. Wright. 1983. Splines in Statistics. J. Amer. Statist. Assoc., 78, 382 (1983), pp. 351–365.
[85]
R. Willett and R. D. Nowak. 2007. Multiscale Poisson Intensity and Density Estimation. IEEE Transactions on Information Theory, 53, 9 (2007), 3171–3187.

Cited By

View all
  • (2024)Testing Closeness of Multivariate Distributions via Ramsey TheoryProceedings of the 56th Annual ACM Symposium on Theory of Computing10.1145/3618260.3649657(340-347)Online publication date: 10-Jun-2024
  • (2020)Confidence regions and minimax rates in outlier-robust estimation on the probability simplexElectronic Journal of Statistics10.1214/20-EJS173114:2Online publication date: 1-Jan-2020

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
STOC 2020: Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing
June 2020
1429 pages
ISBN:9781450369794
DOI:10.1145/3357713
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 June 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Robust statistics
  2. VC complexity
  3. federated learning
  4. sum-of-squares

Qualifiers

  • Research-article

Funding Sources

Conference

STOC '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

Upcoming Conference

STOC '25
57th Annual ACM Symposium on Theory of Computing (STOC 2025)
June 23 - 27, 2025
Prague , Czech Republic

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)66
  • Downloads (Last 6 weeks)6
Reflects downloads up to 31 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Testing Closeness of Multivariate Distributions via Ramsey TheoryProceedings of the 56th Annual ACM Symposium on Theory of Computing10.1145/3618260.3649657(340-347)Online publication date: 10-Jun-2024
  • (2020)Confidence regions and minimax rates in outlier-robust estimation on the probability simplexElectronic Journal of Statistics10.1214/20-EJS173114:2Online publication date: 1-Jan-2020

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media