DOI: 10.1145/3319535.3339821

Differentially Private Nonparametric Hypothesis Testing

Published: 06 November 2019

Abstract

Hypothesis tests are a crucial statistical tool for data mining and are the workhorse of scientific research in many fields. Here we study differentially private tests of independence between a categorical and a continuous variable. We take as our starting point traditional nonparametric tests, which require no distributional assumptions (e.g., normality) about the data. We present private analogues of the Kruskal-Wallis, Mann-Whitney, and Wilcoxon signed-rank tests, as well as the parametric one-sample t-test. These tests use novel test statistics developed specifically for the private setting. We compare our tests to prior work on both parametric and nonparametric tests. We find that in all cases our new nonparametric tests achieve large improvements in statistical power, even when the assumptions of parametric tests are met.
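
To make the general idea of a differentially private rank-based test concrete, here is a minimal sketch. It is not the test statistic developed in the paper, which the abstract does not specify. It simply adds Laplace noise to the classical Mann-Whitney U statistic and calibrates the p-value by simulating the null distribution of the noisy statistic. The function names, the sensitivity bound max(n1, n2), and the neighbouring-dataset definition (one value changes, group sizes public and fixed) are all assumptions made for this illustration.

```python
import random


def mann_whitney_u(xs, ys):
    """Classical U: number of pairs (x, y) with x > y, ties counted as 1/2."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_u_test(xs, ys, epsilon, n_sim=2000, seed=0):
    """Return (noisy U, Monte Carlo two-sided p-value).

    Assumption for this sketch: neighbouring datasets differ in one
    observed value while group sizes stay public and fixed, so U can
    change by at most max(n1, n2).
    The fixed seed is for reproducibility of the example only; a real
    deployment would need fresh, secure randomness.
    """
    rng = random.Random(seed)
    n1, n2 = len(xs), len(ys)
    scale = max(n1, n2) / epsilon          # Laplace scale = sensitivity / epsilon
    center = n1 * n2 / 2.0                 # mean of U under the null
    noisy_u = mann_whitney_u(xs, ys) + laplace_noise(scale, rng)
    observed = abs(noisy_u - center)

    # Null reference distribution: under the null (continuous data, no ties)
    # the ranks are a uniformly random permutation, so U can be simulated
    # without touching the private data. Noise is added to each simulated U
    # so the reference distribution matches the released statistic.
    ranks = list(range(1, n1 + n2 + 1))
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(ranks)
        u_null = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
        if abs(u_null + laplace_noise(scale, rng) - center) >= observed:
            hits += 1
    return noisy_u, (hits + 1) / (n_sim + 1)
```

Because the p-value depends on the data only through the noisy statistic and the public group sizes, computing it is post-processing and consumes no additional privacy budget under the stated assumptions.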

Supplementary Material

WEBM File (p737-groce.webm)



Published In

CCS '19: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security
November 2019
2755 pages
ISBN:9781450367479
DOI:10.1145/3319535
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. differential privacy
  2. hypothesis test
  3. nonparametric

Qualifiers

  • Research-article

Funding Sources

  • NSF
  • Richter Funds

Conference

CCS '19

Acceptance Rates

CCS '19 Paper Acceptance Rate 149 of 934 submissions, 16%;
Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

