When Random Sampling Preserves Privacy

Kamalika Chaudhuri &
Nina Mishra

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4117)

Included in the following conference series:

Annual International Cryptology Conference

Abstract

Many organizations such as the U.S. Census publicly release samples of data that they collect about private citizens. These datasets are first anonymized using various techniques and then a small sample is released so as to enable “do-it-yourself” calculations. This paper investigates the privacy of the second step of this process: sampling. We observe that rare values – values that occur with low frequency in the table – can be problematic from a privacy perspective. To our knowledge, this is the first work that quantitatively examines the relationship between the number of rare values in a table and the privacy in a released random sample. If we require ε-privacy (where the larger ε is, the worse the privacy guarantee) with probability at least 1 − δ, we say that a value is rare if it occurs in at most \(\tilde{O}(\frac{1}{\epsilon})\) rows of the table (ignoring log factors). If there are no rare values, then we establish a direct connection between the sample size that is safe to release and privacy. Specifically, if we select each row of the table with probability at most ε, then the sample is O(ε)-private with high probability. If there are t rare values, then the sample is \(\tilde{O}(\epsilon \delta /t)\)-private with probability at least 1 − δ.
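
The two ingredients named in the abstract, selecting each row independently with a fixed probability and flagging values that occur in few rows, can be illustrated with a minimal Python sketch. This is not the authors' procedure: the helper names `bernoulli_sample` and `rare_values`, the toy table, the treatment of each row as a single categorical value, and the constant `c` in the rarity threshold are all assumptions made for illustration. The abstract gives the threshold only as Õ(1/ε), so the log factors are simply dropped here.

```python
import random
from collections import Counter


def bernoulli_sample(rows, p, rng):
    """Keep each row independently with probability p."""
    return [row for row in rows if rng.random() < p]


def rare_values(rows, epsilon, c=1.0):
    """Return the values that occur in at most c / epsilon rows.

    Assumption: the constant c stands in for the hidden constant and
    log factors in the abstract's O~(1/epsilon) threshold.
    """
    threshold = c / epsilon
    counts = Counter(rows)
    return {value for value, count in counts.items() if count <= threshold}


# Hypothetical toy table: each row holds one categorical value.
rng = random.Random(0)
table = ["A"] * 600 + ["B"] * 390 + ["C"] * 10
epsilon = 0.05

rare = rare_values(table, epsilon)              # threshold is roughly 20 rows
sample = bernoulli_sample(table, epsilon, rng)  # each row kept with probability epsilon
```

With this toy table only the value "C" falls under the threshold, so it is the one value the abstract would treat as a potential privacy problem when a sample is released.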

Research supported in part by NSF EIA-0137761.

Author information

Authors and Affiliations

Computer Science Department, UC Berkeley, Berkeley, CA, 94720, USA
Kamalika Chaudhuri
Computer Science Department, University of Virginia, Charlottesville, VA, 22904, USA
Nina Mishra

Editor information

Editors and Affiliations

Microsoft Research,
Cynthia Dwork

Cite this paper

Chaudhuri, K., Mishra, N. (2006). When Random Sampling Preserves Privacy. In: Dwork, C. (eds) Advances in Cryptology - CRYPTO 2006. CRYPTO 2006. Lecture Notes in Computer Science, vol 4117. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11818175_12

DOI: https://doi.org/10.1007/11818175_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37432-9
Online ISBN: 978-3-540-37433-6
eBook Packages: Computer Science, Computer Science (R0)
