[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Optimal and Differentially Private Data Acquisition: : Central and Local Mechanisms

Published: 05 October 2023 Publication History

Abstract

The data for many machine learning tasks are owned by individuals who are typically concerned about privacy. Here, the authors study the optimal design of a data acquisition mechanism aimed at learning the mean of a population. This data acquisition scheme includes the design of a payment rule to compensate users for their privacy loss. It also involves selecting an estimator that minimizes estimation error while simultaneously providing privacy guarantees to users in line with their privacy preferences. The authors formulate this problem as a Bayesian mechanism design problem and propose approximately optimal data acquisition mechanisms.

Abstract

We consider a platform’s problem of collecting data from privacy sensitive users to estimate an underlying parameter of interest. We formulate this question as a Bayesian-optimal mechanism design problem, in which an individual can share their (verifiable) data in exchange for a monetary reward or services, but at the same time has a (private) heterogeneous privacy cost which we quantify using differential privacy. We consider two popular differential privacy settings for providing privacy guarantees for the users: central and local. In both settings, we establish minimax lower bounds for the estimation error and derive (near) optimal estimators for given heterogeneous privacy loss levels for users. Building on this characterization, we pose the mechanism design problem as the optimal selection of an estimator and payments that will elicit truthful reporting of users’ privacy sensitivities. Under a regularity condition on the distribution of privacy sensitivities, we develop efficient algorithmic mechanisms to solve this problem in both privacy settings. Our mechanism in the central setting can be implemented in time O(nlogn) where n is the number of users and our mechanism in the local setting admits a polynomial time approximation scheme (PTAS).
Funding: A. Fallah acknowledges support from the Apple Scholars in AI/ML PhD fellowship.
Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.0014.

References

[1]
Abernethy JD, Cummings R, Kumar B, Morgenstern J, Taggart S (2019) Learning auctions with robust incentive guarantees. Proceedings of the 33rd International Conference on Neural Information Processing (Curran Associates, Inc., Red Hook, NY), 11587–11597.
[2]
Acemoglu D, Makhdoumi A, Malekian A, Ozdaglar A (2022) Too much data: Prices and inefficiencies in data markets. Amer. Econom. J. Microeconomics 14(4):218–256.
[3]
Acharya J, Sun Z, Zhang H (2021) Differentially private assouad, fano, and le cam. Algorithmic Learning Theory (PMLR, New York), 48–78.
[4]
Akbarpour M, Li S (2020) Credible auctions: A trilemma. Econometrica 88(2):425–467.
[5]
Alaggan M, Gambs S, Kermarrec A-M (2015) Heterogeneous differential privacy. Preprint, submitted April 27, https://arxiv.org/abs/1504.06998.
[6]
Anunrojwong J, Iyer K, Manshadi V (2023) Information design for congested social services: Optimal need-based persuasion. Management Sci. 697(7):3778–3796.
[7]
Apple. Differential privacy overview: Apple. Retrieved May 4, 2023, https://www.apple.com/privacy/docs/Differential_Privacy_Overview.pdf.
[8]
Ashlagi I, Monachou F, Nikzad A (2021) Optimal dynamic allocation: Simplicity through information design. Proc. 22nd ACM Conf. on Econom. and Comput. (Association for Computing Machinery, New York), 101–102.
[9]
Ashlagi I, Braverman M, Kanoria Y, Shi P (2020) Clearing matching markets efficiently: informative signals and match recommendations. Management Sci. 66(5):2163–2193.
[10]
Asoodeh S, Aliakbarpour M, Calmon FP (2021) Local differential privacy is equivalent to contraction of an f-divergence. Proc. IEEE Internat. Sympos. on Inform. Theory (IEEE, Piscataway, NJ), 545–550.
[11]
Babaioff M, Kleinberg R, Paes Leme R (2012) Optimal mechanisms for selling information. Proc. 13th ACM Conf. on Electronic Commerce (Association for Computing Machinery, New York), 92–109.
[12]
Balseiro SR, Besbes O, Castro F (2022) Mechanism design under approximate incentive compatibility. Oper. Res., ePub ahead of print August 24, https://doi.org/10.1287/opre.2022.2359.
[13]
Barber RF, Duchi JC (2014) Privacy and statistical risk: Formalisms and minimax bounds. Preprint, submitted December 15, https://arxiv.org/abs/1412.4451.
[14]
Bergemann D, Bonatti A (2015) Selling cookies. Amer. Econom. J. Microeconom. 7(3):259–294.
[15]
Bergemann D, Bonatti A (2019) Markets for information: An introduction. Annu. Rev. Econom. 11:85–107.
[16]
Bertsekas DP (1997) Nonlinear programming. J. Oper. Res. Soc. 48(3):334–334.
[17]
Besbes O, Mouchtaki O (2023) How big should your data really be? Data-driven newsvendor: Learning one sample at a time. Management Sci. Forthcoming.
[18]
Bimpikis K, Crapis D, Tahbaz-Salehi A (2019) Information sale and competition. Management Sci. 65(6):2646–2664.
[19]
Bimpikis K, Morgenstern I, Saban D (2023) Data tracking under competition. Oper. Res.
[20]
Cai Y, Daskalakis C, Papadimitriou C (2015) Optimum statistical estimation with strategic data sources. Proc. Conf. on Learn. Theory (PMLR, New York), 280–296.
[21]
Candogan O, Drakopoulos K (2020) Optimal signaling of content accuracy: Engagement vs. misinformation. Oper. Res. 68(2):497–515.
[22]
Chen X, Miao S, Wang Y (2023) Differential privacy in personalized pricing with nonparametric demand models. Oper. Res. 71(2):581–602.
[23]
Chen X, Simchi-Levi D, Wang Y (2022) Privacy-preserving dynamic personalized pricing with demand learning. Management Sci. 68(7):4878–4898.
[24]
Chen Y, Zheng S (2019) Prior-free data acquisition for accurate statistical estimation. Proc. ACM Conf. on Econom. and Comput. (ACM, New York), 659–677.
[25]
Chen Y, Immorlica N, Lucier B, Syrgkanis V, Ziani J (2018) Optimal data acquisition for statistical estimation. Proc. ACM Conf. on Econom. and Comput. (ACM, New York), 27–44.
[26]
Cummings R, Feldman V, McMillan A, Talwar K (2021) Mean estimation with user-level privacy under data heterogeneity. Proc. NeurIPS 2021 Workshop Privacy in Machine Learn (Curran Associates, Inc., Red Hook, NY), 29139–29151.
[27]
Cummings R, Elzayn H, Pountourakis E, Gkatzelis V, Ziani J (2023) 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (IEEE, Piscataway, NJ), 210–224.
[28]
Cummings R, Ligett K, Roth A, Wu ZS, Ziani J (2015) Accuracy for sale: Aggregating data with a variance constraint. Proc. Conf. on Innovations in Theoretical Comput. Sci. (ACM, New York), 317–324.
[29]
Dekel O, Fischer F, Procaccia AD (2010) Incentive compatible regression learning. J. Comput. System Sci. 76(8):759–777.
[30]
Ding B, Kulkarni J, Yekhanin S (2017) Collecting telemetry data privately. Advances in Neural Information Processing Systems, vol. 30.
[31]
Drakopoulos K, Makhdoumi A (2023) Providing data samples for free. Management Sci. 69(6):3536–3560.
[32]
Duchi JC, Jordan MI, Wainwright MJ (2013) Local privacy and statistical minimax rates. Proc. IEEE 54th Annual Sympos. on Foundations of Comput. Sci. (IEEE, Piscataway, NJ), 429–438.
[33]
Dwork C, McSherry F, Nissim K, Smith A (2006a) Calibrating noise to sensitivity in private data analysis. Proc. Theory of Cryptography Conf. (Springer, Berlin), 265–284.
[34]
Dwork C, Roth A (2014) The algorithmic foundations of differential privacy. Foundations Trends Theoretical Comput. Sci. 9(3–4):211–407.
[35]
Dwork C, Kenthapadi K, McSherry F, Mironov I, Naor M (2006b) Our data, ourselves: Privacy via distributed noise generation. Proc. Annual Internat. Conf. on the Theory and Appl. of Cryptographic Techniques (Springer, Berlin), 486–503.
[36]
Erlingsson Ú, Pihur V, Korolova A (2014) Rappor: Randomized aggregatable privacy-preserving ordinal response. Proc. ACM SIGSAC Conf. on Comput. and Comm. Security (Association for Computing Machinery, New York), 1054–1067.
[37]
Fainmesser IP, Galeotti A, Momot R (2019) Digital privacy. Research Paper No. MOSI-2019-1351, HEC Paris, Paris.
[38]
Foster DJ, Li Z, Lykouris T, Sridharan K, Tardos E (2016) Learning in games: Robustness of fast convergence. Adv. Neural Inform. Processing Systems 29:4734–4742.
[39]
Ghosh A, Roth A (2011) Selling privacy at auction. Proc. 12th ACM Conf. on Electronic Commerce (Association for Computing Machinery, New York), 199–208.
[40]
Ghosh A, Ligett K, Roth A, Schoenebeck G (2014) Buying private data without verification. Proc. 15th ACM Conf. on Econom. and Comput. (Association for Computing Machinery, New York), 931–948.
[41]
Goldfarb A, Tucker C (2011) Online display advertising: Targeting and obtrusiveness. Marketing Sci. 30(3):389–404.
[42]
Gur Y, Macnamara G, Morgenstern I, Saban D (2023) Information disclosure and promotion policy design for platforms. Management Sci.
[43]
Ha AY, Tong S (2008) Contracting and information sharing under supply chain competition. Management Sci. 54(4):701–715.
[44]
Han Y, Liang Z, Wang Y, Zhang J (2021) Generalized linear bandits with local differential privacy. Advances in Neural Information Processing Systems 34:26511–26522.
[45]
Hörner J, Skrzypacz A (2016) Selling information. J. Political Econom. 124(6):1515–1562.
[46]
Hsu J, Gaboardi M, Haeberlen A, Khanna S, Narayan A, Pierce BC, Roth A (2014) Differential privacy: An economic method for choosing epsilon. Proc. IEEE 27th Comput. Security Foundations Sympos. (IEEE, Piscataway, NJ), 398–410.
[47]
Hu M, Momot R, Wang J (2020) Privacy management in service systems. Research Paper No. MOSI-2020-1379, HEC Paris, Paris.
[48]
Ichihashi S (2021) The economics of data externalities. J. Econom. Theory 196:105316.
[49]
Immorlica N, Kanoria Y, Lu J (2020) When does competition and costly information acquisition lead to a deadlock? Preprint, submitted November 12, https://doi.org/10.2139/ssrn.3697165.
[50]
Immorlica N, Kash IA, Lucier B (2021) Buying data over time: Approximately optimal strategies for dynamic data-driven decisions. Proc. 12th Innovations in Theoretical Comput. Sci. Conf. (Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Wadern, Germany).
[51]
Jagabathula S, Mitrofanov D, Vulcano G (2020) Inferring Consideration Sets from Sales Transaction Data (NYU Stern School of Business, New York).
[52]
Jorgensen Z, Yu T, Cormode G (2015) Conservative or liberal? Personalized differential privacy. Proc. IEEE 31st Internat. Conf. on Data Engrg. (IEEE, Piscataway, NJ), 1023–1034.
[53]
Kamath G, Singhal V, Ullman J (2020) Private mean estimation of heavy-tailed distributions. Proc. Conf. on Learn. Theory (PMLR, New York), 2204–2235.
[54]
Kamath G, Li J, Singhal V, Ullman J (2019) Privately learning high-dimensional distributions. Proc. Conf. on Learn. Theory (PMLR, New York), 1853–1902.
[55]
Karwa V, Vadhan S (2017) Finite sample differentially private confidence intervals. Preprint, submitted November 10, https://arxiv.org/abs/1711.03908.
[56]
Lee J, Clifton C (2011) How much is enough? Choosing ε for differential privacy. Proc. 14th Internat. Conf. of Internet Security (Springer, Berlin), 325–340.
[57]
Lei YM, Miao S, Momot R (2020) Privacy-preserving personalized revenue management. Research Paper No. MOSI-2020-1391, HEC Paris, Paris.
[58]
Li L (2002) Information sharing in a supply chain with horizontal competition. Management Sci. 48(9):1196–1212.
[59]
Li L, Zhang H (2008) Confidentiality and information sharing in supply chain coordination. Management Sci. 54(8):1467–1481.
[60]
Liao G, Chen X, Huang J (2018) Social-aware privacy-preserving correlated data collection. Proc. 18th ACM Internat. Sympos. on Mobile Ad Hoc Networking and Comput. (Association for Computing Machinery, New York), 11–20.
[61]
Liao G, Su Y, Ziani J, Wierman A, Huang J (2022) The privacy paradox and optimal bias-variance trade-offs in data acquisition. ACM SIGMETRICS Performance Evaluation Review, vol. 49 (ACM, New York), 6–8.
[62]
Ligett K, Roth A (2012) Take it or leave it: Running a survey when privacy comes at a cost. Proc. Internat. Workshop on Internet and Network Econom. (Springer, Berlin), 378–391.
[63]
Liu Y, Chen Y (2016) Learning to incentivize: Eliciting effort via output agreement. Preprint, submitted April 17, https://arxiv.org/abs/1604.04928.
[64]
Liu Y, Chen Y (2017) Sequential peer prediction: Learning to elicit effort using posted prices. Proc. 31st AAAI Conf. on Artificial Intelligence (AAAI Press, Palo Alto, CA).
[65]
Lobel I, Xiao W (2017) Optimal long-term supply contracts with asymmetric demand information. Oper. Res. 65(5):1275–1284.
[66]
McSherry F, Talwar K (2007) Mechanism design via differential privacy. Proc. 48th Annual IEEE Sympos. on Foundations of Comput. Sci. (IEEE, Piscataway, NJ), 94–103.
[67]
Mehner L, von Voigt SN, Tschorsch F (2021) Toward explaining epsilon: A worst-case study of differential privacy risks. Proc. IEEE Eur. Sympos. on Security and Privacy Workshops (IEEE, Piscataway, NJ), 328–331.
[68]
Meir R, Procaccia AD, Rosenschein JS (2012) Algorithms for strategyproof classification. Artificial Intelligence 186:123–156.
[69]
Montes R, Sand-Zantman W, Valletti T (2019) The value of personal information in online markets with endogenous privacy. Management Sci. 65(3):1342–1362.
[70]
Myerson RB (1981) Optimal auction design. Math. Oper. Res. 6(1):58–73.
[71]
Nissim K, Orlandi C, Smorodinsky R (2012) Privacy-aware mechanism design. Proc. 13th ACM Conf. on Electronic Commerce (Association for Computing Machinery, New York), 774–789.
[72]
Nissim K, Vadhan S, Xiao D (2014) Redrawing the boundaries on purchasing data from privacy-sensitive individuals. Proc.5th Conf. on Innovations in Theoretical Comput. Sci. (Association for Computing Machinery, New York), 411–422.
[73]
Niu B, Chen Y, Wang B, Wang Z, Li F, Cao J (2021) Adapdp: Adaptive personalized differential privacy. Proc. IEEE INFOCOM Conf. on Comput. Comm. (IEEE, Piscataway, NJ), 1–10.
[74]
Pai MM, Roth A (2013) Privacy and mechanism design. ACM SIGecom Exchanges 12(1):8–29.
[75]
Perote J, Perote-Pena J (2003) The impossibility of strategy-proof clustering. Econom. Bull. 4(23):1–9.
[76]
Rosling K (2002) Inventory cost rate functions with nonlinear shortage costs. Oper. Res. 50(6):1007–1017.
[77]
Roth A, Schoenebeck G (2012) Conducting truthful surveys, cheaply. Proc. 13th ACM Conf. on Electronic Commerce (Association for Computing Machinery, New York), 826–843.
[78]
Shang W, Ha AY, Tong S (2015) Information sharing in a supply chain with a common retailer. Management Sci. 62(1):245–263.
[79]
Yu B (1997) Assouad, fano, and le cam. Festschrift for Lucien Le Cam (Springer, Berlin), 423–435.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Operations Research
Operations Research  Volume 72, Issue 3
May-June 2024
452 pages
DOI:10.1287/opre.2024.72.issue-3
Issue’s Table of Contents

Publisher

INFORMS

Linthicum, MD, United States

Publication History

Published: 05 October 2023
Accepted: 28 August 2023
Received: 09 January 2022

Author Tag

  1. Market Analytics and Revenue Management

Author Tags

  1. differential privacy
  2. Bayesian mechanism design
  3. minimax lower bound
  4. optimal data acquisition
  5. local and central differential privacy
  6. data markets

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 0
    Total Downloads
  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 17 Jan 2025

Other Metrics

Citations

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media