NSF Award Search: Award # 1237235 - TWC: Frontier: Privacy Tools for Sharing Research Data

Award Abstract # 1237235

TWC: Frontier: Privacy Tools for Sharing Research Data

NSF Org:	CNS Division Of Computer and Network Systems
Recipient:	PRESIDENT AND FELLOWS OF HARVARD COLLEGE
Initial Amendment Date:	September 19, 2012
Latest Amendment Date:	June 12, 2017
Award Number:	1237235
Award Instrument:	Continuing Grant
Program Manager:	Nina Amla namla@nsf.gov (703)292-7991 CNS Division Of Computer and Network Systems CSE Directorate for Computer and Information Science and Engineering
Start Date:	October 1, 2012
End Date:	March 31, 2018 (Estimated)
Total Intended Award Amount:	$4,863,840.00
Total Awarded Amount to Date:	$6,048,707.00
Funds Obligated to Date:	FY 2012 = $1,130,125.00; FY 2013 = $1,199,482.00; FY 2014 = $1,287,304.00; FY 2015 = $1,323,029.00; FY 2016 = $1,052,767.00; FY 2017 = $56,000.00
History of Investigator:	Salil Vadhan (Principal Investigator) salil_vadhan@harvard.edu; Gary King (Co-Principal Investigator); Latanya Sweeney (Co-Principal Investigator); Edoardo Airoldi (Co-Principal Investigator); Urs Gasser (Co-Principal Investigator); Phillip Malone (Former Co-Principal Investigator)
Recipient Sponsored Research Office:	Harvard University 1033 MASSACHUSETTS AVE STE 3 CAMBRIDGE MA US 02138-5366 (617)495-5501
Sponsor Congressional District:	05
Primary Place of Performance:	Harvard University 33 Oxford Street Cambridge MA US 02138-2933
Primary Place of Performance Congressional District:	05
Unique Entity Identifier (UEI):	LN53LCFJFL45
Parent UEI:
NSF Program(s):	Special Projects - CNS, Secure & Trustworthy Cyberspace
Primary Program Source:	01001213DB NSF RESEARCH & RELATED ACTIVIT 01001516DB NSF RESEARCH & RELATED ACTIVIT 01001314DB NSF RESEARCH & RELATED ACTIVIT 01001415DB NSF RESEARCH & RELATED ACTIVIT 01001718DB NSF RESEARCH & RELATED ACTIVIT 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	9178, 8225, 9251, 8087, 025Z, 7434
Program Element Code(s):	171400, 806000
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Information technology, advances in statistical computing, and the deluge of data available through the Internet are transforming computational social science. However, a major challenge is maintaining the privacy of human subjects. This project is a broad, multidisciplinary effort to help enable the collection, analysis, and sharing of sensitive data while providing privacy for individual subjects. Bringing together computer science, social science, statistics, and law, the investigators seek to refine and develop definitions and measures of privacy and data utility, and design an array of technological, legal, and policy tools for dealing with sensitive data. In addition to contributing to research infrastructure around the world, the ideas developed in this project will benefit society more broadly as it grapples with data privacy issues in many other domains, including public health and electronic commerce.

This project will define and measure privacy in both mathematical and legal terms, and explore alternate definitions of privacy that may be more general or more practical. The project will study variants of differential privacy and develop new theoretical results for use in contexts where it is currently inappropriate or impractical. The research will provide a better understanding of the practical performance and usability of a variety of algorithms for analyzing and sharing privacy-sensitive data. The project will develop secure implementations of these algorithms and legal instruments, which will be made publicly available and used to enable wider access to privacy-sensitive data sets at the Harvard Institute for Quantitative Social Science's Dataverse Network.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 68)

P. Toulis and E. M. Airoldi "Scalable estimation strategies based on stochastic approximations: Classical results and new insights" Statistics and Computing, 2015, p.781 http://dx.doi.org/10.1007/s11222-015-9560-y

Micah Altman, Alexandra Wood, and Effy Vayena "A Harm-Reduction Framework for Algorithmic Fairness" IEEE Symposium on Security & Privacy , 2018

M. Gaboardi, H. woo Lim, R. Rogers, and S. Vadhan "Differentially Private Chi-Squared Hypothesis Testing: Goodness of Fit and Independence Testing" ICML , 2016

M. Bun and J. Thaler "Dual Lower Bounds for Approximate Degree and Markov-Bernstein Inequalities" Automata, Languages, and Programming: 40th International Colloquium, ICALP 2013, Riga, Latvia, July 8-12, 2013, Proceedings, Part I, v.7965, 2013, p.303-314 http://dx.doi.org/10.1007/978-3-642-39206-1_26

Mark Bun, Thomas Steinke, and Jonathan Ullman "Make Up Your Mind: The Price of Online Queries in Differential Privacy" Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2017

Yiling Chen, Stephen Chong, Ian A. Kash, Tal Moran, and Salil P. Vadhan "Truthful Mechanisms for Agents that Value Privacy" ACM Transactions on Economics and Computation , v.4 , 2016

Y. Chen, O. Sheffet, and S. Vadhan "Privacy Games" 10th Conference on Web and Internet Economics (WINE), 2014

Y. Chen, K. Nissim, and B. Waggoner "Fair Information Sharing for Treasure Hunting" in Association for the Advancement of Artificial Intelligence (AAAI) , 2015

Y. Chen, S. Chong, I. A. Kash, T. Moran, and S. Vadhan "Truthful mechanisms for agents that value privacy" Proceedings of the fourteenth ACM conference on Electronic commerce (EC '13), 2013, p.215-232 http://dx.doi.org/10.1145/2482540.2482549

Xianrui Meng, Seny Kamara, Kobbi Nissim, and George Kollios "Grecs: Graph encryption for approximate shortest distance queries" The 22nd ACM Conference on Computer and Communications Security (CCS '15), 2015, p.450 ISBN 978-1-4503-3832-5

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Computing technology and vast new sources of data are transforming the social sciences. With the ability to collect and analyze massive amounts of data on human behavior and interactions, social scientists can hope to uncover many more phenomena, with greater detail and confidence, than allowed by traditional means such as surveys and interviews. In addition to advancing the state of knowledge, the rich analysis of behavioral data can enable companies to better serve their customers, and governments their citizenry. However, a major challenge for computational social science is maintaining the privacy of human subjects. At present, an individual social science researcher is left to devise her own privacy shields, such as stripping the dataset of “personally identifiable information” (PII). However, such privacy shields are often ineffective and provide limited or no real-world privacy protection. Indeed, there have been a number of cases where the individuals in a supposedly anonymized dataset have been re-identified.

This project was a broad, multidisciplinary effort to help enable the collection, analysis, and sharing of sensitive research data while providing strong privacy protections for individual research subjects. Bringing together computer science, social science, statistics, and law, the investigators refined and developed definitions and measures of privacy and data utility, and designed an integrated array of technological, legal, and policy tools for dealing with sensitive research data.

In the year after the project’s completion, some of these tools will be deployed in digital data repositories around the world, offering the potential to have a large impact on many fields of human subject research, including social science, medicine, public health, and economics. Specifically, the project’s “DataTags” tool will enable an owner of a privacy-sensitive dataset to select a policy for the repository’s handling and sharing of the dataset, informed by relevant laws and best practices; the project’s “Robot Lawyers” tool will generate customized Data Use Agreements for the repository to use when sharing the dataset with other researchers; and the project’s “PSI” tool will provide a much wider set of users statistical access to this dataset with the strong privacy protections of “differential privacy.”

The intellectual contributions of the project include extensive mathematical work delineating the fundamental tradeoffs between privacy protection and statistical utility, an understanding of how mathematical and legal conceptions of privacy can be related to each other, and methods for automating reasoning about legal privacy requirements in order to generate custom data-sharing licenses. Many of the project’s contributions required extensive interdisciplinary collaboration, resulting in papers whose authors and publication venues span computer science, law, statistics, and social science. A number of the advances in the project relate to “differential privacy,” a powerful mathematical framework for protecting privacy when performing statistical analysis of sensitive data, which emerged from the theoretical computer science literature and has recently found large-scale practical deployments by Apple, Google, and the US Census Bureau. The “PSI” tool produced in the project is unique in enabling differential privacy to be used effectively by practicing social science researchers, with no specialized expertise in privacy, computer science, or statistics.
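To make the idea of differential privacy concrete, the following is a minimal sketch of the standard Laplace mechanism applied to a counting query. It is an illustration only and is not taken from the project's "PSI" tool; the names `laplace_noise` and `private_count`, the example data, and the choice of epsilon are all hypothetical. Adding or removing one person changes a count by at most 1, so adding Laplace noise with scale 1/epsilon gives epsilon-differential privacy in the idealized real-number model. A hardened implementation would also need to address floating-point and side-channel issues, which this sketch does not attempt.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables, each with
    mean `scale`, follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records satisfying `predicate`.

    Hypothetical helper for illustration: a counting query has
    sensitivity 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Made-up ages, used only to exercise the function.
    ages = [23, 35, 41, 58, 62, 19]
    noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5)
    print(f"noisy count of subjects aged 40 or over: {noisy:.2f}")
```

Smaller values of epsilon add more noise and give stronger privacy; larger values give more accurate answers. This is the simplest instance of the privacy-versus-utility tradeoff that the paragraph above describes.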

In addition to bringing data privacy solutions to practice in the sharing of research data, the project achieves broader impacts by exposing a multidisciplinary understanding of data privacy to a wide range of audiences. It has achieved this through organizing several workshops and symposia (including a public symposium with over 700 registrants), training many students in multidisciplinary research (including over 125 research assistants from computer science, law, social science, and statistics), sharing extensive policy recommendations and best practices with policymakers, practitioners, and the general public, and producing numerous open-access pedagogical materials.

Moreover, the ideas developed in this project will benefit society more broadly as it grapples with data privacy issues in many other domains, including national security, electronic commerce, public health, and government operations and accountability. Indeed, this project spawned a number of offshoot efforts on data privacy in these various domains, including helping the US Census Bureau and other government agencies adopt more modern privacy methods starting with the 2020 Decennial Census, an exploration of better methods for companies to analyze their user data in a privacy-protective manner, and the development of a new model for industry-academic partnerships to carry out social science research on sensitive corporate data.

Last Modified: 07/31/2018
Modified by: Salil P Vadhan