NSF Org: | CNS Division Of Computer and Network Systems |
Recipient: |
|
Initial Amendment Date: | September 19, 2012 |
Latest Amendment Date: | June 12, 2017 |
Award Number: | 1237235 |
Award Instrument: | Continuing Grant |
Program Manager: | Nina Amla, namla@nsf.gov, (703)292-7991, CNS Division Of Computer and Network Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2012 |
End Date: | March 31, 2018 (Estimated) |
Total Intended Award Amount: | $4,863,840.00 |
Total Awarded Amount to Date: | $6,048,707.00 |
Funds Obligated to Date: | FY 2013 = $1,199,482.00; FY 2014 = $1,287,304.00; FY 2015 = $1,323,029.00; FY 2016 = $1,052,767.00; FY 2017 = $56,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 1033 MASSACHUSETTS AVE STE 3, CAMBRIDGE, MA, US 02138-5366, (617)495-5501 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 33 Oxford Street, Cambridge, MA, US 02138-2933 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Special Projects - CNS, Secure & Trustworthy Cyberspace |
Primary Program Source: | 01001516DB NSF RESEARCH & RELATED ACTIVIT; 01001314DB NSF RESEARCH & RELATED ACTIVIT; 01001415DB NSF RESEARCH & RELATED ACTIVIT; 01001718DB NSF RESEARCH & RELATED ACTIVIT; 01001617DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Information technology, advances in statistical computing, and the deluge of data available through the Internet are transforming computational social science. However, a major challenge is maintaining the privacy of human subjects. This project is a broad, multidisciplinary effort to help enable the collection, analysis, and sharing of sensitive data while providing privacy for individual subjects. Bringing together computer science, social science, statistics, and law, the investigators seek to refine and develop definitions and measures of privacy and data utility, and design an array of technological, legal, and policy tools for dealing with sensitive data. In addition to contributing to research infrastructure around the world, the ideas developed in this project will benefit society more broadly as it grapples with data privacy issues in many other domains, including public health and electronic commerce.
This project will define and measure privacy in both mathematical and legal terms, and explore alternate definitions of privacy that may be more general or more practical. The project will study variants of differential privacy and develop new theoretical results for use in contexts where it is currently inappropriate or impractical. The research will provide a better understanding of the practical performance and usability of a variety of algorithms for analyzing and sharing privacy-sensitive data. The project will develop secure implementations of these algorithms and legal instruments, which will be made publicly available and used to enable wider access to privacy-sensitive data sets at the Harvard Institute for Quantitative Social Science's Dataverse Network.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Computing technology and vast new sources of data are transforming the social sciences. With the ability to collect and analyze massive amounts of data on human behavior and interactions, social scientists can hope to uncover many more phenomena, with greater detail and confidence, than allowed by traditional means such as surveys and interviews. In addition to advancing the state of knowledge, the rich analysis of behavioral data can enable companies to better serve their customers, and governments their citizenry. However, a major challenge for computational social science is maintaining the privacy of human subjects. At present, an individual social science researcher is left to devise her own privacy shields, such as stripping the dataset of “personally identifiable information” (PII). However, such privacy shields are often ineffective and provide limited or no real-world privacy protection. Indeed, there have been a number of cases where the individuals in a supposedly anonymized dataset have been re-identified.
This project was a broad, multidisciplinary effort to help enable the collection, analysis, and sharing of sensitive research data while providing strong privacy protections for individual research subjects. Bringing together computer science, social science, statistics, and law, the investigators refined and developed definitions and measures of privacy and data utility, and designed an integrated array of technological, legal, and policy tools for dealing with sensitive research data.
In the year after the project’s completion, some of these tools will be deployed in digital data repositories around the world, with the potential for a large impact on many fields of human subjects research, including social science, medicine, public health, and economics. Specifically, the project’s “DataTags” tool will enable the owner of a privacy-sensitive dataset to select a policy for the repository’s handling and sharing of the dataset, informed by relevant laws and best practices; the project’s “Robot Lawyers” tool will generate customized Data Use Agreements for the repository to use when sharing the dataset with other researchers; and the project’s “PSI” tool will give a much wider set of users statistical access to the dataset, protected by the strong privacy guarantees of “differential privacy.”
The intellectual contributions of the project include extensive mathematical work delineating the fundamental tradeoffs between privacy protection and statistical utility, an understanding of how mathematical and legal conceptions of privacy can be related to each other, and methods for automating reasoning about legal privacy requirements in order to generate custom data-sharing licenses. Many of the project’s contributions required extensive interdisciplinary collaboration, resulting in papers whose authors and publication venues span computer science, law, statistics, and social science. A number of the advances in the project relate to “differential privacy,” a powerful mathematical framework for protecting privacy when performing statistical analysis of sensitive data, which emerged from the theoretical computer science literature and has recently found large-scale practical deployments by Apple, Google, and the US Census Bureau. The “PSI” tool produced in the project is unique in enabling differential privacy to be used effectively by practicing social science researchers, with no specialized expertise in privacy, computer science, or statistics.
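To make the idea of differential privacy more concrete, here is a minimal sketch of the Laplace mechanism, a standard textbook way to release a differentially private count. It is an illustration only and is not taken from the PSI tool or any other project software; the function name `dp_count`, the `epsilon` value, and the sample data are all hypothetical.

```python
import random


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching `predicate`.

    Hypothetical helper for illustration. Adding or removing one person
    changes a count by at most 1 (sensitivity 1), so Laplace noise with
    scale 1/epsilon gives epsilon-differential privacy for this one query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()

    true_count = sum(1 for r in records if predicate(r))

    # The difference of two independent exponential samples with rate
    # epsilon is a Laplace sample with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    # Made-up ages standing in for a sensitive dataset.
    ages = [23, 35, 41, 52, 67, 29]
    noisy = dp_count(ages, lambda age: age >= 40, epsilon=0.5)
    print(f"Noisy count of subjects aged 40 or over: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and so gives stronger privacy at the cost of accuracy, which is the privacy-utility tradeoff described above. A real deployment would also need to track the total privacy budget spent across all queries, which this sketch does not do.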
In addition to bringing data privacy solutions to practice in the sharing of research data, the project achieved broader impacts by bringing a multidisciplinary understanding of data privacy to a wide range of audiences. It did so by organizing several workshops and symposia (including a public symposium with over 700 registrants), training many students in multidisciplinary research (including over 125 research assistants from computer science, law, social science, and statistics), sharing extensive policy recommendations and best practices with policymakers, practitioners, and the general public, and producing numerous open-access pedagogical materials.
Moreover, the ideas developed in this project will benefit society more broadly as it grapples with data privacy issues in many other domains, including national security, electronic commerce, public health, and government operations and accountability. Indeed, this project spawned a number of offshoot efforts on data privacy in these various domains, including helping the US Census Bureau and other government agencies adopt more modern privacy methods starting with the 2020 Decennial Census, an exploration of better methods for companies to analyze their user data in a privacy-protective manner, and the development of a new model for industry-academic partnerships to carry out social science research on sensitive corporate data.
Last Modified: 07/31/2018
Modified by: Salil P Vadhan