Research article. DOI: 10.1145/2110363.2110422

Uniqueness and how it impacts privacy in health-related social science datasets

Published: 28 January 2012

Abstract

Social scientists, like those performing research at the Kinsey Institute for Research in Sex, Gender and Reproduction, may use surveys to gather large amounts of sensitive data. Unlike purely medical-related datasets, these social science datasets tend to be sparse and high-dimensional, which presents opportunities to characterize participants in the dataset in unique ways. These unique characterizations may enable individuals to be linked to external data in ways that have not been previously considered. Therefore, traditional approaches to de-identifying data, such as fulfilling HIPAA requirements, may not be sufficient for preventing the re-identification of participants in large social science datasets.
In this paper, we evaluate the statistical characteristics of two high-dimensional social science datasets to better understand how unique features impact privacy. We apply a class of statistical de-anonymization attacks in an attempt to achieve theoretical re-identification of participants. We assume that an attacker has exact knowledge of a subset of attribute values for a particular record, and wants to link this subset of data to the actual record to discover the remaining content. We show that although 98% of the records within the dataset are unique given any three attributes, re-identification of the records may not be easily achieved. We attribute limited re-identification to the inherent similarity in the human behavior that the scientists measure. This work is the first to characterize re-identification risks in high-dimensional data that is collected in surveys designed to capture the various behaviors and experiences of groups of individuals.
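
The two quantities the abstract contrasts, how many records are unique on a small set of attributes and how many records an attacker's partial knowledge actually matches, can be illustrated with a minimal sketch. This is not the authors' method or data: the toy `RECORDS` table and the helper names `unique_fraction`, `mean_unique_fraction`, and `link` are hypothetical, and the attacker is assumed to know exact values, as in the abstract.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy survey: one row per participant, one coded answer per column.
RECORDS = [
    (1, 0, 2, 1),
    (1, 0, 2, 0),
    (0, 1, 2, 1),
    (1, 1, 0, 1),
    (1, 0, 2, 1),  # identical to the first row, so it is never unique
]


def unique_fraction(records, attrs):
    """Fraction of records whose values on `attrs` match no other record."""
    projected = [tuple(r[a] for a in attrs) for r in records]
    counts = Counter(projected)
    return sum(1 for p in projected if counts[p] == 1) / len(records)


def mean_unique_fraction(records, k):
    """Average of unique_fraction over every subset of k attributes."""
    n_attrs = len(records[0])
    subsets = list(combinations(range(n_attrs), k))
    return sum(unique_fraction(records, s) for s in subsets) / len(subsets)


def link(records, known):
    """Indices of records consistent with the attacker's exact knowledge.

    `known` maps attribute index -> value the attacker believes is true.
    A single returned index means the partial knowledge pins down one record.
    """
    return [
        i for i, r in enumerate(records)
        if all(r[a] == v for a, v in known.items())
    ]
```

In this toy table, `link(RECORDS, {0: 1, 3: 0})` returns `[1]`, a single candidate, while `link(RECORDS, {0: 1, 1: 0})` returns `[0, 1, 4]`, so knowing those two answers still leaves three candidates. The sketch only covers exact matching; it says nothing about how an attacker would obtain the values or cope with noisy knowledge.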

Published In

IHI '12: Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium
January 2012
914 pages
ISBN:9781450307819
DOI:10.1145/2110363
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. privacy
  2. re-identification
  3. similarity
  4. uniqueness

Qualifiers

  • Research-article

Conference

IHI '12: ACM International Health Informatics Symposium
January 28 - 30, 2012
Miami, Florida, USA

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 11
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 04 Jan 2025

Citations

Cited By

  • (2021) Protect and Project. Proceedings of the ACM on Human-Computer Interaction, 5:CSCW1 (1-19). DOI: 10.1145/3449233. Online publication date: 22-Apr-2021
  • (2019) Local Standards for Anonymization Practices in Health, Wellness, Accessibility, and Aging Research at CHI. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (1-14). DOI: 10.1145/3290605.3300692. Online publication date: 2-May-2019
  • (2018) Driver Identification via Brake Pedal Signals: A Replication and Advancement of Existing Techniques. 2018 21st International Conference on Intelligent Transportation Systems (ITSC) (1415-1420). DOI: 10.1109/ITSC.2018.8569510. Online publication date: Nov-2018
  • (2016) Scaling Up Scientific Discovery in Sleep Medicine: The National Sleep Research Resource. Sleep, 39:5 (1151-1164). DOI: 10.5665/sleep.5774. Online publication date: 1-May-2016
  • (2015) Unique in the shopping mall: On the reidentifiability of credit card metadata. Science, 347:6221 (536-539). DOI: 10.1126/science.1256297. Online publication date: 30-Jan-2015
  • (2015) Evaluating the Utility of Differential Privacy: A Use Case Study of a Behavioral Science Dataset. Medical Data Privacy Handbook (59-82). DOI: 10.1007/978-3-319-23633-9_4. Online publication date: 2015
  • (2014) Methodological considerations from a Kinsey Institute mixed methods pilot project. International Journal of Multiple Research Approaches, 7:2 (178-188). DOI: 10.5172/mra.2013.7.2.178. Online publication date: 17-Dec-2014
  • (2014) openPDS: Protecting the Privacy of Metadata through SafeAnswers. PLoS ONE, 9:7 (e98790). DOI: 10.1371/journal.pone.0098790. Online publication date: 9-Jul-2014
  • (2014) A Quantitative Approach for Evaluating the Utility of a Differentially Private Behavioral Science Dataset. Proceedings of the 2014 IEEE International Conference on Healthcare Informatics (276-284). DOI: 10.1109/ICHI.2014.45. Online publication date: 15-Sep-2014
  • (2013) Ethical and practical challenges to studying patients who opt out of large-scale biorepository research. Journal of the American Medical Informatics Association, 20:e2 (e221-e225). DOI: 10.1136/amiajnl-2013-001937. Online publication date: 1-Dec-2013
