research-article
Open access

Exploiting Contextual Information in Attacking Set-Generalized Transactions

Published: 18 September 2017

Abstract

Transactions are records that contain a set of items about individuals. For example, items browsed by a customer when shopping online form a transaction. Today, many activities are carried out on the Internet, resulting in a large amount of transaction data being collected. Such data are often shared and analyzed to improve business and services, but they also contain private information about individuals that must be protected. Techniques have been proposed to sanitize transaction data before their release, and set-based generalization is one such method. In this article, we study how well set-based generalization can protect transactions. We propose methods to attack set-generalized transactions by exploiting contextual information that is available within the released data. Our results show that set-based generalization may not provide adequate protection for transactions, and up to 70% of the items added into the transactions during generalization to obfuscate original data can be detected by our methods with a precision over 80%.

    Published In

    ACM Transactions on Internet Technology  Volume 17, Issue 4
    Special Issue on Provenance of Online Data and Regular Papers
    November 2017
    165 pages
    ISSN:1533-5399
    EISSN:1557-6051
    DOI:10.1145/3133307
    Editor: Munindar P. Singh
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 September 2017
    Accepted: 01 May 2017
    Revised: 01 December 2016
    Received: 01 October 2015
    Published in TOIT Volume 17, Issue 4

    Author Tags

    1. Privacy
    2. de-anonymization
    3. semantic relationship
    4. transaction data

    Qualifiers

    • Research-article
    • Research
    • Refereed
