kACTUS 2: Privacy Preserving in Classification Tasks Using k-Anonymity

Slava Kisilevich,
Yuval Elovici,
Bracha Shapira &
Lior Rokach

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 5661)

Included in the following conference series:

Annual Workshop on Information Privacy and National Security


Abstract

k-anonymity is a method for masking sensitive data that addresses the problem of re-linking released data with an external source and makes it difficult to re-identify an individual. It works on a set of quasi-identifiers (public sensitive attributes whose availability in, and linking from, an external dataset is anticipated) and demands that the released dataset contain at least k records for every combination of quasi-identifier values that appears in it. Another advantage of k-anonymity is that, unlike other existing methods, it maintains the truthfulness of the released data. This is achieved by generalization, a primary technique in k-anonymity, which substitutes attribute values with semantically consistent but less precise values. When the substituted value does not preserve semantic validity, the technique is called suppression, which is a special case of generalization. We present a hybrid approach called compensation, which is based on suppression and swapping, for achieving privacy. Since swapping decreases the truthfulness of attribute values, our algorithm incorporates a tradeoff between the level of swapping (information truthfulness) and the level of suppression (information loss).
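To make the k-anonymity requirement and the idea of suppression concrete, the following is a minimal Python sketch, not the compensation algorithm presented in the paper. The record layout, the attribute names (`zip`, `age`, `diagnosis`), the `*` marker and all function names are assumptions introduced only for illustration. It checks whether every quasi-identifier combination occurs at least k times and naively suppresses the quasi-identifiers of records in undersized groups.

```python
from collections import Counter

SUPPRESSED = "*"  # hypothetical marker standing in for a suppressed value


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[a] for a in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())


def suppress_small_groups(records, quasi_identifiers, k):
    """Naive suppression: mask the quasi-identifiers of records in groups smaller than k.

    Illustration only; returns new records and leaves the input untouched.
    """
    sizes = group_sizes(records, quasi_identifiers)
    masked = []
    for r in records:
        key = tuple(r[a] for a in quasi_identifiers)
        if sizes[key] < k:
            r = {**r, **{a: SUPPRESSED for a in quasi_identifiers}}
        masked.append(r)
    return masked


# Toy data (invented for this example)
records = [
    {"zip": "84105", "age": 34, "diagnosis": "flu"},
    {"zip": "84105", "age": 34, "diagnosis": "asthma"},
    {"zip": "78457", "age": 51, "diagnosis": "flu"},
    {"zip": "78464", "age": 29, "diagnosis": "diabetes"},
]
QI = ["zip", "age"]

print(is_k_anonymous(records, QI, 2))   # two groups contain a single record
masked = suppress_small_groups(records, QI, 2)
print(is_k_anonymous(masked, QI, 2))    # the masked records now form one group
```

In this toy case the two singleton groups merge into one fully suppressed group of size two. In general the suppressed group can itself end up smaller than k (for example, when only one record is masked), so a practical method needs a further step; the tradeoff between suppression and swapping described in the abstract addresses this kind of information loss.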

We use k-anonymity to explore the issue of anonymity preservation. Since we do not use generalization, we do not need a priori knowledge of attribute semantics. We investigate data anonymization in the context of classification and use tree properties to satisfy k-anonymity. Our work improves on previous approaches by increasing classification accuracy.



Author information

Authors and Affiliations

Department of Computer and Information Science, Konstanz University, Box 78, Universitaets Strasse 10, 78457, Konstanz, Germany
Slava Kisilevich
Department of Information System Engineering and Deutsche Telekom Laboratories at Ben-Gurion University, Ben Gurion University, Be’er Sheva, 84105, Israel
Yuval Elovici, Bracha Shapira & Lior Rokach


Editor information

Editors and Affiliations

School of Communication and Information, Rutgers University, New Brunswick, NJ, USA
Cecilia S. Gal, Paul B. Kantor & Michael E. Lesk


About this paper

Cite this paper

Kisilevich, S., Elovici, Y., Shapira, B., Rokach, L. (2009). kACTUS 2: Privacy Preserving in Classification Tasks Using k-Anonymity. In: Gal, C.S., Kantor, P.B., Lesk, M.E. (eds) Protecting Persons While Protecting the People. ISIPS 2008. Lecture Notes in Computer Science, vol 5661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10233-2_7


DOI: https://doi.org/10.1007/978-3-642-10233-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10232-5
Online ISBN: 978-3-642-10233-2
eBook Packages: Computer Science, Computer Science (R0)
