[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

kACTUS 2: Privacy Preserving in Classification Tasks Using k-Anonymity

  • Conference paper
Protecting Persons While Protecting the People (ISIPS 2008)

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 5661))

Included in the following conference series:

Abstract

k-anonymity is the method used for masking sensitive data which successfully solves the problem of re-linking of data with an external source and makes it difficult to re-identify the individual. Thus k-anonymity works on a set of quasi-identifiers (public sensitive attributes), whose possible availability and linking is anticipated from external dataset, and demands that the released dataset will contain at least k records for every possible quasi-identifier value. Another aspect of k is its capability of maintaining the truthfulness of the released data (unlike other existing methods). This is achieved by generalization, a primary technique in k-anonymity. Generalization consists of generalizing attribute values and substituting them with semantically consistent but less precise values. When the substituted value doesn’t preserve semantic validity the technique is called suppression which is a private case of generalization. We present a hybrid approach called compensation which is based on suppression and swapping for achieving privacy. Since swapping decreases the truthfulness of attribute values there is a tradeoff between level of swapping (information truthfulness) and suppression (information loss) incorporated in our algorithm.

We use k-anonymity to explore the issue of anonymity preservation. Since we do not use generalization, we do not need a priori knowledge of attribute semantics. We investigate data anonymization in the context of classification and use tree properties to satisfy k-anonymization. Our work improves previous approaches by increasing classification accuracy.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(5), 557–570 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  2. Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression

    Google Scholar 

  3. Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression (2002)

    Google Scholar 

  4. Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: KDD 2002: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 279–288. ACM, New York (2002)

    Chapter  Google Scholar 

  5. Wang, K., Yu, P.S., Chakraborty, S.: Bottom-up generalization: a data mining solution to privacy protection. In: Proc. of the 4th IEEE International Conference on Data Mining (ICDM 2004) (November 2004)

    Google Scholar 

  6. Fung, B.C.M., Wang, K., Yu, P.S.: Top-down specialization for information and privacy preservation. In: Proc. of the 21st IEEE International Conference on Data Engineering (ICDE 2005), Tokyo, Japan, April 2005, pp. 205–216 (2005)

    Google Scholar 

  7. LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Incognito: efficient full-domain k-anonymity. In: SIGMOD 2005: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pp. 49–60. ACM, New York (2005)

    Chapter  Google Scholar 

  8. Friedman, A., Schuster, A., Wolff, R.: k-anonymous decision tree induction. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 151–162. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  9. Fung, B.C.M., Wang, K.: Anonymizing classification data for privacy preservation. IEEE Trans. on Knowl. and Data Eng. 19(5), 711–725 (2007); Fellow-Philip S. Yu

    Article  Google Scholar 

  10. Friedman, A., Wolff, R., Schuster, A.: Providing k-anonymity in data mining. VLDB J. (2008) (accepted for publication)

    Google Scholar 

  11. Dalenius, T., Reiss, S.P.: Data-swapping, a Technique for Disclosure Control. Program in Computer Science and Division of Engineering. Brown University (1978)

    Google Scholar 

  12. Reiss, S.P.: Practical data-swapping: the first steps. ACM Trans. Database Syst. 9(1), 20–37 (1984)

    Article  MATH  Google Scholar 

  13. Richard, A., Moore, J.: Controlled data-swapping techniques for masking public use microdata sets. Statistical Research Division Report Series, RR96-04, U.S. Bureau of the Census (1996)

    Google Scholar 

  14. Fienberg, S.E., McIntyre, J.: Data swapping: Variations on a theme by dalenius and reiss. Technical report, National Institute of Statistical Sciences, Research Triangle Park, NC (2003)

    Google Scholar 

  15. Kisilevich, S., Elovici, Y., Shapira, B., Rokach, L.: A multi-dimensional suppression for k-anonymity (to appear, 2009)

    Google Scholar 

  16. Shannon, C.E.: A mathematical theory of communication. Bell Systems Technical Journal 27, 379–423 (1948)

    MATH  MathSciNet  Google Scholar 

  17. Newman, C.B.D., Merz, C.: UCI repository of machine learning databases (1998)

    Google Scholar 

  18. Witten, I.H., Frank, E.: Data mining: practical machine learning tools and techniques with java implementations. SIGMOD Rec. 31(1), 76–77 (2002)

    Article  Google Scholar 

  19. Dietterich, T.G.: Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 10(7), 1895–1923 (1998)

    Article  Google Scholar 

  20. Salzberg, S.L.: C4.5: Programs for Machine Learning by J. Ross Quinlan. Machine Learning 16(3), 235–240 (1994)

    MathSciNet  Google Scholar 

  21. Cessie, S.L., Houwelingen, J.C.V.: Ridge estimators in logistic regression. Applied Statistics 41(1), 191–201 (1992)

    Article  MATH  Google Scholar 

  22. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)

    MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kisilevich, S., Elovici, Y., Shapira, B., Rokach, L. (2009). kACTUS 2: Privacy Preserving in Classification Tasks Using k-Anonymity. In: Gal, C.S., Kantor, P.B., Lesk, M.E. (eds) Protecting Persons While Protecting the People. ISIPS 2008. Lecture Notes in Computer Science, vol 5661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10233-2_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-10233-2_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10232-5

  • Online ISBN: 978-3-642-10233-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics