Abstract
Microaggregation is an anonymization technique consisting on partitioning the data into clusters no smaller than k elements and then replacing the whole cluster by its prototypical representant. Most of microaggregation techniques work on numerical attributes. However, many data sets are described by heterogeneous types of data, i.e., numerical and categorical attributes. In this paper we propose a new microaggregation method for achieving a compliant k-anonymous masked file for categorical microdata based on generalization. The goal is to build a generalized description satisfied by at least k domain objects and to replace these domain objects by the description. The way to construct that generalization is similar that the one used in growing decision trees. Records that cannot be generalized satisfactorily are discarded, therefore some information is lost. In the experiments we performed we prove that the new approach gives good results.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Abril, D., Navarro-Arribas, G., Torra, V.: Supervised learning using a symmetric bilinear form for record linkage. Inf. Fusion 26, 144–153 (2015)
Armengol, E., Plaza, E.: Bottom-up induction of feature terms. Mach. Learn. 41, 259–294 (2000)
Bache, K., Lichman, M.: UCI machine learning repository (2013)
Bayardo, R.J., Agrawal, R.: Data privacy through optimal k-anonymization. In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), pp. 217–228 (2005)
Domingo-Ferrer, J., Torra, V.: Ordinal, continuous and heterogeneous \(k\)-anonymity through microaggregation. Data Min. Knowl. Discov. 11(2), 195–212 (2005)
Duncan, G.T., Elliot, M., Salazar, J.J.: Statistical Confidentiality. Springer, New York (2011)
Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006)
Guo, L., Wu, X.: Privacy preserving categorical data analysis with unknown distortion parameters. Trans. Data Priv. 2(3), 185–205 (2009)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. SIGKDD Explor. Newsl. 11, 10–18 (2009)
Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Nordholt, E.S., Spicer, K., de Wolf, P.-P.: Statistical Disclosure Control. Wiley, New York (2012)
Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002, pp. 279–288. ACM, New York (2002)
LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Multidimensional \(k\)-anonymity, Technical report 1521, University of Wisconsin (2005)
LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Incognito: efficient full-domain \(k\)-anonymity, SIGMOD 2005 (2005)
Li, X.-B., Sarkar, S.: Privacy protection in data mining: a perturbation approach for categorical data. Inf. Syst. Res. 17(3), 254–270 (2004)
Marés, J., Torra, V.: Clustering-based categorical data protection. In: Domingo-Ferrer, J., Tinnirello, I. (eds.) PSD 2012. LNCS, vol. 7556, pp. 78–89. Springer, Heidelberg (2012)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)
Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: \(k\)-anonymity and its enforcement through generalization and suppression, SRI International Technical report (1998)
Tassa, T., Mazza, A., Gionis, A.: k-concealment: an alternative model of k-type anonymity. Trans. Data Priv. 5(1), 189–222 (2012)
Torra, V., Stokes, K.: A formalization of record linkage and its application to data protection. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 20, 907–919 (2012)
Wang, K.: Bottom-up generalization: a data mining solution to privacy protection. Proc. ICDM 2004, 249–256 (2004)
Winkler, W.E.: Re-identification methods for masked microdata. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 216–230. Springer, Heidelberg (2004)
Acknowledgments
This research is partially funded by the Spanish MICINN projects COGNITIO (TIN-2012-38450-C03-03), EdeTRI (TIN2012-39348-C02-01) and COPRIVACY (TIN2011-27076-C03-03), the grant 2009-SGR-1434 from the Generalitat de Catalunya, and the European Project DwB (Grant Agreement Number 262608).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Armengol, E., Torra, V. (2015). Generalization-Based k-Anonymization. In: Torra, V., Narukawa, T. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2015. Lecture Notes in Computer Science(), vol 9321. Springer, Cham. https://doi.org/10.1007/978-3-319-23240-9_17
Download citation
DOI: https://doi.org/10.1007/978-3-319-23240-9_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23239-3
Online ISBN: 978-3-319-23240-9
eBook Packages: Computer ScienceComputer Science (R0)