Abstract
Microaggregation is one of the most commonly employed microdata protection methods. Its basic idea is to anonymize data by aggregating the original records into small groups of at least \(k\) elements and replacing each record with the centroid of its group, which ensures \(k\)-anonymity. When records are large, i.e., the data set has many attributes, the data set is usually split into smaller blocks of attributes, and microaggregation is applied to each block successively and independently in order to limit information loss. This is called multivariate microaggregation. This technique reduces the information lost when several values are collapsed to the centroid of their group. Unfortunately, with multivariate microaggregation the \(k\)-anonymity property is lost when an intruder knows at least two attributes from different blocks, which might be the usual case.
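The loss of \(k\)-anonymity across blocks can be illustrated with a minimal sketch. It assumes the simplest possible setting, which is not taken from the paper: each block holds a single attribute, and groups are built by sorting the values and cutting them into consecutive runs of at least \(k\). The names `microaggregate_1d`, `microaggregate_blocks` and `smallest_group`, as well as the toy records, are hypothetical.

```python
from collections import Counter
from statistics import fmean


def microaggregate_1d(values, k):
    """Sort the values, cut them into consecutive groups of at least k,
    and replace every value with the mean (centroid) of its group."""
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k values")
    order = sorted(range(n), key=lambda i: values[i])
    masked = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # leftover too small for its own group: absorb it
            end = n
        group = order[start:end]
        centroid = fmean(values[i] for i in group)
        for i in group:
            masked[i] = centroid
        start = end
    return masked


def microaggregate_blocks(records, k):
    """Assumption: every attribute is its own block, masked independently."""
    columns = [list(col) for col in zip(*records)]
    masked_columns = [microaggregate_1d(col, k) for col in columns]
    return [tuple(row) for row in zip(*masked_columns)]


def smallest_group(rows):
    """Size of the smallest set of identical rows (the effective k)."""
    return min(Counter(rows).values())


# Toy data (hypothetical): 4 records, 2 attributes, k = 2.
records = [(1, 10), (2, 30), (3, 20), (4, 40)]
masked = microaggregate_blocks(records, k=2)

per_attribute = [smallest_group([row[j] for row in masked]) for j in range(2)]
print(per_attribute)           # each attribute alone is shared by 2 records
print(smallest_group(masked))  # combined, every masked record is unique
```

In this toy case each attribute on its own is 2-anonymous, but an intruder who knows both attributes can single out every record, because the two blocks were grouped independently.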
In this work, we present a new microaggregation method called one-dimension microaggregation (\(Mic1D-k\)). \(Mic1D-k\) mitigates the loss of \(k\)-anonymity by mixing all the values in the original microdata file into a single non-attributed data set through a set of simple pre-processing steps and then microaggregating all the mixed values together. Our experiments on real data show that our proposal achieves lower disclosure risk than previous approaches while keeping information loss at a similar level.
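The general idea of pooling all values before microaggregating can be sketched as follows. This is only an illustration under stated assumptions, not the published \(Mic1D-k\) procedure: the abstract does not specify the pre-processing steps, so the sketch assumes per-attribute standardization (z-scores) to make values comparable, followed by a simple sort-and-cut univariate microaggregation. The function names `microaggregate_1d` and `pooled_microaggregation` are hypothetical.

```python
from statistics import fmean, pstdev


def microaggregate_1d(values, k):
    """Sort the values, cut them into consecutive groups of at least k,
    and replace every value with the mean (centroid) of its group."""
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k values")
    order = sorted(range(n), key=lambda i: values[i])
    masked = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # leftover too small for its own group: absorb it
            end = n
        group = order[start:end]
        centroid = fmean(values[i] for i in group)
        for i in group:
            masked[i] = centroid
        start = end
    return masked


def pooled_microaggregation(records, k):
    """Assumed pipeline: standardize each attribute, pool all values into
    one list, microaggregate the pool, then map the values back."""
    n = len(records)
    columns = [list(col) for col in zip(*records)]
    means = [fmean(col) for col in columns]
    # Guard against constant attributes, whose deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]

    # Pool the standardized values column by column (attribute labels dropped).
    pooled = [(v - m) / s
              for col, m, s in zip(columns, means, stds)
              for v in col]
    masked_pool = microaggregate_1d(pooled, k)

    # Undo the pooling and the standardization.
    masked_columns = []
    for j, (m, s) in enumerate(zip(means, stds)):
        chunk = masked_pool[j * n:(j + 1) * n]
        masked_columns.append([z * s + m for z in chunk])
    return [tuple(row) for row in zip(*masked_columns)]


# Toy data (hypothetical): 4 records, 2 attributes, k = 2.
records = [(1, 10), (2, 30), (3, 20), (4, 40)]
masked = pooled_microaggregation(records, k=2)
```

Because the groups are formed over the pooled values rather than per block, every standardized value in the pool is shared by at least \(k\) values regardless of the attribute it came from. Whether this matches the paper's pre-processing, and how it affects disclosure risk, cannot be inferred from the abstract alone.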
Acknowledgments
This work is partially supported by the Ministry of Science and Technology of Spain under contract TIN2012-34557 and by the BSC-CNS Severo Ochoa program (SEV-2011-00067).
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Nin, J. (2014). Beyond Multivariate Microaggregation for Large Record Anonymization. In: Nin, J., Villatoro, D. (eds) Citizen in Sensor Networks. CitiSens 2013. Lecture Notes in Computer Science, vol 8313. Springer, Cham. https://doi.org/10.1007/978-3-319-04178-0_8
DOI: https://doi.org/10.1007/978-3-319-04178-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04177-3
Online ISBN: 978-3-319-04178-0
eBook Packages: Computer Science, Computer Science (R0)