Beyond Multivariate Microaggregation for Large Record Anonymization

Jordi Nin

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8313)

Included in the following conference series:

International Workshop on Citizen in Sensor Networks


Abstract

Microaggregation is one of the most commonly used microdata protection methods. Its basic idea is to anonymize data by aggregating the original records into small groups of at least \(k\) elements and replacing each record by the centroid of its group, thereby preserving \(k\)-anonymity. When records are large, i.e., when the data set has many attributes, the attributes are usually split into smaller blocks and microaggregation is applied to each block, successively and independently, in order to limit information loss. This is called multivariate microaggregation. It reduces the information loss caused by collapsing several values into the centroid of their group. Unfortunately, with multivariate microaggregation the \(k\)-anonymity property is lost as soon as the intruder knows at least two attributes belonging to different blocks, which may well be the usual case.
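To make the \(k\)-anonymity loss described above concrete, here is a minimal Python sketch. It assumes the simplest form of multivariate microaggregation, with one attribute per block, and a plain fixed-size univariate grouping (sort, cut into groups of \(k\), replace each value by its group mean). The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def microaggregate_1d(values, k):
    """Fixed-size univariate microaggregation: sort the values, cut them into
    consecutive groups of k (the last group absorbs any remainder, so every
    group has between k and 2k-1 members) and replace each value by the mean
    of its group."""
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = n // k
    masked = [0.0] * n
    for g in range(n_groups):
        start = g * k
        end = n if g == n_groups - 1 else start + k
        members = order[start:end]
        centroid = sum(values[i] for i in members) / len(members)
        for i in members:
            masked[i] = centroid
    return masked

def microaggregate_by_blocks(records, k):
    """Multivariate microaggregation with one attribute per block (assumption):
    each column is microaggregated on its own, independently of the others."""
    columns = list(zip(*records))
    masked_columns = [microaggregate_1d(list(col), k) for col in columns]
    return [tuple(row) for row in zip(*masked_columns)]

def min_group_size(records, attrs):
    """Size of the smallest set of records sharing the same values on attrs.
    The masked file is k-anonymous with respect to attrs iff this is >= k."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    return min(counts.values())

original = [(1, 1), (2, 10), (3, 11), (10, 2), (11, 3), (12, 12)]
masked = microaggregate_by_blocks(original, k=3)
print(min_group_size(masked, [0]))     # 3
print(min_group_size(masked, [1]))     # 3
print(min_group_size(masked, [0, 1]))  # 1
```

Each masked attribute on its own is shared by three records, yet the pair of masked values singles out the records at indices 0 and 5: an intruder who knows one attribute from each block can narrow the candidates below \(k\).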

In this work, we present a new microaggregation method called one dimension microaggregation (\(Mic1D-k\)). With \(Mic1D-k\), the problem of \(k\)-anonymity loss is mitigated by first mixing all the values of the original microdata file into a single non-attributed data set through a set of simple pre-processing steps, and then microaggregating all the mixed values together. Our experiments on real data show that our proposal obtains lower disclosure risk than previous approaches while keeping the information loss at a comparable level.
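The abstract only outlines \(Mic1D-k\), so the following Python sketch fills in the unspecified pre-processing with an explicit assumption: each attribute is z-score standardized so that values from different attributes become comparable, all standardized values are pooled into one attribute-free list, that list is microaggregated with fixed groups of \(k\), and each value is then de-standardized and written back to its original cell. `mic1d_sketch` and `microaggregate_1d` are hypothetical names; this is not the author's implementation.

```python
from statistics import mean, pstdev

def microaggregate_1d(values, k):
    """Fixed-size univariate microaggregation: sort, cut into consecutive
    groups of k (last group takes the remainder), replace by group mean."""
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = n // k
    masked = [0.0] * n
    for g in range(n_groups):
        start = g * k
        end = n if g == n_groups - 1 else start + k
        members = order[start:end]
        centroid = sum(values[i] for i in members) / len(members)
        for i in members:
            masked[i] = centroid
    return masked

def mic1d_sketch(records, k):
    """Hypothetical Mic1D-k-style masking. ASSUMPTION: the paper's 'simple
    pre-processing steps' are modelled as per-attribute z-score
    standardization; the actual steps may differ."""
    n_attrs = len(records[0])
    columns = [[r[a] for r in records] for a in range(n_attrs)]
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    # Pool every standardized value into one attribute-free list,
    # remembering the (record, attribute) cell each value came from.
    pooled, origin = [], []
    for a, col in enumerate(columns):
        for i, x in enumerate(col):
            pooled.append((x - mus[a]) / sigmas[a])
            origin.append((i, a))
    # Microaggregate all the mixed values together.
    masked_pooled = microaggregate_1d(pooled, k)
    # Undo the standardization and put each value back in its cell.
    masked = [[0.0] * n_attrs for _ in records]
    for z, (i, a) in zip(masked_pooled, origin):
        masked[i][a] = z * sigmas[a] + mus[a]
    return [tuple(row) for row in masked]

original = [(1, 1), (2, 10), (3, 11), (10, 2), (11, 3), (12, 12)]
print(mic1d_sketch(original, k=3))
```

Because groups are formed over the pooled values, a single group can mix values from different attributes and different records, so the grouping no longer follows attribute blocks. Standardization is used here only because raw attributes can have very different scales; without it, pooling would group values by the magnitude of their units rather than by their relative position within each attribute.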




Acknowledgments

This work is partially supported by the Ministry of Science and Technology of Spain under contract TIN2012-34557 and by the BSC-CNS Severo Ochoa program (SEV-2011-00067).

Author information

Authors and Affiliations

Barcelona Supercomputing Center (BSC), Universitat Politècnica de Catalunya (BarcelonaTech), Barcelona, Catalonia, Spain
Jordi Nin


Corresponding author

Correspondence to Jordi Nin.

Editor information

Editors and Affiliations

Universitat Politècnica de Catalunya Dept. Arquitectura de Computadors, Barcelona, Spain
Jordi Nin
Barcelona Digital Technology Centre, Barcelona, Spain
Daniel Villatoro


About this paper

Cite this paper

Nin, J. (2014). Beyond Multivariate Microaggregation for Large Record Anonymization. In: Nin, J., Villatoro, D. (eds) Citizen in Sensor Networks. CitiSens 2013. Lecture Notes in Computer Science, vol 8313. Springer, Cham. https://doi.org/10.1007/978-3-319-04178-0_8


DOI: https://doi.org/10.1007/978-3-319-04178-0_8
Published: 20 December 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04177-3
Online ISBN: 978-3-319-04178-0
eBook Packages: Computer Science, Computer Science (R0)
