Abstract
Anonymization of graph-based data is a problem which has been widely studied over the last years, and several anonymization methods have been developed. Information loss measures have been used to evaluate data utility and information loss in the anonymized graphs. However, there is no consensus about how to evaluate data utility and information loss in privacy-preserving and anonymization scenarios, where the anonymous datasets were perturbed to hinder re-identification processes. Authors use diverse metrics to evaluate data utility and, consequently, it is complex to compare different methods or algorithms in the literature. In this paper, we propose a framework to evaluate and compare anonymous datasets in a common way, providing an objective score to clearly compare methods and algorithms. Our framework includes metrics based on generic information loss measures, such as average distance or betweenness centrality and also task-specific information loss measures, such as community detection or information flow. Additionally, we provide some metrics to examine re-identification and risk assessment. We demonstrate that our framework could help researchers and practitioners to select the best parametrization and/or algorithm to reduce information loss and maximize data utility.
Similar content being viewed by others
Notes
Available at: https://github.com/jcasasr/DUEF-GA.
References
Backstrom, L., Dwork, C., Kleinberg, J.: Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In: WWW ’07, pp. 181–190. ACM, New York, NY, USA (2007)
Barabási, A.-L., Pósfai, M.: Network Science. Cambridge University Press, Cambridge (2016)
Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008(10), P10008 (2008)
Brandes, U.: A faster algorithm for betweenness centrality. J. Math. Sociol. 25(2), 163–177 (2001)
Brunet, S., Canard, S., Gambs, S., Olivier, B.: Novel differentially private mechanisms for graphs. IACR Cryptol. ePrint Arch. 2016, 745 (2016)
Bonchi, F., Gionis, A., Tassa, T.: Identity obfuscation in graphs through the information theoretic lens. Inf. Sci. 275, 232–256 (2014)
Cai, B.-J., Wang, H.-Y., Zheng, H.-R., Wang, H.: Evaluation repeated random walks in community detection of social networks. In: 2010 International Conference on Machine Learning and Cybernetics (ICMLC), pp. 1849–1854. IEEE Computer Society, Qingdao (2010)
Casas-Roma, J.: Privacy-preserving on graphs using randomization and edge-relevance. In: Torra, V. (ed.) International Conference on Modeling Decisions for Artificial Intelligence (MDAI), pp. 204–216. Springer International Publishing Switzerland, Tokyo, Japan (2014)
Casas-Roma, J., Herrera-joancomartí, J., Torra, V.: Anonymizing graphs: measuring quality for clustering. Knowl. Inf. Syst. 44(3), 507–528 (2015)
Casas-Roma, J., Herrera-joancomartí, J., Torra, V.: \(k\)-Degree anonymity and edge selection: improving data utility in large networks. Knowl. Inf. Syst. 50(2), 447–474 (2016)
Casas-Roma, J., Herrera-Joancomartí, J., Torra, V.: A survey of graph-modification techniques for privacy-preserving on networks. Artif. Intell. Rev. 47(3), 341–366 (2017)
Chakrabarti, D., Faloutsos, C.: Graph mining: laws, generators, and algorithms. ACM Comput. Surv. 38(1), 2:1-2:69 (2006)
Chester, S., Kapron, B.M., Ramesh, G., Srivastava, G., Thomo, A., Venkatesh, S.: \(k\)-Anonymization of social networks by vertex addition. In: ADBIS 2011 Research Communications, pp. 107–116. Vienna, Austria, CEUR-WS.org (2011)
Chester, S., Kapron, B.M., Ramesh, G., Srivastava, G., Thomo, A., Venkatesh, S.: Why Waldo befriended the dummy? \(k\)-Anonymization of social networks with pseudo-nodes. Soc. Netw. Anal. Min. 3(3), 381–399 (2013)
Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 66111 (2004)
Dwork, C.: Differential privacy. In: Proceedings of the 33rd International Conference on Automata, Languages and Programming (ICALP), pp 1–12. Springer-Verlag, Berlin (2006)
Ferri, F., Grifoni, P., Guzzo, T.: New forms of social and professional digital relationships: the case of Facebook. Soc. Netw. Anal. Min. 2(2), 121–137 (2011)
Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99(12), 7821–7826 (2002)
Guimerà, R., Danon, L., Díaz-Guilera, A., Giralt, F., Arenas, A.: Self-similar community structure in a network of human interactions. Phys. Rev. E 68(6), 65–103 (2003)
Hay, M., Miklau, G., Jensen, D., Weis, P., Srivastava, S.: Anonymizing social networks. Report, University of Massachusetts Amherst (2007)
Hay, M., Miklau, G., Jensen, D., Towsley, D., Weis, P.: Resisting structural re-identification in anonymized social networks. Proc. VLDB Endow. 1(1), 102–114 (2008)
Hay, M., Li, C., Miklau, G., Jensen, D.: Accurate estimation of the degree distribution of private networks. In: IEEE International Conference on Data Mining (ICDM), pp. 169–178. IEEE Computer Society, Miami, FL (2009)
Hay, M., Liu, K., Miklau, G., Pei, J., and Terzi, E. (2011). Privacy-aware data management in information networks. In: International Conference on Management of Data (SIGMOD). ACM Press, New York, USA, pp. 1201–1204
Isella, L., Stehlé, J., Barrat, A., Cattuto, C., Pinton, J., Van den Broeck, W.: What’s in a crowd? Analysis of face-to-face behavioral networks. J. Theor. Biol. 271(1), 166–180 (2011)
KONECT: Hamsterster friendships network dataset, April 2017 (2017)
Lancichinetti, A., Fortunato, S.: Community detection algorithms: a comparative analysis. Phys. Rev. E 80(5), 56117 (2009)
Lindner, G., Staudt, C.L., Hamann, M., Meyerhenke, H., Wagner, D.: Structure-preserving sparsification of social networks. In: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 448–454. ACM, Paris, France (2015)
Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: Proceedings of the ACM International Conference on Management of Data (SIGMOD), pp. 93–106. ACM Press, New York (2008)
Macwan, K.R., Patel, S.J.: \(k\)-NMF anonymization in social network data publishing. Comput. J. 61(4), 601–613 (2018)
Ma, T., Zhang, Y., Cao, J., Shen, J., Tang, M., Tian, Y., Al-Dhelaan, A., Al-Rodhaan, M.: KDVEM: a \(k\)-degree anonymity with vertex and edge modification algorithm. Computing 97(12), 1165–1184 (2015)
Nagle, F.: Privacy breach analysis in social networks. Mining Social Networks and Security Informatics, pp. 63–77. Springer, Dordrecht (2013)
Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. In: Proceedings of the 7th International World Wide Web Conference (WWW), pp. 161–172. Brisbane, Australia (1998)
Pons, P., Latapy, M.: Computing communities in large networks using random walks. Computer and Information Sciences (ISCIS), vol. 10, pp. 284–293. Springer, Berlin (2005)
Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. 105(4), 1118–1123 (2008)
Sala, A., Zhao, X., Wilson, C., Zheng, H., Zhao, B.Y.: Sharing graphs using differentially private graph models. In: Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference (IMC ’11), Berlin, Germany, pp. 81–98 (2011)
Sarma, A.D., Molla, A.R., Pandurangan, G., Upfal, E.: Fast distributed PageRank computation. Theor. Comput. Sci. 561(B), 113–121 (2015)
Sweeney, L.: \(k\)-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)
Wagner, I., Eckhoff, D.: Technical privacy metrics. ACM Comput. Surv. 51(3), 1–38 (2018)
Wang, Y., Xie, L., Zheng, B., Lee, K.C.K.: High utility K-anonymization for social network publishing. Knowl. Inf. Syst. (KAIS) 41(3), 697–725 (2014)
Wang, Y., Zheng, B.: Preserving privacy in social networks against connection fingerprint attacks. In: International Conference on Data Engineering (ICDE), pp. 54–65. IEEE, Seoul, South Korea (2015)
Yang, Y., Lutes, J., Li, F., Luo, B., Liu, P.: Stalking online: on user privacy in social networks. In: Proceedings of 2nd ACM Conference on Data and Application Security and Privacy (CODASPY’12), pp. 37–48 (2012)
Ying, X ., Wu, X.: Randomizing social networks: a spectrum preserving approach. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp. 739–750. SIAM, Atlanta (2008)
Ying, X., Pan, K., Wu, X., Guo, L.: Comparisons of randomization and \(k\)-degree anonymization schemes for privacy preserving social network publishing. In: Proceedings of the 3rd Workshop on Social Network Mining and Analysis (SNA-KDD), pp. 10:1-10:10. ACM Press, New York (2009)
Yuan, M., Chen, L., Yu, P.S., Yu, T.: Protecting sensitive labels in social network data anonymization. IEEE Trans. Knowl. Data Eng. 25(3), 633–647 (2013)
Zhang, K., Lo, D., Lim, E.-P., Prasetyo, P.K.: Mining indirect antagonistic communities from social interactions. Knowl. Inf. Syst. 35(3), 553–583 (2013)
Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 506–515. IEEE Computer Society, Washington (2008)
Zou, L., Chen, L., Özsu, M.T.: \(K\)-Automorphism: a general framework for privacy preserving network publication. Proc. VLDB Endow. 2(1), 946–957 (2009)
Acknowledgements
This work was supported by the Spanish Government, in part under Grant RTI2018-095094-B-C22 “CONSENT” and in part under Grant TIN2014-57364-C2-2-R “SMARTGLACIS”.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
Author Jordi Casas-Roma declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Casas-Roma, J. DUEF-GA: data utility and privacy evaluation framework for graph anonymization. Int. J. Inf. Secur. 19, 465–478 (2020). https://doi.org/10.1007/s10207-019-00469-4
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10207-019-00469-4