Abstract
With the increasing prevalence of information networks, research on privacy-preserving network data publishing has received substantial attention recently. There are two streams of relevant research, targeting different privacy requirements. A large body of existing works focus on preventing node re-identification against adversaries with structural background knowledge, while some other studies aim to thwart edge disclosure. In general, the line of research on preventing edge disclosure is less fruitful, largely due to lack of a formal privacy model. The recent emergence of differential privacy has shown great promise for rigorous prevention of edge disclosure. Yet recent research indicates that differential privacy is vulnerable to data correlation, which hinders its application to network data that may be inherently correlated. In this paper, we show that differential privacy could be tuned to provide provable privacy guarantees even in the correlated setting by introducing an extra parameter, which measures the extent of correlation. We subsequently provide a holistic solution for non-interactive network data publication. First, we generate a private vertex labeling for a given network dataset to make the corresponding adjacency matrix form dense clusters. Next, we adaptively identify dense regions of the adjacency matrix by a data-dependent partitioning process. Finally, we reconstruct a noisy adjacency matrix by a novel use of the exponential mechanism. To our best knowledge, this is the first work providing a practical solution for publishing real-life network data via differential privacy. Extensive experiments demonstrate that our approach performs well on different types of real-life network datasets.
Similar content being viewed by others
Notes
The strong attacker mentioned in [24] cannot be prevented as his prior knowledge (without accessing any database) has allowed him to succeed in a privacy attack. Such a privacy attack is not caused by data sharing and therefore beyond the scope of this paper.
A random vertex labeling satisfies 0-differential privacy because it can be viewed as an application of the exponential mechanism with privacy parameter 0.
If \(v_i\) and \(v_j\) are selected to swap with each other, only one query will be affected. This leads to a lower global sensitivity.
For a region \(R\) not containing elements on the diagonal, \(GS(f)=1\) because a single edge difference cannot change two elements in this region.
ca-GrQc, ca-HepTh and wiki-Vote are publicly available in the Stanford large network dataset collection (http://snap.stanford.edu/data/index.html). STM is provided by the Société de transport de Montréal (http://www.stm.info).
Laplace can barely provide any useful information in terms of degree distribution because its KL-divergence is almost the same as an empty graph (i.e., \(|E| = 0\)).
Step 1 means that sampling is not employed.
References
Backstrom, L., Dwork, C., Kleinberg, J.: Wherefore art thou r3579x? anonymized social networks, hidden patterns, and structural steganography. In: Proceedings of the 16th International Conference on World Wide Web (WWW), pp. 181–190 (2007)
Bearman, P.S., Moody, J., Stovel, K.: Chains of affection: the structure of adolescent romantic and sexual networks. Am. J. Sociol. 110(1), 44–91 (2004)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Bhagat, S., Cormode, G., Krishnamurthy, B., Srivastava, D.: Class-based graph anonymization for social network data. Proc. VLDB Endow. 2(1), 766–777 (2009)
Chen, R., Mohammed, N., Fung, B.C.M., Desai, B.C., Xiong, L.: Publishing set-valued data via differential privacy. Proc. VLDB Endow. 4(11), 1087–1098 (2011)
Cheng, J., Fu, A.W.C., Liu, J.: K-isomorphism: privacy preserving network publication against structural attacks. In: Proceedings of the 36th ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 459–470 (2010)
Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms, 3rd edn. McGraw-Hill Higher Education, New York (2009)
Cormode, G., Srivastava, D., Yu, T., Zhang, Q.: Anonymizing bipartite graph data using safe groupings. Proc. VLDB Endow. 1(1), 833–844 (2008)
Cormode, G., Procopiuc, M., Shen, E., Srivastava, D., Yu, T.: Differentially private spatial decompositions. In: Proceedings of the 27th IEEE International Conference on Data Engineering (ICDE), pp. 20–31 (2012)
Cuthill, E., McKee, J.: Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 1969 24th National Conference, pp. 157–172 (1969)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the 3rd Theory of Cryptography Conference (TCC), pp. 265–284 (2006)
Faloutsos, M., Faloutsos, P., Faloutsos, C.: On power-law relationships of the internet topology. In: Proceedings of the 23rd ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM), pp. 251–262 (1999)
Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval on composite keys. Acta Informatica 4(1), 1–9 (1974)
Fortunato, S.: Community Detection in Graphs. CoRR abs/0906.0612 (2009)
Gupta, A., Ligett, K., McSherry, F., Roth, A., Talwar, K.: Differentially private combinatorial optimization. In: Proceedings of the 21st ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1106–1125 (2010)
Gupta, A., Roth, A., Ullman, J.: Iterative constructions and private data release. In: Proceedings of the 9th Theory of Cryptography Conference (TCC), pp. 339–356 (2012)
Hay, M., Miklau, G., Jensen, D., Towsley, D.F., Weis, P.: Resisting structural re-identification in anonymized social networks. Proc. VLDB Endow. 1(1), 102–114 (2008)
Hay, M., Li, C., Miklau, G., Jensen, D.: Accurate estimation of the degree distribution of private networks. In: Proceedings of the 9th IEEE International Conference on Data Mining (ICDM), pp. 169–178 (2009)
Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially private histograms through consistency. Proc. VLDB Endow. 3(1), 1021–1032 (2010)
Hay, M., Liu, K., Miklau, G., Pei, J., Terzi, E.: Privacy-aware data management in information networks. In: Proceedings of the 37th ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 1201–1204 (2011)
Kamel, I., Faloutsos, C.: Hilbert R-tree: an improved R-tree using fractals. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp. 500–509 (1994)
Karwa, V., Raskhodnikova, S., Smith, A., Yaroslavtsev, G.: Private analysis of graph structure. Proc VLDB Endow. 4(11), 1146–1157 (2011)
Kifer, D., Gehrke, J.: Injecting utility into anonymized datasets. In: Proceedings of the 32nd ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 217–228 (2006)
Kifer, D., Machanavajjhala, A.: No free lunch in data privacy. In: Proceedings of the 37th ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 193–204 (2011)
Kossinets, G., Watts, D.: Empirical analysis of an evolving social networks. Science 311(5757), 88–90 (2006)
Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: Proceedings of the 34th ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 93–106 (2008)
Liu, L., Wang, J., Liu, J., Zhang, J.: Privacy preservation in social networks with sensitive edge weights. In: Proceedings of the 9th SIAM International Conference on Data Mining (SDM), pp. 954–965 (2009)
McSherry, F.: Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: Proceedings of the 35th ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 19–30 (2009)
McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: Proceedings of the 48th IEEE Symposium on Foundations of Computer Science (FOCS), pp. 94–103 (2007)
Mohammed, N., Chen, R., Fung, B.C.M., Yu, P.S.: Differentially private data release for data mining. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 493–501 (2011)
Proserpio, D., Goldberg, S., McSherry, F.: A workflow for differentially-private graph synthesis. In: Proceedings of the 2012 ACM Workshop on Online Social Networks (WOSN), pp. 13–18 (2012)
Sala, A., Zhao, X., Wilson, C., Zheng, H., Zhao, B.Y.: Sharing graphs using differentially private graph models. In: Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement (IMC), pp. 81–98 (2011)
Schaeffer, S.E.: Graph clustering. Comput. Sci. Rev. 1(1), 27–64 (2007)
Wu, L., Ying, X., Wu, X.: Reconstruction from randomized graph via low rank approximation. In: Proceedings of the 10th SIAM International Conference on Data Mining (SDM), pp. 60–71 (2010)
Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. In: Proceedings of the 26th IEEE International Conference on Data Engineering (ICDE), pp. 225–236 (2010)
Xiao, X., Bender, G., Hay, M., Gehrke, J.: iReduct: differential privacy with reduced relative errors. In: Proceedings of the 37th ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 229–240 (2011)
Ying, X., Wu, X.: Randomizing social networks: a spectrum preserving approach. In: Proceedings of the 8th SIAM International Conference on Data Mining (SDM), pp. 739–750 (2008)
Yuan, M., Chen, L., Yu, P.S.: Personalized privacy protection in social networks. Proc. VLDB Endow. 4(2), 141–150 (2011)
Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE), pp. 506–515 (2008)
Zou, L., Chen, L., Ozsu, M.T.: K-automorphism: a general framework for privacy preserving network publication. Proc. VLDB Endow. 2(1), 946–957 (2009)
Acknowledgments
We sincerely thank the reviewers for their insightful comments. We thank James Cheng, Ada Wai-Chee Fu and Jia Liu for providing the source code of \(k\)-isomorphism. The research is supported in part by NSERC through Discovery Grants (356065-2013), US NSF through grants CNS-1115234, DBI-0960443 and OISE-1129076 and US Department of Army through grant W911NF-12-1-0066.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Chen, R., Fung, B.C.M., Yu, P.S. et al. Correlated network data publication via differential privacy. The VLDB Journal 23, 653–676 (2014). https://doi.org/10.1007/s00778-013-0344-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-013-0344-8