[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Estimating Clustering Coefficients and Size of Social Networks via Random Walk

Published: 28 September 2015 Publication History

Abstract

This work addresses the problem of estimating social network measures. Specifically, the measures at hand are the network average and global clustering coefficients and the number of registered users. The algorithms at hand (1) assume no prior knowledge about the network and (2) access the network using only the publicly available interface. More precisely, this work provides (a) a unified approach for clustering coefficients estimation and (b) a new network size estimator. The unified approach for the clustering coefficients yields the first external access algorithm for estimating the global clustering coefficient. The new network size estimator offers improved accuracy compared to prior art estimators.
Our approach is to view a social network as an undirected graph and use the public interface to retrieve a random walk. To estimate the clustering coefficient, the connectivity of each node in the random walk sequence is tested in turn. We show that the error drops exponentially in the number of random walk steps. For the network size estimation we offer a generalized view of prior art estimators that in turn yields an improved estimator. All algorithms are validated on several publicly available social network datasets.

References

[1]
Louigi Addario-Berry and Tao Lei. 2012. The mixing time of the Newman--Watts small world. In SODA. 1661--1668.
[2]
Yong-Yeol Ahn, Seungyeop Han, Haewoon Kwak, Sue B. Moon, and Hawoong Jeong. 2007. Analysis of topological characteristics of huge online social networking services. In WWW. 835--844.
[3]
Noga Alon, Raphael Yuster, and Uri Zwick. 1997. Finding and counting given length cycles. Algorithmica 17, 3 (1997), 209--223.
[4]
Haim Avron. 2010. Counting triangles in large graphs using randomized matrix trace estimation. In Large-Scale Data Mining: Theory and Applications (KDD Workshop).
[5]
Lars Backstrom, Daniel P. Huttenlocher, Jon M. Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: Membership, growth, and evolution. In KDD. 44--54.
[6]
Ziv Bar-Yossef and Maxim Gurevich. 2008. Random sampling from a search engine’s index. J. ACM 55, 5, Article 24 (Oct. 2008).
[7]
Ziv Bar-Yossef and Maxim Gurevich. 2009. Estimating the impressionrank of web pages. In WWW. 41--50.
[8]
Ziv Bar-Yossef and Maxim Gurevich. 2011. Efficient search engine measurements. TWEB 5, 4, Article 18 (Oct 2011).
[9]
Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. 2010. Efficient algorithms for large-scale local triangle counting. TKDD 4, 3, Article 13 (Oct. 2010).
[10]
Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, and Christian Sohler. 2006. Counting triangles in data streams. In PODS. 253--262.
[11]
Kai-Min Chung, Henry Lam, Zhenming Liu, and Michael Mitzenmacher. 2012. Chernoff-Hoeffding bounds for Markov chains: Generalized and simplified. In STACS. 124--135.
[12]
Luciano da F. Costa, Francisco A. Rodrigues, Gonzalo Travieso, and Paulino R. Villas Boas. 2006. Characterization of complex networks: A survey of measurements. Adv. Phys. 56, 1 (Aug. 2006), 167--242. http://dx.doi.org/10.1080/00018730601170527
[13]
Bradley Efron and Robert J. Tibshirani. 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.
[14]
Minas Gjoka, Maciej Kurant, Carter T. Butts, and Athina Markopoulou. 2010. Walking in Facebook: A case study of unbiased sampling of OSNs. Proceedings of IEEE INFOCOM 2010, 1--9.
[15]
Minas Gjoka, Maciej Kurant, and Athina Markopoulou. 2013. 2.5K-Graphs: From sampling to generation. In Proceedings of IEEE INFOCOM’13.
[16]
Stephen James Hardiman, Peter Richmond, and Stefan Hutzler. 2009. Calculating statistics of complex networks through random walks with an application to the on-line social network Bebo. Eur. Phys. J. B 71, 4 (2009), 611--622.
[17]
Wolfgang Härdle, Joel Horowitz, and Jens Peter Kreiss. 2003. Bootstrap methods for time series. Int. Stat. Rev. 71, 2 (Aug. 2003), 435--459.
[18]
Liran Katzir, Edo Liberty, and Oren Somekh. 2011. Estimating sizes of social networks via biased sampling. In WWW. 597--606.
[19]
Jerome Kunegis. 2012. KONECT—The Koblenz Network Collection. http://konect.uni-koblenz.de/.
[20]
Hans R. Künsch. 1989. The jackknife and the bootstrap for general stationary observations. Ann. Stat. 17, 1217--1241.
[21]
Maciej Kurant, Carter T. Butts, and Athina Markopoulou. 2012. Graph size estimation. CoRR abs/1210.0460.
[22]
David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. 2008. Markov Chains and Mixing Times. American Mathematical Society.
[23]
Michael Ley. 2002. The DBLP computer science bibliography: Evolution, research issues, perspectives. In Proceedings of the International Symposium on String Processing and Information Retrieval. 1--10.
[24]
László Lovász and Peter Winkler. 1998. Mixing times. Microsurveys in discrete. In DimacsWorkshop.
[25]
Laurent Massoulié, Erwan Le Merrer, Anne-Marie Kermarrec, and Ayalvadi Ganesh. 2006. Peer counting and sampling in overlay networks: Random walk methods. In Proceedings of the 25th Annual ACM Symposium on Principles of Distributed Computing (PODC’06). ACM, New York, NY, 123--132.
[26]
Alan Mislove, Hema Swetha Koppula, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2008. Growth of the flickr social network. In Proceedings of the 1st ACM SIGCOMM Workshop on Social Networks (WOSN’08).
[27]
Alan Mislove, Massimiliano Marcon, P. Krishna Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Internet Measurement Comference. 29--42.
[28]
Abedelaziz Mohaisen, Aaram Yun, and Yongdae Kim. 2010. Measuring the mixing time of social graphs. In Internet Measurement Conference. 383--389.
[29]
Mark E. J. Newman and Duncan J. Watts. 1999a. Renormalization group analysis of the small-world network model. Phys. Lett. A 263, 341--346.
[30]
Mark E. J. Newman and Duncan J. Watts. 1999b. Scaling and percolation in the small-world network model. Phys. Rev. E 60, 7332--7342.
[31]
Bruno F. Ribeiro and Donald F. Towsley. 2010. Estimating and sampling graphs with multidimensional random walks. In Internet Measurement Conference. 390--403.
[32]
Reuven Y. Rubinstein and Dirk P. Kroese. 2007. Simulation and the Monte Carlo Method (2nd. ed.). Wiley Series in Probability and Statistics.
[33]
Thomas Schank and Dorothea Wagner. 2005. Approximating clustering coefficient and transitivity. J. Graph Algorithms Appl. 9, 2 (2005), 265--275.
[34]
Pinghui Wang, John C. S. Lui, Bruno F. Ribeiro, Don Towsley, Junzhou Zhao, and Xiaohong Guan. 2014. Efficiently estimating motif statistics of large networks. TKDD 9, 2, Article 8 (Nov. 2014).
[35]
Shaozhi Ye and Felix Wu. 2010. Estimating the size of online social networks. In 2010 IEEE 2nd International Conference on Social Computing (SocialCom). 169--176.

Cited By

View all
  • (2024)Discovering and Maintaining the Best k in Core DecompositionIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.338998936:11(5954-5971)Online publication date: Nov-2024
  • (2024)What is my privacy score? Measuring users’ privacy on social networking websitesElectronic Commerce Research10.1007/s10660-023-09796-0Online publication date: 1-Feb-2024
  • (2023)An Overlapping Community Detection Approach Based on Deepwalk and Improved Label PropagationIEEE Transactions on Computational Social Systems10.1109/TCSS.2022.315257910:1(311-321)Online publication date: Feb-2023
  • Show More Cited By

Index Terms

  1. Estimating Clustering Coefficients and Size of Social Networks via Random Walk

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on the Web
    ACM Transactions on the Web  Volume 9, Issue 4
    October 2015
    114 pages
    ISSN:1559-1131
    EISSN:1559-114X
    DOI:10.1145/2830542
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 September 2015
    Accepted: 01 June 2015
    Revised: 01 April 2015
    Received: 01 November 2014
    Published in TWEB Volume 9, Issue 4

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Estimation
    2. clustering coefficient
    3. sampling
    4. social network

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)27
    • Downloads (Last 6 weeks)6
    Reflects downloads up to 07 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Discovering and Maintaining the Best k in Core DecompositionIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.338998936:11(5954-5971)Online publication date: Nov-2024
    • (2024)What is my privacy score? Measuring users’ privacy on social networking websitesElectronic Commerce Research10.1007/s10660-023-09796-0Online publication date: 1-Feb-2024
    • (2023)An Overlapping Community Detection Approach Based on Deepwalk and Improved Label PropagationIEEE Transactions on Computational Social Systems10.1109/TCSS.2022.315257910:1(311-321)Online publication date: Feb-2023
    • (2023)Network analysis of pore structure of coral reef limestone and its implications for seepage flowEngineering Geology10.1016/j.enggeo.2023.107103318(107103)Online publication date: Jun-2023
    • (2022)Model for Generating Scale-Free Artificial Social Networks Using Small-World NetworksComputers, Materials & Continua10.32604/cmc.2022.02992773:3(6367-6391)Online publication date: 2022
    • (2022)Sampling Multiple Nodes in Large NetworksProceedings of the Fifteenth ACM International Conference on Web Search and Data Mining10.1145/3488560.3498383(37-47)Online publication date: 11-Feb-2022
    • (2022)Random Walk-steered Majority Undersampling2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC)10.1109/SMC53654.2022.9945451(530-537)Online publication date: 9-Oct-2022
    • (2022)A random walk sampling on knowledge graphs for semantic-oriented statistical tasksData & Knowledge Engineering10.1016/j.datak.2022.102024140:COnline publication date: 1-Jul-2022
    • (2021)Estimating Distributions of Large Graphs from Incomplete Sampled Data2021 IFIP Networking Conference (IFIP Networking)10.23919/IFIPNetworking52078.2021.9472848(1-9)Online publication date: 21-Jun-2021
    • (2021)Sampling Graphlets of Multiplex Networks: A Restricted Random Walk ApproachACM Transactions on the Web10.1145/345629115:4(1-31)Online publication date: 14-Jun-2021
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media