[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2872427.2883045acmotherconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article

On Sampling Nodes in a Network

Published: 11 April 2016 Publication History

Abstract

Random walk is an important tool in many graph mining applications including estimating graph parameters, sampling portions of the graph, and extracting dense communities. In this paper we consider the problem of sampling nodes from a large graph according to a prescribed distribution by using random walk as the basic primitive. Our goal is to obtain algorithms that make a small number of queries to the graph but output a node that is sampled according to the prescribed distribution. Focusing on the uniform distribution case, we study the query complexity of three algorithms and show a near-tight bound expressed in terms of the parameters of the graph such as average degree and the mixing time. Both theoretically and empirically, we show that some algorithms are preferable in practice than the others. We also extend our study to the problem of sampling nodes according to some polynomial function of their degrees; this has implications for designing efficient algorithms for applications such as triangle counting.

References

[1]
Z. Bar-Yossef, A. C. Berg, S. Chien, J. Fakcharoenphol, and D. Weitz. Approximating aggregate queries about Web pages via random walks. In VLDB, pages 535--544, 2000.
[2]
Z. Bar-Yossef and M. Gurevich. Random sampling from a search engine's index. J. ACM, 55(5), 2008.
[3]
S. Boyd, P. Diaconis, and L. Xiao. Fastest mixing Markov chain on a graph. SIAM Review, 46(4):667--689, 2004.
[4]
J. Cheeger. A lower bound for the smallest eigenvalue of the Laplacian. In R. C. Gunning, editor, Problems in Analysis (Papers dedicated to Salomon Bochner), pages 195--199. Princeton Univ. Press, 1970.
[5]
C. Cooper, T. Radzik, and Y. Siantos. Estimating network parameters using random walks. In CASoN, pages 33--40, 2012.
[6]
A. Dasgupta, R. Kumar, and T. Sarlós. On estimating the average degree. In WWW, pages 795--806, 2014.
[7]
M. Gjoka, M. Kurant, C. Butts, and A. Markopoulou. Walking in Facebook: A case study of unbiased sampling of OSNs. In INFOCOM, pages 1--9, 2010.
[8]
V. Guruswami. Rapidly mixing Markov chains: A comparison of techniques. A Survey, 2000.
[9]
S. J. Hardiman and L. Katzir. Estimating clustering coefficients and size of social networks via random walk. In WWW, pages 539--550, 2013.
[10]
W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97--109, 1970.
[11]
M. Hildebrand. Random walks on random simple graphs. Random Structures & Algorithms, 8(4):301--318, 1996.
[12]
M. Jha, C. Seshadhri, and A. Pinar. Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In WWW, pages 495--505, 2015.
[13]
L. Katzir, E. Liberty, and O. Somekh. Estimating sizes of social networks via biased sampling. In WWW, pages 597--606, 2011.
[14]
L. Katzir, E. Liberty, and O. Somekh. Framework and algorithms for network bucket testing. In WWW, pages 1029--1036, 2012.
[15]
J. Leskovec and C. Faloutsos. Sampling from large graphs. In KDD, pages 631--636, 2006.
[16]
D. Levin, Y. Peres, and E. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2009.
[17]
R.-H. Li, J. X. Yu, L. Qin, R. Mao, and T. Jin. On random walk based graph sampling. In ICDE, pages 927--938, 2015.
[18]
L. Lovász and P. Winkler. Efficient stopping rules for Markov chains. In STOC, pages 76--82, 1995.
[19]
A. Nazi, Z. Zhou, S. Thirumuruganathan, N. Zhang, and G. Das. Walk, not wait: Faster sampling over online social networks. PVLDB, 8(6):678--689, 2015.
[20]
B. Ribeiro and D. Towsley. Estimating and sampling graphs with multidimensional random walks. In IMC, pages 390--403, 2010.
[21]
C. Seshadhri, A. Pinar, and T. G. Kolda. Wedge sampling for computing clustering coefficients and triangle counts on large graphs. Statistical Analysis and Data Mining, 7(4):294--307, 2014.
[22]
A. Sinclair. Improved bounds for mixing rates of Markov chains and multicommodity flow. Combinatorics, Probability, & Computing 1, pages 351--370, 1992.
[23]
D. Stutzbach, R. Rejaie, N. G. Duffield, S. Sen, and W. Willinger. On unbiased sampling for unstructured peer-to-peer networks. IEEE/ACM Trans. Netw., 17(2):377--390, 2009.
[24]
S. Ye and F. Wu. Estimating the size of online social networks. In SocialCom, pages 169--176, 2010.
[25]
Z. Zhou, N. Zhang, Z. Gong, and G. Das. Faster random walks by rewiring online social networks on-the-fly. In ICDE, pages 769--780, 2013.

Cited By

View all
  • (2024)Estimation of Graph Features Based on Random Walks Using Neighbors’ PropertiesWeb Information Systems Engineering – WISE 202410.1007/978-981-96-0567-5_32(456-466)Online publication date: 3-Dec-2024
  • (2023)MaNIACS: Approximate Mining of Frequent Subgraph Patterns through SamplingACM Transactions on Intelligent Systems and Technology10.1145/358725414:3(1-29)Online publication date: 13-Apr-2023
  • (2023) Bavarian: Betweenness Centrality Approximation with Variance-aware Rademacher AveragesACM Transactions on Knowledge Discovery from Data10.1145/357702117:6(1-47)Online publication date: 6-Mar-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016
1482 pages
ISBN:9781450341431

Sponsors

  • IW3C2: International World Wide Web Conference Committee

In-Cooperation

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Publication History

Published: 11 April 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. metropolis-hastings
  2. mixing time
  3. random walks
  4. stationary distribution

Qualifiers

  • Research-article

Funding Sources

  • SIR Grant
  • Google Focused Research Award
  • ERC Starting Grant DMAP
  • Google Faculty Research Award

Conference

WWW '16
Sponsor:
  • IW3C2
WWW '16: 25th International World Wide Web Conference
April 11 - 15, 2016
Québec, Montréal, Canada

Acceptance Rates

WWW '16 Paper Acceptance Rate 115 of 727 submissions, 16%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)36
  • Downloads (Last 6 weeks)4
Reflects downloads up to 09 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Estimation of Graph Features Based on Random Walks Using Neighbors’ PropertiesWeb Information Systems Engineering – WISE 202410.1007/978-981-96-0567-5_32(456-466)Online publication date: 3-Dec-2024
  • (2023)MaNIACS: Approximate Mining of Frequent Subgraph Patterns through SamplingACM Transactions on Intelligent Systems and Technology10.1145/358725414:3(1-29)Online publication date: 13-Apr-2023
  • (2023) Bavarian: Betweenness Centrality Approximation with Variance-aware Rademacher AveragesACM Transactions on Knowledge Discovery from Data10.1145/357702117:6(1-47)Online publication date: 6-Mar-2023
  • (2023)Random Walk Sampling in Social Networks Involving Private NodesACM Transactions on Knowledge Discovery from Data10.1145/356138817:4(1-28)Online publication date: 24-Feb-2023
  • (2023)Efficiently Sampling and Estimating Hypergraphs By Hybrid Random Walk2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00102(1273-1285)Online publication date: Apr-2023
  • (2022)Edge sampling and graph parameter estimation via vertex neighborhood accessesProceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing10.1145/3519935.3520059(1116-1129)Online publication date: 9-Jun-2022
  • (2022)Sampling Multiple Nodes in Large NetworksProceedings of the Fifteenth ACM International Conference on Web Search and Data Mining10.1145/3488560.3498383(37-47)Online publication date: 11-Feb-2022
  • (2022)Combining clutter reduction methods for temporal network visualizationProceedings of the 37th ACM/SIGAPP Symposium on Applied Computing10.1145/3477314.3507018(1748-1755)Online publication date: 25-Apr-2022
  • (2022)Common Neighbors Matter: Fast Random Walk Sampling with Common Neighbor AwarenessIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.3150427(1-1)Online publication date: 2022
  • (2022)Social Graph Restoration via Random Walk Sampling2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00065(01-14)Online publication date: May-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media