More Web Proxy on the site http://driver.im/

research-article

Open access

Sampling Multiple Nodes in Large Networks: Beyond Random Walks

Authors:

Omri Ben-Eliezer,

Dimitris FotakisAuthors Info & Claims

WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining

Pages 37 - 47

https://doi.org/10.1145/3488560.3498383

Published: 15 February 2022 Publication History

Abstract

Sampling random nodes is a fundamental algorithmic primitive in the analysis of massive networks, with many modern graph mining algorithms critically relying on it. We consider the task of generating a large collection of random nodes in the network assuming limited query access (where querying a node reveals its set of neighbors). In current approaches, based on long random walks, the number of queries per sample scales linearly with the mixing time of the network, which can be prohibitive for large real-world networks. We propose a new method for sampling multiple nodes that bypasses the dependence in the mixing time by explicitly searching for less accessible components in the network. We test our approach on a variety of real-world and synthetic networks with up to tens of millions of nodes, demonstrating a query complexity improvement of up to x20 compared to the state of the art.

Supplementary Material

MP4 File (WSDM22-fp116.mp4)

Short presentation of the work "Sampling multiple nodes in large networks: Beyond random walks" by Omri Ben-Eliezer, Talya Eden, Joel Oren, and Dimitris Fotakis. Speaker: Omri Ben-Eliezer.

Download
17.94 MB

References

[1]

José Ignacio Alvarez-Hamelin, Luca Dall'Asta, Alain Barrat, and Alessandro Vespignani. K-core decomposition of internet graphs: hierarchies, self-similarity and measurement biases. Networks & Heterogeneous Media, 3(2):371--393, 2008.

[2]

L. Becchetti, P. Boldi, Carlos Castillo, and Aristides Gionis. Efficient algorithms for large-scale local triangle counting. ACM Trans. Knowl. Discov. Data, 4(3):13:1--13:28, 2010.

Digital Library

[3]

Omri Ben-Eliezer, Talya Eden, Joel Oren, and Dimitris Fotakis. Sampling multiple nodes in large networks: Beyond random walks. CoRR, abs/2110.13324, 2021.

[4]

A. R. Benson and J. M. Kleinberg. Link prediction in networks with core-fringe data. In Proc. World Wide Web Conference (WWW 2019), pages 94--104, 2019.

Digital Library

[5]

Suman K. Bera and C. Seshadhri. How to count triangles, without seeing the whole graph. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 306--316, 2020.

Digital Library

[6]

C.A. Bliss, C.M. Danforth, and P.S. Dodds. Estimation of global network statistics from incomplete data. PLoS ONE, 9, 2014.

[7]

M. Borassi and E. Natale. KADABRA is an adaptive algorithm for betweenness via random approximation. ACM J. Experimental Algorithmics, 24(1):1.2:1--1.2:35, 2019.

[8]

F. Chierichetti, A. Dasgupta, R. Kumar, S. Lattanzi, and T. Sarlós. On sampling nodes in a network. In Proceedings of the 25th International World Wide Web Conference (WWW 2016), pages 471--481, 2016.

Digital Library

[9]

F. Chierichetti and S. Haddadan. On the complexity of sampling vertices uniformly from a graph. In 45th ICALP, pages 149:1--149:13, 2018.

[10]

C. Cooper, T. Radzik, and Y. Siantos. Estimating network parameters using random walks. Social Netw. Analys. Mining, 4(1):168, 2014.

[11]

A. Dasgupta, R. Kumar, and T. Sarló s. On estimating the average degree. In Proceedings of the 23rd International World Wide Web Conference (WWW 2014), pages 795--806. ACM, 2014.

Digital Library

[12]

M. De Domenico, A. Lima, P. Mougel, and M. Musolesi. The anatomy of a scientific rumor. Scientific reports, 3:2980, 10 2013.

[13]

Matteo Dell'Amico and Yves Roudier. A measurement of mixing time in social networks. In Proceedings of the 5th International Workshop on Security and Trust Management, Saint Malo, France, page 72, 2009.

[14]

S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes. k-core organization of complex networks. Phys. Rev. Lett., 96:040601, 2006.

[15]

T. Eden, S. Jain, A. Pinar, D. Ron, and C. Seshadhri. Provable and practical approximations for the degree distribution using sublinear graph samples. In Proc. 2018 World Wide Web Conference WWW, pages 449--458, 2018.

Digital Library

[16]

T. Eden, D. Ron, and C. Seshadhri. On approximating the number of k-cliques in sublinear time. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 722--734. ACM, 2018.

Digital Library

[17]

T. Eden and W. Rosenbaum. On sampling edges almost uniformly. In 1st Symposium on Simplicity in Algorithms, SOSA, pages 7:1--7:9, 2018.

[18]

Talya Eden, Saleet Mossel, and Ronitt Rubinfeld. Sampling multiple edges efficiently. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM, pages 51:1--51:15, 2021.

[19]

T. Eliassi-Rad, R.S. Caceres, and T. LaRock. Incompleteness in networks: Biases, skewed results, and some solutions. In SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), pages 3217--3218. ACM, 2019.

Digital Library

[20]

M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. Walking in Facebook: A case study of unbiased sampling of OSNs. In Proc. IEEE INFOCOM, 2010.

Digital Library

[21]

D. F. Gleich. Pagerank beyond the web. SIAM Review, 57(3):321--363, 2015.

Digital Library

[22]

O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to learning and approximation. J. ACM, 45(4):653--750, 1998.

Digital Library

[23]

Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. In Studies in Complexity and Cryptography. Miscellanea on the Interplay between Randomness and Computation, pages 68--75. Springer, 2011.

Digital Library

[24]

S. Hanneke and E.P. Xing. Network completion and survey sampling. In Proc. 12th Conf. on Artificial Intelligence and Statistics (AISTATS), pages 209--215, 2009.

[25]

Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bull. Amer. Math. Soc., 43(4):439--561, 2006.

[26]

K. Iwasaki and K. Shudo. Comparing graph sampling methods based on the number of queries. In 11th International Conference on Social Computing and Networking, SocialCom '18, pages 1136--1143, 2018.

[27]

L. Jin, Y. Chen, P. Hui, C. Ding, T. Wang, A. V. Vasilakos, B. Deng, and X. Li. Albatross sampling: Robust and effective hybrid vertex sampling for social graphs. In Proc. 3rd ACM Intl. Workshop on MobiArch, HotPlanet '11, pages 11--16, 2011.

Digital Library

[28]

L. Katzir and S. J. Hardiman. Estimating clustering coefficients and size of social networks via random walk. ACM Trans. Web, 9(4):19:1--19:20, 2015.

Digital Library

[29]

L. Katzir, E. Liberty, O. Somekh, and I. A. Cosma. Estimating sizes of social networks via biased sampling. Internet Mathematics, 10(3--4):335--359, 2014.

[30]

Liran Katzir, Clara Shikhelman, and Eylon Yogev. Interactive proofs for social graphs. In Advances in Cryptology -- CRYPTO 2020, pages 574--601, 2020.

Digital Library

[31]

M. Kim and J. Leskovec. The network completion problem: Inferring missing nodes and edges in networks. In Proc. of the 11th SIAM International Conference on Data Mining, (SDM 2011), pages 47--58. SIAM / Omnipress, 2011.

[32]

J. M. Kleinberg. Detecting a network failure. Internet Math., 1(1):37--55, 2003.

[33]

T. LaRock, T. Sakharov, S. Bhadra, and T. Eliassi-Rad. Reducing network incompleteness through online learning: A feasibility study. In Proc. of the 14th International Workshop on Mining and Learning with Graphs (MLG). ACM, 2018.

[34]

J. Leskovec and C. Faloutsos. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, pages 631--636, 2006.

Digital Library

[35]

J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, 2014.

[36]

Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proc. 11th ACM SIGKDD conference on Knowledge discovery in data mining, pages 177--187, 2005.

Digital Library

[37]

Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM transactions on Knowledge Discovery from Data (TKDD), 1(1):2--es, 2007.

[38]

Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29--123, 2009.

[39]

R. Li, J. X. Yu, L. Qin, R. Mao, and T. Jin. On random walk based graph sampling. In IEEE 31st Int. Conf. Data Engineering, pages 927--938, 2015.

[40]

A. Mohaisen, A. Yun, and Y. Kim. Measuring the mixing time of social graphs. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, IMC '10, pages 383--389, 2010.

Digital Library

[41]

A. Nazi, Z. Zhou, S. Thirumuruganathan, N. Zhang, and G. Das. Walk, not wait: Faster sampling over online social networks. Proc. VLDB., 8(6):678--689, 2015.

Digital Library

[42]

Giulia Preti, Gianmarco De Francisci Morales, and Matteo Riondato. Maniacs: Approximate mining of frequent subgraph patterns through sampling. In Proc. 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1348--1358, 2021.

Digital Library

[43]

Yi Qi, Wanyue Xu, Liwang Zhu, and Zhongzhi Zhang. Real-world networks are not always fast mixing. The Computer Journal, 2020.

[44]

M. Rahmani, A. Beckus, A. Karimian, and G. K. Atia. Scalable and robust community detection with randomized sketching. IEEE Transactions on Signal Processing, 68:962--977, 2020.

[45]

B. F. Ribeiro and D. F. Towsley. Estimating and sampling graphs with multidimensional random walks. In Proceedings of the 10th ACM SIGCOMM Internet Measurement Conference, IMC 2010, pages 390--403. ACM, 2010.

Digital Library

[46]

B. F. Ribeiro, P. Wang, F. Murai, and D. Towsley. Sampling directed graphs with random walks. In Proc. IEEE INFOCOM, pages 1692--1700, 2012.

[47]

M. Richardson, R. Agrawal, and P. Domingos. Trust management for the semantic web. In ISWC, pages 351--368, 2003.

Digital Library

[48]

M. Rombach, M. Porter, J. Fowler, and P. Mucha. Core-periphery structure in networks. SIAM Journal on Applied Mathematics, 74(1):167--190, 2014.

Digital Library

[49]

R. A. Rossi and N. K. Ahmed. The network data repository with interactive graph analytics and visualization. In Proc. 29th AAAI Conf. Artificial Intelligence, 2015.

[50]

N. Salamanos, E. Voudigari, and E. J. Yannakoudakis. Deterministic graph exploration for efficient graph sampling. Social Network Analysis and Mining, 7(1):24, 2017.

[51]

C. Seshadhri, Ali Pinar, and Tamara G. Kolda. Wedge sampling for computing clustering coefficients and triangle counts on large graphs. Statistical Analysis and Data Mining, 7(4):294--307, 2014.

Digital Library

[52]

S. Soundarajan, T. Eliassi-Rad, B. Gallagher, and A. Pinar. MaxOutProbe: An algorithm for increasing the size of partially observed networks. In Workshop on Networks in the Social and Information Sciences, 29th Conference on Neural Information Processing Systems (NIPS 2015), 2015.

[53]

S. Soundarajan, T. Eliassi-Rad, B. Gallagher, and A. Pinar. ε-WGX: adaptive edge probing for enhancing incomplete networks. In Proc. of the 2017 ACM on Web Science Conference (WebSci 2017), pages 161--170. ACM, 2017.

Digital Library

[54]

L. Takac and M. Zabovsky. Data analysis in public social networks. In International Workshop Present Day Trends of Innovations, 2012.

[55]

Jakub T"tek and Mikkel Thorup. Sampling and counting edges via vertex accesses. CoRR, abs/2107.03821, 2021.

[56]

J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181--213, 2015.

Digital Library

[57]

Se-Young Yun and Alexandre Proutiere. Community detection via random and adaptive sampling. In Conference on learning theory, pages 138--175. PMLR, 2014.

[58]

Kai Zhang, Qian Yu, Kai Lei, and Kuai Xu. Characterizing tweeting behaviors of sina weibo users via public data streaming. In Web-Age Information Management, pages 294--297. Springer, 2014.

[59]

Y. Zhang, E. D. Kolaczyk, and B. D. Spencer. Estimating network degree distributions under sampling: An inverse problem, with applications to monitoring social media networks. Annals of Applied Statistics, 9(1):166--199, 2015.

[60]

Z. Zhou, N. Zhang, Z. Gong, and G. Das. Faster random walks by rewiring online social networks on-the-fly. ACM Trans. Database Syst., 40(4):26:1--26:36, 2016.

Digital Library

Cited By

Jiao B(2024)Sampling unknown large networks restricted by low sampling ratesScientific Reports10.1038/s41598-024-64018-314:1Online publication date: 10-Jun-2024
https://doi.org/10.1038/s41598-024-64018-3
Wieczorek OHeiberger R(2024)Analyse großer NetzwerkeHandbuch Netzwerkforschung10.1007/978-3-658-37507-2_38-1(1-11)Online publication date: 1-Mar-2024
https://doi.org/10.1007/978-3-658-37507-2_38-1
Wang K(2023)An adaptive graph sampling framework for graph analyticsSocial Network Analysis and Mining10.1007/s13278-023-01157-x14:1Online publication date: 6-Dec-2023
https://doi.org/10.1007/s13278-023-01157-x

Index Terms

Sampling Multiple Nodes in Large Networks: Beyond Random Walks

Recommendations

Distributed Sampling of Social Networks: A New Approach Based on Node's Importance
Abstract
Social networks allow efficient interchange of ideas and informations which makes them important dissemination platforms. Due to the explosive use of such networks in our everyday life, their structure becomes more and more complexe. Consequently, ...
Finding Skyline Nodes in Large Networks
ICDEW '12: Proceedings of the 2012 IEEE 28th International Conference on Data Engineering Workshops

Given a query node q in some graph dataset, we say that a node u is not dominated by another node v, if (1) the distance between u and q is less than the distance between v and q, or (2) the set of query node labels carried by u is a superset of the set ...
Improving the Accuracy of Peer-to-Peer Sampling Services
ICCGI '08: Proceedings of the 2008 The Third International Multi-Conference on Computing in the Global Information Technology (iccgi 2008)

Node sampling services provide peers in a peer-to-peer system with a source of randomly chosen addresses of other nodes. Ideally, samples should be independent and uniform. The restrictions of a distributed environment, however, introduce various ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining

February 2022

1690 pages

ISBN:9781450391320

DOI:10.1145/3488560

General Chairs:
K. Selcuk Candan
Arizona State University, USA
,
Huan Liu
Arizona State University, USA
,
Program Chairs:
Leman Akoglu
Carnegie Mellon University, USA
,
Xin Luna Dong
Meta Platforms, Inc. (former Facebook), USA
,
Jiliang Tang
Michigan State University, USA

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 February 2022

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Science Foundation
Eric and Wendy Schmidt Fund for Strategic Innovation
National Technical University of Athens
Ben-Gurion University

Conference

WSDM '22

Sponsor:

WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining

February 21 - 25, 2022

AZ, Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

3
Total Citations
View Citations
447
Total Downloads

Downloads (Last 12 months)136
Downloads (Last 6 weeks)16

Reflects downloads up to 20 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Jiao B(2024)Sampling unknown large networks restricted by low sampling ratesScientific Reports10.1038/s41598-024-64018-314:1Online publication date: 10-Jun-2024
https://doi.org/10.1038/s41598-024-64018-3
Wieczorek OHeiberger R(2024)Analyse großer NetzwerkeHandbuch Netzwerkforschung10.1007/978-3-658-37507-2_38-1(1-11)Online publication date: 1-Mar-2024
https://doi.org/10.1007/978-3-658-37507-2_38-1
Wang K(2023)An adaptive graph sampling framework for graph analyticsSocial Network Analysis and Mining10.1007/s13278-023-01157-x14:1Online publication date: 6-Dec-2023
https://doi.org/10.1007/s13278-023-01157-x

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents