[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3488560.3498383acmconferencesArticle/Chapter ViewAbstractPublication PageswsdmConference Proceedingsconference-collections
research-article
Open access

Sampling Multiple Nodes in Large Networks: Beyond Random Walks

Published: 15 February 2022 Publication History

Abstract

Sampling random nodes is a fundamental algorithmic primitive in the analysis of massive networks, with many modern graph mining algorithms critically relying on it. We consider the task of generating a large collection of random nodes in the network assuming limited query access (where querying a node reveals its set of neighbors). In current approaches, based on long random walks, the number of queries per sample scales linearly with the mixing time of the network, which can be prohibitive for large real-world networks. We propose a new method for sampling multiple nodes that bypasses the dependence in the mixing time by explicitly searching for less accessible components in the network. We test our approach on a variety of real-world and synthetic networks with up to tens of millions of nodes, demonstrating a query complexity improvement of up to x20 compared to the state of the art.

Supplementary Material

MP4 File (WSDM22-fp116.mp4)
Short presentation of the work "Sampling multiple nodes in large networks: Beyond random walks" by Omri Ben-Eliezer, Talya Eden, Joel Oren, and Dimitris Fotakis. Speaker: Omri Ben-Eliezer.

References

[1]
José Ignacio Alvarez-Hamelin, Luca Dall'Asta, Alain Barrat, and Alessandro Vespignani. K-core decomposition of internet graphs: hierarchies, self-similarity and measurement biases. Networks & Heterogeneous Media, 3(2):371--393, 2008.
[2]
L. Becchetti, P. Boldi, Carlos Castillo, and Aristides Gionis. Efficient algorithms for large-scale local triangle counting. ACM Trans. Knowl. Discov. Data, 4(3):13:1--13:28, 2010.
[3]
Omri Ben-Eliezer, Talya Eden, Joel Oren, and Dimitris Fotakis. Sampling multiple nodes in large networks: Beyond random walks. CoRR, abs/2110.13324, 2021.
[4]
A. R. Benson and J. M. Kleinberg. Link prediction in networks with core-fringe data. In Proc. World Wide Web Conference (WWW 2019), pages 94--104, 2019.
[5]
Suman K. Bera and C. Seshadhri. How to count triangles, without seeing the whole graph. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 306--316, 2020.
[6]
C.A. Bliss, C.M. Danforth, and P.S. Dodds. Estimation of global network statistics from incomplete data. PLoS ONE, 9, 2014.
[7]
M. Borassi and E. Natale. KADABRA is an adaptive algorithm for betweenness via random approximation. ACM J. Experimental Algorithmics, 24(1):1.2:1--1.2:35, 2019.
[8]
F. Chierichetti, A. Dasgupta, R. Kumar, S. Lattanzi, and T. Sarlós. On sampling nodes in a network. In Proceedings of the 25th International World Wide Web Conference (WWW 2016), pages 471--481, 2016.
[9]
F. Chierichetti and S. Haddadan. On the complexity of sampling vertices uniformly from a graph. In 45th ICALP, pages 149:1--149:13, 2018.
[10]
C. Cooper, T. Radzik, and Y. Siantos. Estimating network parameters using random walks. Social Netw. Analys. Mining, 4(1):168, 2014.
[11]
A. Dasgupta, R. Kumar, and T. Sarló s. On estimating the average degree. In Proceedings of the 23rd International World Wide Web Conference (WWW 2014), pages 795--806. ACM, 2014.
[12]
M. De Domenico, A. Lima, P. Mougel, and M. Musolesi. The anatomy of a scientific rumor. Scientific reports, 3:2980, 10 2013.
[13]
Matteo Dell'Amico and Yves Roudier. A measurement of mixing time in social networks. In Proceedings of the 5th International Workshop on Security and Trust Management, Saint Malo, France, page 72, 2009.
[14]
S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes. k-core organization of complex networks. Phys. Rev. Lett., 96:040601, 2006.
[15]
T. Eden, S. Jain, A. Pinar, D. Ron, and C. Seshadhri. Provable and practical approximations for the degree distribution using sublinear graph samples. In Proc. 2018 World Wide Web Conference WWW, pages 449--458, 2018.
[16]
T. Eden, D. Ron, and C. Seshadhri. On approximating the number of k-cliques in sublinear time. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 722--734. ACM, 2018.
[17]
T. Eden and W. Rosenbaum. On sampling edges almost uniformly. In 1st Symposium on Simplicity in Algorithms, SOSA, pages 7:1--7:9, 2018.
[18]
Talya Eden, Saleet Mossel, and Ronitt Rubinfeld. Sampling multiple edges efficiently. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM, pages 51:1--51:15, 2021.
[19]
T. Eliassi-Rad, R.S. Caceres, and T. LaRock. Incompleteness in networks: Biases, skewed results, and some solutions. In SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), pages 3217--3218. ACM, 2019.
[20]
M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. Walking in Facebook: A case study of unbiased sampling of OSNs. In Proc. IEEE INFOCOM, 2010.
[21]
D. F. Gleich. Pagerank beyond the web. SIAM Review, 57(3):321--363, 2015.
[22]
O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to learning and approximation. J. ACM, 45(4):653--750, 1998.
[23]
Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. In Studies in Complexity and Cryptography. Miscellanea on the Interplay between Randomness and Computation, pages 68--75. Springer, 2011.
[24]
S. Hanneke and E.P. Xing. Network completion and survey sampling. In Proc. 12th Conf. on Artificial Intelligence and Statistics (AISTATS), pages 209--215, 2009.
[25]
Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bull. Amer. Math. Soc., 43(4):439--561, 2006.
[26]
K. Iwasaki and K. Shudo. Comparing graph sampling methods based on the number of queries. In 11th International Conference on Social Computing and Networking, SocialCom '18, pages 1136--1143, 2018.
[27]
L. Jin, Y. Chen, P. Hui, C. Ding, T. Wang, A. V. Vasilakos, B. Deng, and X. Li. Albatross sampling: Robust and effective hybrid vertex sampling for social graphs. In Proc. 3rd ACM Intl. Workshop on MobiArch, HotPlanet '11, pages 11--16, 2011.
[28]
L. Katzir and S. J. Hardiman. Estimating clustering coefficients and size of social networks via random walk. ACM Trans. Web, 9(4):19:1--19:20, 2015.
[29]
L. Katzir, E. Liberty, O. Somekh, and I. A. Cosma. Estimating sizes of social networks via biased sampling. Internet Mathematics, 10(3--4):335--359, 2014.
[30]
Liran Katzir, Clara Shikhelman, and Eylon Yogev. Interactive proofs for social graphs. In Advances in Cryptology -- CRYPTO 2020, pages 574--601, 2020.
[31]
M. Kim and J. Leskovec. The network completion problem: Inferring missing nodes and edges in networks. In Proc. of the 11th SIAM International Conference on Data Mining, (SDM 2011), pages 47--58. SIAM / Omnipress, 2011.
[32]
J. M. Kleinberg. Detecting a network failure. Internet Math., 1(1):37--55, 2003.
[33]
T. LaRock, T. Sakharov, S. Bhadra, and T. Eliassi-Rad. Reducing network incompleteness through online learning: A feasibility study. In Proc. of the 14th International Workshop on Mining and Learning with Graphs (MLG). ACM, 2018.
[34]
J. Leskovec and C. Faloutsos. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, pages 631--636, 2006.
[35]
J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, 2014.
[36]
Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proc. 11th ACM SIGKDD conference on Knowledge discovery in data mining, pages 177--187, 2005.
[37]
Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM transactions on Knowledge Discovery from Data (TKDD), 1(1):2--es, 2007.
[38]
Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29--123, 2009.
[39]
R. Li, J. X. Yu, L. Qin, R. Mao, and T. Jin. On random walk based graph sampling. In IEEE 31st Int. Conf. Data Engineering, pages 927--938, 2015.
[40]
A. Mohaisen, A. Yun, and Y. Kim. Measuring the mixing time of social graphs. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, IMC '10, pages 383--389, 2010.
[41]
A. Nazi, Z. Zhou, S. Thirumuruganathan, N. Zhang, and G. Das. Walk, not wait: Faster sampling over online social networks. Proc. VLDB., 8(6):678--689, 2015.
[42]
Giulia Preti, Gianmarco De Francisci Morales, and Matteo Riondato. Maniacs: Approximate mining of frequent subgraph patterns through sampling. In Proc. 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1348--1358, 2021.
[43]
Yi Qi, Wanyue Xu, Liwang Zhu, and Zhongzhi Zhang. Real-world networks are not always fast mixing. The Computer Journal, 2020.
[44]
M. Rahmani, A. Beckus, A. Karimian, and G. K. Atia. Scalable and robust community detection with randomized sketching. IEEE Transactions on Signal Processing, 68:962--977, 2020.
[45]
B. F. Ribeiro and D. F. Towsley. Estimating and sampling graphs with multidimensional random walks. In Proceedings of the 10th ACM SIGCOMM Internet Measurement Conference, IMC 2010, pages 390--403. ACM, 2010.
[46]
B. F. Ribeiro, P. Wang, F. Murai, and D. Towsley. Sampling directed graphs with random walks. In Proc. IEEE INFOCOM, pages 1692--1700, 2012.
[47]
M. Richardson, R. Agrawal, and P. Domingos. Trust management for the semantic web. In ISWC, pages 351--368, 2003.
[48]
M. Rombach, M. Porter, J. Fowler, and P. Mucha. Core-periphery structure in networks. SIAM Journal on Applied Mathematics, 74(1):167--190, 2014.
[49]
R. A. Rossi and N. K. Ahmed. The network data repository with interactive graph analytics and visualization. In Proc. 29th AAAI Conf. Artificial Intelligence, 2015.
[50]
N. Salamanos, E. Voudigari, and E. J. Yannakoudakis. Deterministic graph exploration for efficient graph sampling. Social Network Analysis and Mining, 7(1):24, 2017.
[51]
C. Seshadhri, Ali Pinar, and Tamara G. Kolda. Wedge sampling for computing clustering coefficients and triangle counts on large graphs. Statistical Analysis and Data Mining, 7(4):294--307, 2014.
[52]
S. Soundarajan, T. Eliassi-Rad, B. Gallagher, and A. Pinar. MaxOutProbe: An algorithm for increasing the size of partially observed networks. In Workshop on Networks in the Social and Information Sciences, 29th Conference on Neural Information Processing Systems (NIPS 2015), 2015.
[53]
S. Soundarajan, T. Eliassi-Rad, B. Gallagher, and A. Pinar. ε-WGX: adaptive edge probing for enhancing incomplete networks. In Proc. of the 2017 ACM on Web Science Conference (WebSci 2017), pages 161--170. ACM, 2017.
[54]
L. Takac and M. Zabovsky. Data analysis in public social networks. In International Workshop Present Day Trends of Innovations, 2012.
[55]
Jakub T"tek and Mikkel Thorup. Sampling and counting edges via vertex accesses. CoRR, abs/2107.03821, 2021.
[56]
J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181--213, 2015.
[57]
Se-Young Yun and Alexandre Proutiere. Community detection via random and adaptive sampling. In Conference on learning theory, pages 138--175. PMLR, 2014.
[58]
Kai Zhang, Qian Yu, Kai Lei, and Kuai Xu. Characterizing tweeting behaviors of sina weibo users via public data streaming. In Web-Age Information Management, pages 294--297. Springer, 2014.
[59]
Y. Zhang, E. D. Kolaczyk, and B. D. Spencer. Estimating network degree distributions under sampling: An inverse problem, with applications to monitoring social media networks. Annals of Applied Statistics, 9(1):166--199, 2015.
[60]
Z. Zhou, N. Zhang, Z. Gong, and G. Das. Faster random walks by rewiring online social networks on-the-fly. ACM Trans. Database Syst., 40(4):26:1--26:36, 2016.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
February 2022
1690 pages
ISBN:9781450391320
DOI:10.1145/3488560
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 February 2022

Check for updates

Author Tags

  1. graph and network sampling
  2. node sampling

Qualifiers

  • Research-article

Funding Sources

Conference

WSDM '22

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)136
  • Downloads (Last 6 weeks)16
Reflects downloads up to 20 Dec 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media