[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3292500.3330995acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article
Open access

Estimating Graphlet Statistics via Lifting

Published: 25 July 2019 Publication History

Abstract

Exploratory analysis over network data is often limited by the ability to efficiently calculate graph statistics, which can provide a model-free understanding of the macroscopic properties of a network. We introduce a framework for estimating the graphlet count---the number of occurrences of a small subgraph motif (e.g. a wedge or a triangle) in the network. For massive graphs, where accessing the whole graph is not possible, the only viable algorithms are those that make a limited number of vertex neighborhood queries. We introduce a Monte Carlo sampling technique for graphlet counts, called Lifting, which can simultaneously sample all graphlets of size up to k vertices for arbitrary k. This is the first graphlet sampling method that can provably sample every graphlet with positive probability and can sample graphlets of arbitrary size k. We outline variants of lifted graphlet counts, including the ordered, unordered, and shotgun estimators, random walk starts, and parallel vertex starts. We prove that our graphlet count updates are unbiased for the true graphlet count and have a controlled variance for all graphlets. We compare the experimental performance of lifted graphlet counts to the state-of-the art graphlet sampling procedures: Waddling and the pairwise subgraph random walk.

References

[1]
Nesreen K. Ahmed, Jennifer Neville, Ryan A. Rossi, Nick G. Duffield, and Theodore L. Willke. 2017. Graphlet decomposition: framework, algorithms, and applications. Knowledge and Information Systems 50, 3 (01 Mar 2017), 689--722.
[2]
Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. science 286, 5439 (1999), 509--512.
[3]
Mansurul A Bhuiyan, Mahmudur Rahman, andMAl Hasan. 2012. Guise: Uniform sampling of graphlets for large graph analysis. In Data Mining (ICDM), 2012 IEEE 12th International Conference on. IEEE, 91--100.
[4]
Peter J Bickel, Aiyou Chen, Elizaveta Levina, and others. 2011. The method of moments and degree distributions for network models. The Annals of Statistics 39, 5 (2011), 2280--2301.
[5]
Marco Bressan, Flavio Chierichetti, Ravi Kumar, Stefano Leucci, and Alessandro Panconesi. 2017. Counting Graphlets: Space vs Time. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17). ACM, New York, NY, USA, 557--566.
[6]
Xiaowei Chen, Yongkun Li, Pinghui Wang, and John Lui. 2016. A general framework for estimating graphlet statistics via random walk. Proceedings of the VLDB Endowment 10, 3 (2016), 253--264.
[7]
James A Davis. 1970. Clustering and hierarchy in interpersonal relations: Testing two graph theoretical models on 742 sociomatrices. American Sociological Review (1970), 843--851.
[8]
Ove Frank and David Strauss. 1986. Markov graphs. Journal of the american Statistical association 81, 395 (1986), 832--842.
[9]
Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 855--864.
[10]
Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024--1034.
[11]
Guyue Han and Harish Sethu. 2016. Waddling Random Walk: Fast and Accurate Mining of Motif Statistics in Large Graphs. 2016 IEEE 16th International Conference on Data Mining (ICDM) (2016), 181--190.
[12]
Brendan D McKay and Adolfo Piperno. 2014. Practical graph isomorphism, II. Journal of Symbolic Computation 60 (2014), 94--112.
[13]
Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. 2002. Network motifs: simple building blocks of complex networks. Science 298, 5594 (2002), 824--827.
[14]
Ali Pinar, C Seshadhri, and Vaidyanathan Vishal. 2017. Escape: Efficiently counting all 5-vertex subgraphs. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1431--1440.
[15]
Natasa Prulj, Derek G Corneil, and Igor Jurisica. 2004. Modeling interactome: scale-free or geometric? Bioinformatics 20, 18 (2004), 3508--3515.
[16]
N Prulj, Derek G Corneil, and Igor Jurisica. 2006. Efficient estimation of graphlet frequency distributions in protein--protein interaction networks. Bioinformatics 22, 8 (2006), 974--980.
[17]
Mahmudur Rahman, Mansurul Alam Bhuiyan, and Mohammad Al Hasan. 2014. Graft: An efficient graphlet counting method for large graph analysis. IEEE Transactions on Knowledge and Data Engineering 26, 10 (2014), 2466--2478.
[18]
Ryan A. Rossi and Nesreen K. Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. http://networkrepository.com
[19]
Alistair Sinclair. 1992. Improved bounds for mixing rates of Markov chains and multicommodity flow. Springer Berlin Heidelberg, Berlin, Heidelberg, 474--487.
[20]
Tom AB Snijders. 2002. Markov Chain Monte Carlo Estimation of Exponential Random Graph Models. In Journal of Social Structure. Citeseer.
[21]
Pinghui Wang, John C. S. Lui, Bruno Ribeiro, Don Towsley, Junzhou Zhao, and Xiaohong Guan. 2014. Efficiently Estimating Motif Statistics of Large Networks. ACM Trans. Knowl. Discov. Data 9, 2, Article 8 (Sept. 2014), 27 pages.
[22]
StanleyWasserman and Philippa Pattison. 1996. Logit models and logistic regressions for social networks: I. An introduction to Markov graphs andp. Psychometrika 61, 3 (1996), 401--425.
[23]
Duncan J Watts and Steven H Strogatz. 1998. Collective dynamics of 'smallworld'networks. nature 393, 6684 (1998), 440--442.

Cited By

View all
  • (2024)Fast and Perfect Sampling of Subgraphs and Polymer SystemsACM Transactions on Algorithms10.1145/363229420:1(1-30)Online publication date: 22-Jan-2024
  • (2024)SGES: A General and Space-efficient Framework for Graphlet Counting in Graph StreamsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679739(2817-2826)Online publication date: 21-Oct-2024
  • (2023)Efficient and Near-optimal Algorithms for Sampling Small Connected SubgraphsACM Transactions on Algorithms10.1145/359649519:3(1-40)Online publication date: 31-Jul-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN:9781450362016
DOI:10.1145/3292500
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 July 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. markov-chain monte carlo
  2. networks

Qualifiers

  • Research-article

Funding Sources

  • NSF DMS

Conference

KDD '19
Sponsor:

Acceptance Rates

KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)97
  • Downloads (Last 6 weeks)14
Reflects downloads up to 18 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Fast and Perfect Sampling of Subgraphs and Polymer SystemsACM Transactions on Algorithms10.1145/363229420:1(1-30)Online publication date: 22-Jan-2024
  • (2024)SGES: A General and Space-efficient Framework for Graphlet Counting in Graph StreamsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679739(2817-2826)Online publication date: 21-Oct-2024
  • (2023)Efficient and Near-optimal Algorithms for Sampling Small Connected SubgraphsACM Transactions on Algorithms10.1145/359649519:3(1-40)Online publication date: 31-Jul-2023
  • (2023)MaNIACS: Approximate Mining of Frequent Subgraph Patterns through SamplingACM Transactions on Intelligent Systems and Technology10.1145/3587254Online publication date: 10-Mar-2023
  • (2023)A Graph Entropy Measure From Urelement to Higher-Order Graphlets for Network AnalysisIEEE Transactions on Network Science and Engineering10.1109/TNSE.2022.321680310:2(631-644)Online publication date: 1-Mar-2023
  • (2023)Index-free triangle-based graph local clusteringFrontiers of Computer Science10.1007/s11704-023-2768-718:3Online publication date: 13-Dec-2023
  • (2022)Mosar: Efficiently Characterizing Both Frequent and Rare Motifs in Large GraphsApplied Sciences10.3390/app1214721012:14(7210)Online publication date: 18-Jul-2022
  • (2021)On analyzing graphs with motif-pathsProceedings of the VLDB Endowment10.14778/3447689.344771414:6(1111-1123)Online publication date: 12-Apr-2021
  • (2021)Faster Motif Counting via Succinct Color Coding and Adaptive SamplingACM Transactions on Knowledge Discovery from Data10.1145/344739715:6(1-27)Online publication date: 19-May-2021
  • (2021)Tiered SamplingACM Transactions on Knowledge Discovery from Data10.1145/344129915:5(1-52)Online publication date: 10-May-2021
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media