[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3627673.3679739acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

SGES: A General and Space-efficient Framework for Graphlet Counting in Graph Streams

Published: 21 October 2024 Publication History

Abstract

Graphlets are small, connected, and non-isomorphic induced subgraphs that describe the topological structure of a graph. Counting graphlets is a fundamental task in graph mining and social network analysis. It has numerous applications in many fields, including dense subgraph discovery, anomaly detection, etc. Most existing work assumes a static graph. However, graphs are dynamic in the real world, which can be described as graph streams. Counting graphlets in graph streams is a challenge due to the streaming nature of the input. While there have been several studies on counting graphlets in graph streams, these works are limited to simple graphlets like triangles and butterflies. In this paper, we propose SGES algorithm to estimate more complex graphlets in graph streams. In SGES, we first propose an unbiased sampling strategy to maintain fixed-size sampled edges, which in turn allows us to unbiasedly estimate the number of subgraphs and then count graphlets based on the combinational relationship between the number of subgraphs and the number of graphlets. Extensive experiments over large real-world graph streams prove that our algorithm can obtain accurate estimation values of graphlet counts with high throughput.

References

[1]
Nesreen K Ahmed, Nick Duffield, Jennifer Neville, and Ramana Kompella. 2014. Graph sample and hold: A framework for big-graph analytics. In KDD. ACM.
[2]
Nesreen K Ahmed, Jennifer Neville, Ryan A Rossi, Nick G Duffield, and Theodore L Willke. 2016. Graphlet decomposition: Framework, algorithms, and applications. Knowledge and Information Systems (2016), 1--34.
[3]
Sinan G Aksoy, Tamara G Kolda, and Ali Pinar. 2017. Measuring and modeling bipartite graphs with community structure. Journal of Complex Networks, Vol. 5, 4 (2017), 581--603.
[4]
Mansurul A Bhuiyan, Mahmudur Rahman, and M Al Hasan. 2012. Guise: Uniform sampling of graphlets for large graph analysis. In ICDM. IEEE.
[5]
Yixiang Fang, Xin Huang, Lu Qin, Ying Zhang, Wenjie Zhang, Reynold Cheng, and Xuemin Lin. 2020. A survey of community search over big graphs. The VLDB Journal, Vol. 29 (2020), 353--392.
[6]
Yixiang Fang, Kai Wang, Xuemin Lin, and Wenjie Zhang. 2021. Cohesive subgraph search over big heterogeneous information networks: Applications, challenges, and solutions. In Proceedings of the 2021 International Conference on Management of Data. 2829--2838.
[7]
Rainer Gemulla, Wolfgang Lehner, and Peter J Haas. 2008. Maintaining bounded-size sample synopses of evolving datasets. The VLDB Journal, Vol. 17 (2008), 173--201.
[8]
Guyue Han and Harish Sethu. 2016. Waddling Random Walk: Fast and Accurate Mining of Motif Statistics in Large Graphs. In ICDM. IEEE.
[9]
Guyue Han and Harish Sethu. 2017. Edge sample and discard: A new algorithm for counting triangles in large dynamic graphs. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. 44--49.
[10]
Peng Jia, Pinghui Wang, Jing Tao, and Xiaohong Guan. 2019. A fast sketch method for mining user similarities over fully dynamic graph streams. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 1682--1685.
[11]
Konstantin Kutzkov and Rasmus Pagh. 2014. Triangle counting in dynamic graph streams. In Scandinavian Workshop on Algorithm Theory. Springer, 306--318.
[12]
Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
[13]
Rundong Li, Pinghui Wang, Peng Jia, Xiangliang Zhang, Junzhou Zhao, Jing Tao, Ye Yuan, and Xiaohong Guan. 2021. Approximately counting butterflies in large bipartite graph streams. IEEE Transactions on Knowledge and Data Engineering, Vol. 34, 12 (2021), 5621--5635.
[14]
Pedro G Lind, Marta C Gonzalez, and Hans J Herrmann. 2005. Cycles and clustering in bipartite networks. Physical review E, Vol. 72, 5 (2005), 056127.
[15]
Dror Marcus and Yuval Shavitt. 2012. Rage--a rapid graphlet enumerator for large networks. Computer Networks, Vol. 56, 2 (2012), 810--819.
[16]
Tore Opsahl. 2013. Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Social networks, Vol. 35, 2 (2013), 159--167.
[17]
Rasmus Pagh and Charalampos E Tsourakakis. 2012. Colorful triangle counting and a mapreduce implementation. Inform. Process. Lett., Vol. 112, 7 (2012), 277--281.
[18]
Serafeim Papadias. 2020. Tunable Streaming Graph Embeddings at Scale. In PhD@ VLDB.
[19]
Serafeim Papadias, Zoi Kaoudi, Varun Pandey, Jorge-Arnulfo Quiane-Ruiz, and Volker Markl. 2023. Counting Butterflies in Fully Dynamic Bipartite Graph Streams. arxiv: 2312.03435 [cs.DB]
[20]
Kirill Paramonov, Dmitry Shemetov, and James Sharpnack. 2019. Estimating graphlet statistics via lifting. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 587--595.
[21]
Ali Pinar, C Seshadhri, and V Vishal. 2016. ESCAPE: Efficiently Counting All 5-Vertex Subgraphs. arXiv preprint arXiv:1610.09411 (2016).
[22]
Xiafei Qiu, Wubin Cen, Zhengping Qian, You Peng, Ying Zhang, Xuemin Lin, and Jingren Zhou. 2018. Real-time constrained cycle detection in large dynamic graphs. Proceedings of the VLDB Endowment, Vol. 11, 12 (2018), 1876--1888.
[23]
Yuan Qiu, Serafeim Papadias, and Ke Yi. 2019. Streaming HyperCube: A Massively Parallel Stream Join Algorithm. In EDBT. 642--645.
[24]
Stephen Ranshous, Shitian Shen, Danai Koutra, Steve Harenberg, Christos Faloutsos, and Nagiza F Samatova. 2015. Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics, Vol. 7, 3 (2015), 223--247.
[25]
Pedro Ribeiro and Fernando Silva. 2010. G-tries: an efficient data structure for discovering network motifs. In Proceedings of the 2010 ACM Symposium on Applied Computing. ACM, 1559--1566.
[26]
Seyed-Vahid Sanei-Mehri, Ahmet Erdem Sariyuce, and Srikanta Tirthapura. 2018. Butterfly counting in bipartite networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2150--2159.
[27]
Aida Sheshbolouki and M Tamer Özsu. 2022. sGrapp: Butterfly approximation in streaming graphs. ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 16, 4 (2022), 1--43.
[28]
Kijung Shin, Jisu Kim, Bryan Hooi, and Christos Faloutsos. 2018. Think before you discard: Accurate triangle counting in graph streams with deletions. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 141--157.
[29]
Kijung Shin, Sejoon Oh, Jisu Kim, Bryan Hooi, and Christos Faloutsos. 2020. Fast, accurate and provable triangle counting in fully dynamic graph streams. ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 14, 2 (2020), 1--39.
[30]
Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, and Eli Upfal. 2017. Triest: Counting local and global triangles in fully dynamic streams with fixed memory size. ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 11, 4 (2017), 1--50.
[31]
Jia Wang, Ada Wai-Chee Fu, and James Cheng. 2014. Rectangle counting in large bipartite graphs. In 2014 IEEE International Congress on Big Data. IEEE, 17--24.
[32]
Kai Wang, Xuemin Lin, Lu Qin, Wenjie Zhang, and Ying Zhang. 2019. Vertex Priority Based Butterfly Counting for Large-scale Bipartite Networks. PVLDB (2019).
[33]
Pinghui Wang, Junzhou Zhao, John CS Lui, Don Towsley, and Xiaohong Guan. 2013. Sampling node pairs over large graphs. In ICDE. IEEE, 781--792.
[34]
Chen Yang, Min Lyu, Yongkun Li, Qianqian Zhao, and Yinlong Xu. 2018. Ssrw: A scalable algorithm for estimating graphlet statistics based on random walk. In International Conference on Database Systems for Advanced Applications. Springer, 272--288.
[35]
Jingren Zhou. 2019. Managing, analyzing, and learning heterogeneous graph data: Challenges and opportunities. (2019).

Index Terms

  1. SGES: A General and Space-efficient Framework for Graphlet Counting in Graph Streams

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
      October 2024
      5705 pages
      ISBN:9798400704369
      DOI:10.1145/3627673
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 21 October 2024

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. count
      2. estimation
      3. graphlet
      4. streams
      5. subgraph

      Qualifiers

      • Research-article

      Conference

      CIKM '24
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Upcoming Conference

      CIKM '25

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • 0
        Total Citations
      • 72
        Total Downloads
      • Downloads (Last 12 months)72
      • Downloads (Last 6 weeks)61
      Reflects downloads up to 11 Dec 2024

      Other Metrics

      Citations

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media