[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3539597.3570441acmconferencesArticle/Chapter ViewAbstractPublication PageswsdmConference Proceedingsconference-collections
research-article
Open access

Graph Summarization via Node Grouping: A Spectral Algorithm

Published: 27 February 2023 Publication History

Abstract

Graph summarization via node grouping is a popular method to build concise graph representations by grouping nodes from the original graph into supernodes and encoding edges into superedges such that the loss of adjacency information is minimized. Such summaries have immense applications in large-scale graph analytics due to their small size and high query processing efficiency. In this paper, we reformulate the loss minimization problem for summarization into an equivalent integer maximization problem. By initially allowing relaxed (fractional) solutions for integer maximization, we analytically expose the underlying connections to the spectral properties of the adjacency matrix. Consequently, we design an algorithm called SpecSumm that consists of two phases. In the first phase, motivated by spectral graph theory, we apply k-means clustering on the k largest (in magnitude) eigenvectors of the adjacency matrix to assign nodes to supernodes. In the second phase, we propose a greedy heuristic that updates the initial assignment to further improve summary quality. Finally, via extensive experiments on 11 datasets, we show that SpecSumm efficiently produces high-quality summaries compared to state-of-the-art summarization algorithms and scales to graphs with millions of nodes.

Supplementary Material

MP4 File (WSDM23-fp0453.mp4)
We present an overview of graph summarization along with analytical and empirical evaluations of our algorithms for this task.

References

[1]
Emmanuel Abbe, Afonso S. Bandeira, and Georgina Hall. 2016. Exact Recovery in the Stochastic Block Model. IEEE Trans. Inf. Theory 62, 1 (2016), 471--487.
[2]
James Baglama and Lothar Reichel. 2005. Augmented Implicitly Restarted Lanczos Bidiagonalization Methods. SIAM J. Sci. Comput. 27, 1 (2005), 19--42.
[3]
Maham Anwar Beg, Muhammad Ahmad, Arif Zaman, and Imdadullah Khan. 2018. Scalable Approximation Algorithm for Graph Summarization. In Advances in Knowledge Discovery and Data Mining. Springer, Cham, 502--514.
[4]
Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered Label Propagation: A Multiresolution Coordinate-Free Ordering for Compressing Social Networks. In Proceedings of the 20th International Conference on World Wide Web (WWW '11). ACM, New York, NY, USA, 587--596.
[5]
Paolo Boldi and Sebastiano Vigna. 2004. The Webgraph Framework I: Compression Techniques. In Proceedings of the 13th International Conference on World Wide Web (WWW '04). ACM, New York, NY, USA, 595--602.
[6]
Giorgos Bouritsas, Andreas Loukas, Nikolaos Karalias, and Michael Bronstein. 2021. Partition and Code: learning how to compress graphs. Advances in Neural Information Processing Systems 34 (2021), 18603--18619.
[7]
Gregory Buehrer and Kumar Chellapilla. 2008. A Scalable Pattern Mining Approach to Web Graph Compression with Communities. In Proceedings of the 2008 International Conference on Web Search and Data Mining (WSDM '08). ACM, New York, NY, USA, 95--106.
[8]
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, Alessandro Panconesi, and Prabhakar Raghavan. 2009. On Compressing Social Networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09). ACM, New York, NY, USA, 219--228.
[9]
Graham Cormode and S. Muthukrishnan. 2005. An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55, 1 (2005), 58--75.
[10]
Laxman Dhulipala, Igor Kabiljo, Brian Karrer, Giuseppe Ottaviano, Sergey Pupyrev, and Alon Shalita. 2016. Compressing Graphs and Indexes with Recursive Graph Bisection. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, New York, NY, USA, 1535--1544.
[11]
Cody Dunne and Ben Shneiderman. 2013. Motif Simplification: Improving Network Visualization Readability with Fan, Connector, and Clique Glyphs. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13). ACM, New York, NY, USA, 3247--3256.
[12]
Wenfei Fan, Jianzhong Li, Xin Wang, and Yinghui Wu. 2012. Query Preserving Graph Compression. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD '12). ACM, New York, NY, USA, 157--168.
[13]
Wenfei Fan, Yuanhao Li, Muyang Liu, and Can Lu. 2021. Making Graphs Compact by Lossless Contraction. In Proceedings of the 2021 International Conference on Management of Data (SIGMOD '21). ACM, New York, NY, USA, 472--484.
[14]
Robert Görke, Pascal Maillard, Christian Staudt, and Dorothea Wagner. 2010. Modularity-Driven Clustering of Dynamic Graphs. In Experimental Algorithms. Springer, Berlin, Heidelberg, 436--448.
[15]
Mahdi Hajiabadi, Jasbir Singh, Venkatesh Srinivasan, and Alex Thomo. 2021. Graph Summarization with Controlled Utility Loss. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD '21). ACM, New York, NY, USA, 536--546.
[16]
Kifayat-Ullah Khan, Waqas Nawaz, and Young-Koo Lee. 2015. Set-based approximate approach for lossless graph summarization. Computing 97, 12 (2015), 1185--1207.
[17]
Jihoon Ko, Yunbum Kook, and Kijung Shin. 2020. Incremental Lossless Graph Summarization. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20). ACM, New York, NY, USA, 317--327.
[18]
Danai Koutra, U Kang, Jilles Vreeken, and Christos Faloutsos. 2015. Summarizing and Understanding Large Graphs. Stat. Anal. Data Min. 8, 3 (2015), 183--202.
[19]
K. Ashwin Kumar and Petros Efstathopoulos. 2018. Utility-Driven Graph Summarization. Proc. VLDB Endow. 12, 4 (2018), 335--347.
[20]
Kyuhan Lee, Hyeonsoo Jo, Jihoon Ko, Sungsu Lim, and Kijung Shin. 2020. SSumM: Sparse Summarization of Massive Graphs. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20). ACM, New York, NY, USA, 144--154.
[21]
Kristen LeFevre and Evimaria Terzi. 2010. GraSS: Graph Structure Summarization. In Proceedings of the 2010 SIAM International Conference on Data Mining (SDM). SIAM, 454--465.
[22]
Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010. Predicting Positive and Negative Links in Online Social Networks. In Proceedings of the 19th Inter- national Conference on World Wide Web (WWW '10). ACM, New York, NY, USA, 641--650.
[23]
Jure Leskovec, Jon M. Kleinberg, and Christos Faloutsos. 2007. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1, 1, Article 2 (2007), 41 pages.
[24]
Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2009. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Math. 6, 1 (2009), 29--123.
[25]
Yuzhi Liang, Chen Chen, Yukun Wang, Kai Lei, Min Yang, and Ziyu Lyu. 2020. Reachability preserving compression for dynamic graph. Inf. Sci. 520 (2020), 232--249.
[26]
Yike Liu, Tara Safavi, Abhilash Dighe, and Danai Koutra. 2018. Graph Summarization Methods and Applications: A Survey. ACM Comput. Surv. 51, 3, Article 62 (2018), 34 pages.
[27]
Hossein Maserrat and Jian Pei. 2010. Neighbor Query Friendly Compression of Social Networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10). ACM, New York, NY, USA, 533--542.
[28]
Saket Navlakha, Rajeev Rastogi, and Nisheeth Shrivastava. 2008. Graph Summarization with Bounded Error. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08). ACM, New York, NY, USA, 419--432.
[29]
Amin Emamzadeh Esmaeili Nejad, Mansoor Zolghadri Jahromi, and Mohammad Taheri. 2021. Graph compression based on transitivity for neighborhood query. Inf. Sci. 576 (2021), 312--328.
[30]
Jorge Nocedal and Stephen J. Wright. 1999. Numerical Optimization. Springer, Berlin, Heidelberg.
[31]
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14). ACM, New York, NY, USA, 701--710.
[32]
Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and Node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM '18). ACM, New York, NY, USA, 459--467.
[33]
Matteo Riondato, David García-Soriano, and Francesco Bonchi. 2017. Graph summarization with quality guarantees. Data Min. Knowl. Discov. 31, 2 (2017), 314--349.
[34]
Benedek Rozemberczki, Carl Allen, and Rik Sarkar. 2021. Multi-Scale attributed node embedding. J. Complex Networks 9, 2 (2021), 22 pages. cnab014.
[35]
Benedek Rozemberczki and Rik Sarkar. 2020. Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM '20). ACM, New York, NY, USA, 1325--1334.
[36]
Amin Sadri, Flora D. Salim, Yongli Ren, Masoomeh Zameni, Jeffrey Chan, and Timos Sellis. 2017. Shrink: Distance preserving graph compression. Inf. Syst. 69 (2017), 180--193.
[37]
David Sculley. 2010. Web-Scale k-Means Clustering. In Proceedings of the 19th International Conference on World Wide Web (WWW '10). ACM, New York, NY, USA, 1177--1178.
[38]
Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. 2008. Collective Classification in Network Data. AI Mag. 29, 3 (2008), 93--106.
[39]
Kijung Shin, Amol Ghoting, Myunghwan Kim, and Hema Raghavan. 2019. SWeG: Lossless and Lossy Summarization of Web-Scale Graphs. In The World Wide Web Conference (WWW '19). ACM, New York, NY, USA, 1679--1690.
[40]
Nan Tang, Qing Chen, and Prasenjit Mitra. 2016. Graph Stream Summarization: From Big Bang to Big Crunch. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). ACM, New York, NY, USA, 1481--1496.
[41]
Yuanyuan Tian, Richard A. Hankins, and Jignesh M. Patel. 2008. Efficient Aggregation for Graph Summarization. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08). ACM, New York, NY, USA, 567--580.
[42]
Hannu Toivonen, Fang Zhou, Aleksi Hartikainen, and Atte Hinkka. 2011. Compression of Weighted Graphs. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11). ACM, New York, NY, USA, 965--973.
[43]
Zaiwen Wen and Wotao Yin. 2013. A feasible method for optimization with orthogonality constraints. Math. Program. 142, 1--2 (2013), 397--434.
[44]
Donghui Yan, Ling Huang, and Michael I. Jordan. 2009. Fast Approximate Spectral Clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09). ACM, New York, NY, USA, 907--916.
[45]
Jaewon Yang and Jure Leskovec. 2012. Defining and Evaluating Network Communities Based on Ground-Truth. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics (MDS '12). ACM, New York, NY, USA, Article 3, 8 pages.
[46]
Quinton Yong, Mahdi Hajiabadi, Venkatesh Srinivasan, and Alex Thomo. 2021. Efficient Graph Summarization Using Weighted LSH at Billion-Scale. In Proceedings of the 2021 International Conference on Management of Data (SIGMOD '21). ACM, New York, NY, USA, 2357--2365.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining
February 2023
1345 pages
ISBN:9781450394079
DOI:10.1145/3539597
This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 February 2023

Check for updates

Author Tags

  1. clustering
  2. graph summarization
  3. spectral algorithms

Qualifiers

  • Research-article

Funding Sources

Conference

WSDM '23

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)182
  • Downloads (Last 6 weeks)25
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media