Abstract
The densest subgraph problem (DSP) is of great significance due to its wide applications in different domains. Meanwhile, diverse requirements in various applications lead to different density variants for DSP. Unfortunately, existing DSP algorithms cannot be easily extended to handle those variants efficiently and accurately. To fill this gap, we first unify different density metrics into a generalized density definition. We further propose a new model, c-core, to locate the general densest subgraph and show its advantage in accelerating the search process. Extensive experiments show that our c-core-based optimization can provide up to three orders of magnitude speedup over baselines. Methods for maintenance of c-core location are designed to accelerate updates on dynamic graphs. Moreover, we study an important variant of DSP under a size constraint, namely the densest-at-least-k-subgraph (DalkS) problem. We propose an algorithm based on graph decomposition, and it is likely to give a solution that is at least 0.8 of the optimal density in our experiments, while the state-of-the-art method can only ensure a solution with a density of at least 0.5 of the optimal density. Our experiments show that our DalkS algorithm can achieve at least 0.99 of the optimal density for over one-third of all possible size constraints. In addition, we develop an approximation algorithm for the DalkS problem that can be more efficient than the state-of-the-art algorithm while keeping the same approximation ratio of \(\frac{1}{3}\).
Similar content being viewed by others
Notes
A solution with at least \(f \cdot OPT\) density (\(0 < f \le 1\)) means its density is at least f of the optimal density. For simplicity, we call this solution an \(f \cdot OPT\) solution.
Empirically, the method based on [13] is slower than Greedy++ equipped with c-core location in our experiments.
f-approximation method/algorithm means that for every input, the algorithm can guarantee a \(f \cdot OPT\) solution, \(0< f\le 1\). In general, all solutions output by the f-approximation algorithm are called f-approximation.
We implement Algorithm 3 in our experiments. The algo has no significant influence on the time cost because the located core in Algorithm 2 is already very small.
The process of adding a new vertex can be decomposed into the following: add an isoloated vertex; add its adjacent edges; increase its vertex weight.
The coreness of a vertex is defined as the highest value of c for which the vertex is part of the corresponding c-core.
This complexity is a version of Theorem 2.1 from [13] when their algo is equipped with our c-core location and density search strategy.
We require \(0.909 \cdot OPT\) solution for Greedy++ because better solutions, e.g., \(0.99 \cdot OPT\) solution, cost too much time.
Each vertical line symbolizes a distinct graph with overall graph size and core size compared.
\(w_v = \frac{1}{2t}\) for v in left partition and \(w_v = \frac{t}{2}\) for v in right partition. We pick \(t = 2\) in our experiment.
By combining techniques from Combinatorial-DalkSS it is a \(\text {max}(\frac{k}{|\tilde{K}^*|}, \frac{1}{2}) \cdot OPT\) solution.
The reason we choose these datasets is that their sizes vary from small to large, and thus are representative.
References
Ahuja, R., Orlin, J., Stein, C., Tarjan, R.: Improved algorithms for bipartite network flow. SIAM J. Comput. 23, 906–933 (1994)
Alper, B., Bach, B., Riche, N.H., Isenberg, T., Fekete, J.-D.: Weighted graph comparison techniques for brain connectivity analysis. In: CHI, pages 483–492. Association for Computing Machinery (2013)
Andersen, R., Chellapilla, K.: Finding dense subgraphs with size bounds. In: Algorithms and Models for the Web-Graph, pages 25–37. Springer Berlin Heidelberg (2009)
Anderson, R.J., Setubal, J.C.: On the parallel implementation of goldberg’s maximum flow algorithm. In: SPAA, page 168–177. ACM (1992)
Aridhi, S., Brugnara, M., Montresor, A., Velegrakis, Y.: Distributed k-core decomposition and maintenance in large dynamic graphs. In: Proceedings of the 10th ACM International Conference on Distributed and Event-Based Systems, DEBS ’16, page 161–168, New York, NY, USA, (2016). Association for Computing Machinery
Asahiro, Y., Iwama, K., Tamaki, H., Tokuyama, T.: Greedily finding a dense subgraph. J. Algorithms 34, 203–221 (2000)
Asahiro, Y., Hassin, R., Iwama, K.: Complexity of finding dense subgraphs. Discrete Appl. Math. 121(1–3), 15–26 (2002)
Asahiro, Y., Hassin, R., Iwama, K.: Complexity of finding dense subgraphs. Discret. Appl. Math. 121(1), 15–26 (2002)
Bahmani, B., Kumar, R., Vassilvitskii, S.: Densest subgraph in streaming and mapreduce. PVLDB 5(5), 454–465 (2012)
Bera, S.K., Bhattacharya, S., Choudhari, J., Ghosh, P.: A new dynamic algorithm for densest subhypergraphs. In: WWW, pages 1093–1103 (2022)
Boob, D., Gao, Y., Peng, R., Sawlani, S., Tsourakakis, C.E., Wang, D., Wang, J.: Flowless: extracting densest subgraphs without flow computations. In: WWW, pages 573–583. ACM / IW3C2 (2020)
Charikar, M.: Greedy approximation algorithms for finding dense components in a graph. In: Approximation Algorithms for Combinatorial Optimization, pages 84–95. Springer Berlin Heidelberg (2000)
Chekuri, C., Quanrud, Kent, T., Manuel R.: Densest Subgraph: Supermodularity, Iterative Peeling, and Flow, pages 1531–1555 (2022)
Chen, J., Saad, Y.: Dense subgraph extraction with application to community detection. TKDE 24, 1216–1230 (2012)
Chen, T., Tsourakakis, C.: Antibenford subgraphs: Unsupervised anomaly detection in financial networks. In: KDD, pages 2762–2770 (2022)
Conti, E., Cao, S., Thomas, A.J.: Disruptions in the us airport network. arXiv, (2013)
Danisch, M., Hubert Chan, T.-H., Sozio, M.: Large scale density-friendly graph decomposition via convex programming. In: WWW, pages 233–242. ACM (2017)
Dinitz, Y: Dinitz’algorithm: The original version and even’s version. In: Theoretical Computer Science: Essays in Memory of Shimon Even, pages 218–240. Springer (2006)
Du, X., Jin, R., Ding, L., Lee, V.E., Thornton Jr., J.H.: Migration motif: a spatial - temporal pattern mining approach for financial markets. In: KDD, pages 1135–1144. ACM (2009)
Eidsaa, M., Almaas, E.: \(s\)-core network decomposition: a generalization of \(k\)-core analysis to weighted networks. Phys. Rev. E 88, 062819 (2013)
Epasto, A., Lattanzi, S., Sozio, M.: Efficient densest subgraph computation in evolving graphs. In: WWW, pages 300–310. ACM (2015)
Fang, Y., Kaiqiang, Yu., Cheng, R., Lakshmanan, L.V.S., Lin, X.: Efficient algorithms for densest subgraph discovery. PVLDB 12(11), 1719–1732 (2019)
Fang, Y., Luo, W., Ma, C.: Densest subgraph discovery on large graphs: applications, challenges, and techniques. PVLDB 15(12), 3766–3769 (2022)
Feige, U., Kortsarz, G., Peleg, D.: The dense k-subgraph problem. Algorithmica 29(3), 12 (2001)
Feng, W., Liu, S., Koutra, D., Shen, H., Cheng, X.: Specgreedy: unified dense subgraph detection. In: ECML/PKDD, pages 181–197. Springer-Verlag (2020)
Fratkin, E., Naughton, B.T., Brutlag, D.L., Batzoglou, S.: Motifcut: regulatory motifs finding with maximum density subgraphs. Bioinformatics 22(14), e150–e7 (2006)
Gibson, D., Kumar, R., Tomkins, A.: Discovering large dense subgraphs in massive graphs. In: PVLDB, pages 721–732. ACM (2005)
Gionis, A., Tsourakakis, C.E.: Dense subgraph discovery: Kdd 2015 tutorial. In: KDD, pages 2313–2314. ACM (2015)
Goldberg, A.V.: Finding a maximum density subgraph (1984)
Goldberg, A.V., Tarjan, R.E.: Finding minimum-cost circulations by successive approximation. Math. Oper. Res. 15(3), 430–466 (1990)
Goldberg, A.V., Tarjan, R.E.: Finding minimum-cost circulations by successive approximation. Math. Oper. Res. 15, 430–466 (1990)
He, Y., Wang, K., Zhang, W., Lin, X., Zhang, Y.: Scaling up k-clique densest subgraph detection. PACMMOD, 1(1), (2023)
Hooi, B., Song, H.A.H., Beutel, A., Shah, N., Shin, K., Faloutsos, C.: FRAUDAR: bounding graph fraud in the face of camouflage. In: KDD, pages 895–904. ACM (2016)
Hu, S., Wu, X., Hubert Chan, T.-H.: Maintaining densest subsets efficiently in evolving hypergraphs. In: CIKM, pages 929–938. ACM (2017)
Huang, X., Cheng, H., Qin, L., Tian, W., Yu, J.X.: Querying k-truss community in large and dynamic graphs. In: SIGMOD, pages 1311–1322. ACM (2014)
Khuller, S., Saha, B.: On finding dense subgraphs. In: Automata, Languages and Programming, pages 597–608. Springer Berlin Heidelberg (2009)
Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tompkins, A., Upfal, E.: The web as a graph. In: PODS, pages 1–10. ACM (2000)
Leskovec, J., Krevl, A.: SNAP Datasets: stanford large network dataset collection, (2014)
Li, N., Zhu, H., Wenhao, L., Cui, N., Liu, W., Yin, J., Jianliang, X., Lee, W.-C.: The most tenuous group query. Front. Comp. Sci. 17(2), 172605 (2023)
Lin, Z., Zhang, F., Lin, X., Zhang, W., Tian, Z.: Hierarchical core maintenance on large dynamic graphs. PVLDB 14(5), 757–770 (2021)
Ma, C., Fang, Y., Cheng, R., Lakshmanan, L.V.S., Zhang, W., Lin, X.: Efficient algorithms for densest subgraph discovery on large directed graphs. In: SIGMOD, pages 1051–1066. ACM (2020)
Ma, C., Fang, Y., Cheng, R., Lakshmanan, L.V.S., Zhang, W., Lin, X.: Efficient algorithms for densest subgraph discovery on large directed graphs. In: SIGMOD, pages 1051–1066 (2020)
Ma, C., Fang, Y., Cheng, R., Lakshmanan, L.V.S., Zhang, W., Lin, X.: On directed densest subgraph discovery. TODS 46(4), 1–45 (2021)
Ma, C., Cheng, R., Lakshmanan, L.V.S., Han, X.: Finding locally densest subgraphs: A convex programming approach. PVLDB 15(11), 2719–2732 (2022)
Ma, C., Fang, Y., Cheng, R., Lakshmanan, L.V.S., Han, X.: A convex-programming approach for efficient directed densest subgraph discovery. In: SIGMOD, pages 845–859 (2022)
Manurangsi, P.: Inapproximability of maximum biclique problems, minimum k-cut and densest at-least-k-subgraph from the small set expansion hypothesis. (2018). arXiv:1705.03581
Mathieu, C., de Rougemont, M.: Large very dense subgraphs in a stream of edges (2020)
McGregor, A., Tench, D., Vorotnikova, S., Vu, H.T.: Densest subgraph in dynamic graph streams. In: International Symposium on Mathematical Foundations of Computer Science, volume 9235, pages 472–482. Springer (2015)
Newman, M.E.J.: The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. 98(2), 404–409 (2001)
Opsahl, T., Agneessens, F., Skvoretz, J.: Node centrality in weighted networks: generalizing degree and shortest paths. Social Netw. 32(3), 245–251 (2010)
Opsahl, T.: Triadic closure in two-mode networks: redefining the global and local clustering coefficients. Social Netw. 35(2), 159–167 (2013)
Qin, L., Li, R.-H., Chang, L., Zhang, C.: Locally densest subgraph discovery. In: KDD, pages 965–974 (2015)
Rossi, R.A., Ahmed, N.K.: The network data repository with interactive graph analytics and visualization. In: AAAI (2015)
Saha, A., Ke, X., Khan, A., Long, C.: Most probable densest subgraphs. In: ICDE, pages 1447–1460. IEEE (2023)
Sarma, A.D., Lall, A., Nanongkai, D., Trehan, A.: Dense subgraphs on dynamic networks. arXiv (2012)
Sawlani, S., Wang, J.: Near-optimal fully dynamic densest subgraph. In: STOC, pages 181–193. ACM (2020)
Seidman, S.B.: Network structure and minimum degree. Social Netw. 5(3), 269–287 (1983)
Shiloach, Y., Vishkin, U.: An o(n2log n) parallel max-flow algorithm. J. Algorithms 3(2), 128–146 (1982)
Tang, J.K., Leontiadis, I., Scellato, S., Nicosia, V., Mascolo, C., Musolesi, M., Latora, V.: Applications of temporal graph metrics to real-world networks. CoRR, arXiv:1305.6974 (2013)
Tarjan, R.E.: A simple version of karzanov’s blocking flow algorithm. Oper. Res. Lett. 2(6), 265–268 (1984)
Tatti, N.: Density-friendly graph decomposition. ACM TKDD 13(5), 54:1-54:29 (2019)
Tsourakakis, C.: The k-clique densest subgraph problem. In: WWW, page 1122–1132. IW3C2, (2015)
Tsourakakis, C.E.: The k-clique densest subgraph problem. In: WWW, pages 1122–1132. ACM, (2015)
Tsourakakis, C.E., Bonchi, F., Gionis, A., Gullo, F., Tsiarli, M.A.: Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In: KDD, pages 104–112. ACM, (2013)
Ugander, J., Karrer, B., Backstrom, L., Marlow, C.: The anatomy of the facebook social graph. arXiv, (2011)
Vishveshwara, S., Brinda, K.V., Kannan, N.: Protein structure: insights from graph theory. J. Theor. Comput. Chem. 01, 187–211 (2002)
Shijie, X., Fang, J., Li, X.: Weighted laplacian method and its theoretical applications. IOP Conf. Ser. Mater. Sci. Eng. 768(7), 072032 (2020)
Yichen, X., Ma, C., Fang, Y., Bao, Z.: Efficient and effective algorithms for generalized densest subgraph discovery. PACMMOD 1(2), 169:1-169:27 (2023)
Zhang, Y., Yu, J.X., Zhang, Y., Qin, L.: A fast order-based approach for core maintenance. In: ICDE, pages 337–348. IEEE Computer Society, (2017)
Acknowledgements
We would like to thank Wensheng Luo for his help during our revision. This work was partially supported by NSFC under Grant 62302421 and 62102341, Basic and Applied Basic Research Fund in Guangdong Province under Grant 2023A1515011280 and 2022A1515010166, and Shenzhen Science and Technology Program under Grants JCYJ20220530143602006 and ZDSYS 20211021111415025. Zhifeng Bao is supported in part by ARC DP220101434 and DP240101211. This paper was also supported by Guangdong Key Lab of Mathematical Foundations for Artificial Intelligence.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xu, Y., Ma, C., Fang, Y. et al. Efficient and effective algorithms for densest subgraph discovery and maintenance. The VLDB Journal 33, 1427–1452 (2024). https://doi.org/10.1007/s00778-024-00855-y
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-024-00855-y