[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Efficient algorithms for densest subgraph discovery

Published: 01 July 2019 Publication History

Abstract

Densest subgraph discovery (DSD) is a fundamental problem in graph mining. It has been studied for decades, and is widely used in various areas, including network science, biological analysis, and graph databases. Given a graph G, DSD aims to find a subgraph D of G with the highest density (e.g., the number of edges over the number of vertices in D). Because DSD is difficult to solve, we propose a new solution paradigm in this paper. Our main observation is that the densest subgraph can be accurately found through a k-core (a kind of dense subgraph of G), with theoretical guarantees. Based on this intuition, we develop efficient exact and approximation solutions for DSD. Moreover, our solutions are able to find the densest subgraphs for a wide range of graph density definitions, including clique-based- and general pattern-based density. We have performed extensive experimental evaluation on both real and synthetic datasets. Our results show that our algorithms are up to four orders of magnitude faster than existing approaches.

References

[1]
https://en.wikipedia.org/wiki/Clique_(graph_theory).
[2]
R. K. Ahuja, J. B. Orlin, C. Stein, and R. E. Tarjan. Improved algorithms for bipartite network flow. SIAM Journal on Computing, 23(5):906--933, 1994.
[3]
R. Andersen and K. Chellapilla. Finding dense subgraphs with size bounds. In International Workshop on Algorithms and Models for the Web-Graph, pages 25--37, 2009.
[4]
Y. Asahiro, R. Hassin, and K. Iwama. Complexity of finding dense subgraphs. Discrete Applied Mathematics, 121(1):15--26, 2002.
[5]
Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama. Greedily finding a dense subgraph. Journal of Algorithms, 34(2):203--221, 2000.
[6]
B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest subgraph in streaming and mapreduce. PVLDB, 5(5):454--465, 2012.
[7]
V. Batagelj and M. Zaversnik. An o(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049, 2003.
[8]
A. Bhaskara, M. Charikar, E. Chlamtac, U. Feige, and A. Vijayaraghavan. Detecting high log-densities: an o (n 1/4) approximation for densest k-subgraph. In STOC, pages 201--210, 2010.
[9]
E. T. Charalampos. Mathematical and Algorithmic Analysis of Network and Biological Data. PhD thesis, Carnegie Mellon University, 2013. 2.1.
[10]
M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In APPROX, pages 84--95. Springer, 2000.
[11]
J. Chen and Y. Saad. Dense subgraph extraction with application to community detection. TKDE, 24(7):1216--1230, 2012.
[12]
Y. Chen, Y. Fang, R. Cheng, Y. Li, X. Chen, and J. Zhang. Exploring communities in large profiled graphs. IEEE Transactions on Knowledge and Data Engineering, 31(8):1624--1629, 2019.
[13]
J. Cheng, Y. Ke, S. Chu, and M. T. Özsu. Efficient core decomposition in massive networks. In ICDE, pages 51--62, 2011.
[14]
E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SIAM Journal on Computing, 32(5):1338--1355, 2003.
[15]
J. Cohen. Trusses: Cohesive subgraphs for social network analysis. National Security Agency Technical Report, 16, 2008.
[16]
W. Cui, Y. Xiao, H. Wang, Y. Lu, and W. Wang. Online search of overlapping communities. In SIGMOD, pages 277--288. ACM, 2013.
[17]
M. Danisch, O. Balalau, and M. Sozio. Listing k-cliques in sparse real-world graphs. In WWW, pages 589--598, 2018.
[18]
M. Danisch, T.-H. H. Chan, and M. Sozio. Large scale density-friendly graph decomposition via convex programming. In Proceedings of the 26th International Conference on World Wide Web, pages 233--242, 2017.
[19]
A. Epasto, S. Lattanzi, and M. Sozio. Efficient densest subgraph computation in evolving graphs. In WWW, pages 300--310, 2015.
[20]
Y. Fang, R. Cheng, Y. Chen, S. Luo, and J. Hu. Effective and efficient attributed community search. The VLDB Journal, 26(6):803--828, 2017.
[21]
Y. Fang, R. Cheng, G. Cong, N. Mamoulis, and Y. Li. On spatial pattern matching. In IEEE International Conference on Data Engineering (ICDE), pages 293--304. IEEE, 2018.
[22]
Y. Fang, R. Cheng, X. Li, S. Luo, and J. Hu. Effective community search over large spatial graphs. PVLDB, 10(6):709--720, 2017.
[23]
Y. Fang, R. Cheng, S. Luo, and J. Hu. Effective community search for large attributed graphs. PVLDB, 9(12):1233--1244, 2016.
[24]
Y. Fang, X. Huang, L. Qin, Y. Zhang, W. Zhang, R. Cheng, and X. Lin. A survey of community search over big graphs. The VLDB Journal, 2019.
[25]
Y. Fang, Z. Wang, R. Cheng, X. Li, S. Luo, J. Hu, and X. Chen. On spatial-aware community search. IEEE Transactions on Knowledge and Data Engineering, 31(4):783--798, 2018.
[26]
Y. Fang, Z. Wang, R. Cheng, H. Wang, and J. Hu. Effective and efficient community search over large directed graphs. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018.
[27]
Y. Fang, K. Yu, R. Cheng, L. V. S. Lakshmanan, and X. Lin. Efficient algorithms for densest subgraph discovery (technical report). arXiv, 2019. https://arxiv.org/abs/1906.00341.
[28]
E. Fratkin, B. T. Naughton, D. L. Brutlag, and S. Batzoglou. Motifcut: regulatory motifs finding with maximum density subgraphs. Bioinformatics, 22(14):e150--e157, 2006.
[29]
G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. A fast parametric maximum flow algorithm and applications. SIAM Journal on Computing, 18(1):30--55, 1989.
[30]
A. Gionis, F. Junqueira, V. Leroy, M. Serafini, and I. Weber. Piggybacking on social networks. PVLDB, 6(6):409--420, 2013.
[31]
A. Gionis and C. E. Tsourakakis. Dense subgraph discovery: Kdd 2015 tutorial. In SIGKDD, pages 2313--2314, NY, USA, 2015. ACM.
[32]
A. V. Goldberg. Finding a maximum density subgraph. UC Berkeley, 1984.
[33]
G. T. Heineman, G. Pollice, and S. Selkow. Chapter 8: Network flow algorithms. Algorithms in a Nutshell, pages 226--250, 2008.
[34]
J. Hu, R. Cheng, K. C.-C. Chang, A. Sankar, Y. Fang, and B. Y. Lam. Discovering maximal motif cliques in large heterogeneous information networks. In IEEE International Conference on Data Engineering (ICDE), pages 746--757. IEEE, 2019.
[35]
J. Hu, X. Wu, R. Cheng, S. Luo, and Y. Fang. Querying minimal steiner maximum-connected subgraphs in large graphs. In International on Conference on Information and Knowledge Management (CIKM), pages 1241--1250. ACM, 2016.
[36]
J. Hu, X. Wu, R. Cheng, S. Luo, and Y. Fang. On minimal steiner maximum-connected subgraph queries. IEEE Transactions on Knowledge and Data Engineering, 29(11):2455--2469, 2017.
[37]
X. Huang, H. Cheng, L. Qin, W. Tian, and J. X. Yu. Querying k-truss community in large and dynamic graphs. In SIGMOD, pages 1311--1322. ACM, 2014.
[38]
X. Huang and L. V. Lakshmanan. Attribute-driven community search. PVLDB, 10(9):949--960, 2017.
[39]
X. Huang, W. Lu, and L. V. Lakshmanan. Truss decomposition of probabilistic graphs: Semantics and algorithms. In Proceedings of the 2016 International Conference on Management of Data, pages 77--90. ACM, 2016.
[40]
M. Jha, C. Seshadhri, and A. Pinar. Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In WWW, pages 495--505, 2015.
[41]
R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In SIGMOD, pages 813--826. ACM, 2009.
[42]
D. B. Johnson. Parallel algorithms for minimum cuts and maximum flows in planar networks. Journal of the ACM (JACM), 34(4):950--967, 1987.
[43]
R. Kannan and V. Vinay. Analyzing the structure of large graphs. Rheinische Friedrich-Wilhelms-Universität Bonn, 1999.
[44]
S. Khuller and B. Saha. On finding dense subgraphs. Automata, Languages and Programming, pages 597--608, 2009.
[45]
L. Lai, L. Qin, X. Lin, Y. Zhang, L. Chang, and S. Yang. Scalable distributed subgraph enumeration. PVLDB, 10(3):217--228, 2016.
[46]
J. Leskovec, A. Singh, and J. Kleinberg. Patterns of influence in a recommendation network. In PAKDD, pages 380--389. Springer, 2006.
[47]
L. Lü, T. Zhou, Q.-M. Zhang, and H. E. Stanley. The h-index of a network node and its relation to degree and coreness. Nature communications, 7:10168, 2016.
[48]
A. Mandal and M. Al Hasan. A distributed k-core decomposition algorithm on spark. In International Conference on Big Data, pages 976--981. IEEE, 2017.
[49]
M. Mitzenmacher, J. Pachocki, R. Peng, C. Tsourakakis, and S. C. Xu. Scalable large near-clique detection in large-scale networks via sampling. In International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 815--824. ACM, 2015.
[50]
A. Montresor, F. De Pellegrini, and D. Miorandi. Distributed k-core decomposition. IEEE Transactions on parallel and distributed systems, 24(2):288--300, 2013.
[51]
Y. Peng, Y. Zhang, W. Zhang, X. Lin, and L. Qin. Efficient probabilistic k-core computation on uncertain graphs. In International Conference on Data Engineering (ICDE), pages 1192--1203. IEEE, 2018.
[52]
T. L. Pham, I. Lavallee, M. Bui, and S. H. Do. A distributed algorithm for the maximum flow problem. In International Symposium on Parallel and Distributed Computing (ISPDC), pages 131--138. IEEE, 2005.
[53]
M. Qiao, H. Zhang, and H. Cheng. Subgraph matching: on compression and computation. PVLDB, 11(2):176--188, 2017.
[54]
L. Qin, R.-H. Li, L. Chang, and C. Zhang. Locally densest subgraph discovery. In KDD, pages 965--974. ACM, 2015.
[55]
B. Saha, A. Hoch, S. Khuller, L. Raschid, and X.-N. Zhang. Dense subgraphs with restrictions and applications to gene annotation graphs. In RECOMB, volume 6044, pages 456--472, 2010.
[56]
S. Sahu, A. Mhedhbi, S. Salihoglu, J. Lin, and M. T. Özsu. The ubiquity of large graphs and surprising challenges of graph processing. PVLDB, 11(4):420--431, 2017.
[57]
R. Samusevich, M. Danisch, and M. Sozio. Local triangle-densest subgraphs. In International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 33--40. IEEE, 2016.
[58]
A. E. Sariyüce and A. Pinar. Fast hierarchy construction for dense subgraphs. PVLDB, 10(3):97--108, 2016.
[59]
A. E. Sariyüce, C. Seshadhri, and A. Pinar. Local algorithms for hierarchical dense subgraph discovery. PVLDB, 12(1):43--56, 2018.
[60]
A. E. Sariyüce, C. Seshadhri, A. Pinar, and U. V. Catalyurek. Finding the hierarchy of dense subgraphs using nucleus decompositions. In WWW, pages 927--937, 2015.
[61]
A. E. Sariyüce, C. Seshadhri, A. Pinar, and Ü. V. Çatalyürek. Nucleus decompositions for identifying hierarchy of dense subgraphs. ACM Transactions on the Web (TWEB), 11(3):16, 2017.
[62]
S. B. Seidman. Network structure and minimum degree. Social networks, 1983.
[63]
S. B. Seidman and B. L. Foster. A graph-theoretic generalization of the clique concept. Journal of Mathematical sociology, 6(1):139--154, 1978.
[64]
N. Tatti and A. Gionis. Density-friendly graph decomposition. In Proceedings of the 24th International Conference on World Wide Web, pages 1089--1099, 2015.
[65]
C. Tsourakakis. The k-clique densest subgraph problem. In WWW, pages 1122--1132, 2015.
[66]
C. Tsourakakis, F. Bonchi, A. Gionis, F. Gullo, and M. Tsiarli. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In KDD, pages 104--112. ACM, 2013.
[67]
K. Wang, X. Cao, X. Lin, W. Zhang, and L. Qin. Efficient computing of radius-bounded k-cores. In IEEE International Conference on Data Engineering (ICDE), pages 233--244. IEEE, 2018.
[68]
K. Wang, X. Lin, L. Qin, W. Zhang, and Y. Zhang. Vertex priority based butterfly counting for large-scale bipartite networks. PVLDB, 12(10):1139--1152, 2019.
[69]
Y. Wu, R. Jin, J. Li, and X. Zhang. Robust local community detection: on free rider effect and its elimination. PVLDB, 8(7):798--809, 2015.
[70]
S. Wuchty, Z. N. Oltvai, and A.-L. Barabási. Evolutionary conservation of motif constituents within the yeast protein interaction network. Nature Genetics, 35:176--179, 2003.
[71]
Y. Zhang and S. Parthasarathy. Extracting analyzing and visualizing triangle k-core motifs within networks. In ICDE, pages 1049--1060, 2012.
[72]
F. Zhao and A. K. Tung. Large scale cohesive subgraphs discovery for social network visual analysis. PVLDB, 6(2):85--96, 2012.

Cited By

View all
  • (2024)Efficient Parallel D-Core Decomposition at ScaleProceedings of the VLDB Endowment10.14778/3675034.367505417:10(2654-2667)Online publication date: 1-Jun-2024
  • (2024)BYO: A Unified Framework for Benchmarking Large-Scale Graph ContainersProceedings of the VLDB Endowment10.14778/3665844.366585917:9(2307-2320)Online publication date: 1-May-2024
  • (2024)An Efficient and Exact Algorithm for Locally h-Clique Densest Subgraph DiscoveryProceedings of the ACM on Management of Data10.1145/36988002:6(1-26)Online publication date: 20-Dec-2024
  • Show More Cited By
  1. Efficient algorithms for densest subgraph discovery

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 12, Issue 11
    July 2019
    543 pages

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 July 2019
    Published in PVLDB Volume 12, Issue 11

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)16
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 30 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Efficient Parallel D-Core Decomposition at ScaleProceedings of the VLDB Endowment10.14778/3675034.367505417:10(2654-2667)Online publication date: 1-Jun-2024
    • (2024)BYO: A Unified Framework for Benchmarking Large-Scale Graph ContainersProceedings of the VLDB Endowment10.14778/3665844.366585917:9(2307-2320)Online publication date: 1-May-2024
    • (2024)An Efficient and Exact Algorithm for Locally h-Clique Densest Subgraph DiscoveryProceedings of the ACM on Management of Data10.1145/36988002:6(1-26)Online publication date: 20-Dec-2024
    • (2024)A Counting-based Approach for Efficient k-Clique Densest Subgraph DiscoveryProceedings of the ACM on Management of Data10.1145/36549222:3(1-27)Online publication date: 30-May-2024
    • (2024)A Survey on the Densest Subgraph Problem and its VariantsACM Computing Surveys10.1145/365329856:8(1-40)Online publication date: 22-Mar-2024
    • (2024)A Combined Content Addressable Memory and In-Memory Processing Approach for k-Clique Counting AccelerationProceedings of the 61st ACM/IEEE Design Automation Conference10.1145/3649329.3656513(1-6)Online publication date: 23-Jun-2024
    • (2024)Parallel Algorithms for Hierarchical Nucleus DecompositionProceedings of the ACM on Management of Data10.1145/36392872:1(1-27)Online publication date: 26-Mar-2024
    • (2024)Efficient k-Clique Listing: An Edge-Oriented Branching StrategyProceedings of the ACM on Management of Data10.1145/36392622:1(1-26)Online publication date: 26-Mar-2024
    • (2024)Efficient and Effective Anchored Densest Subgraph Search: A Convex-programming based ApproachProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671727(3907-3918)Online publication date: 25-Aug-2024
    • (2024)PSMC: Provable and Scalable Algorithms for Motif Conductance Based Graph ClusteringProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671666(1793-1803)Online publication date: 25-Aug-2024
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media