[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2807591.2807668acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

GossipMap: a distributed community detection algorithm for billion-edge directed graphs

Published: 15 November 2015 Publication History

Abstract

In this paper, we describe a new distributed community detection algorithm for billion-edge directed graphs that, unlike modularity-based methods, achieves cluster quality on par with the best-known algorithms in the literature. We show that a simple approximation to the best-known serial algorithm dramatically reduces computation and enables distributed evaluation yet incurs only a very small impact on cluster quality.
We present three main results: First, we show that the clustering produced by our scalable approximate algorithm compares favorably with prior results on small synthetic benchmarks and small real-world datasets (70 million edges). Second, we evaluate our algorithm on billion-edge directed graphs (a 1.5B edge social network graph, and a 3.7B edge web crawl), and show that the results exhibit the structural properties predicted by analysis of much smaller graphs from similar sources. Third, we show that our algorithm exhibits over 90% parallel efficiency on massive graphs in weak scaling experiments.

References

[1]
R. Aldecoa and I. Marín. Exploring the limits of community detection strategies in complex networks. Scientific reports, 3(2216), 2013.
[2]
A. Allavena, A. Demers, and J. E. Hopcroft. Correctness of a gossip based membership protocol. In Proceedings of the twenty-fourth annual ACM symposium on Principles of distributed computing, pages 292--301. ACM, 2005.
[3]
S.-H. Bae, D. Halperin, J. West, M. Rosvall, and B. Howe. Scalable flow-based community detection for large-scale network analysis. In Proceedings of 2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW), pages 303--310. IEEE, 2013.
[4]
V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
[5]
P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW 2004), pages 595--601, Manhattan, USA, 2004. ACM Press.
[6]
A. Clauset, M. E. Newman, and C. Moore. Finding community structure in very large networks. Physical review E, 70(6):066111, 2004.
[7]
A. Condon and R. M. Karp. Algorithms for graph partitioning on the planted partition model. Random Structures and Algorithms, 18(2):116--140, 2001.
[8]
L. Danon, A. Diaz-Guilera, J. Duch, and A. Arenas. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005(09):P09008, 2005.
[9]
J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107--113, Jan. 2008.
[10]
A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry. Epidemic algorithms for replicated database maintenance. In Proceedings of the sixth annual ACM Symposium on Principles of distributed computing, pages 1--12. ACM, 1987.
[11]
P. T. Eugster, R. Guerraoui, A.-M. Kermarrec, and L. Massoulié. Epidemic information dissemination in distributed systems. Computer, 37(5):60--67, 2004.
[12]
S. Fortunato. Community detection in graphs. Physics Reports, 486(3):75--174, 2010.
[13]
S. Fortunato and M. Barthélemy. Resolution limit in community detection. Proceedings of the National Academy of Sciences, 104(1):36--41, Jan. 2007.
[14]
T. A. S. Foundation. Apache hadoop project. http://hadoop.apache.org.
[15]
A.-C. Gavin, P. Aloy, P. Grandi, R. Krause, M. Boesche, M. Marzioch, C. Rau, L. J. Jensen, S. Bastuck, B. Dümpelfeld, et al. Proteome survey reveals modularity of the yeast cell machinery. Nature, 440(7084):631--636, 2006.
[16]
M. Girvan and M. E. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821--7826, 2002.
[17]
R. Guimera and L. A. N. Amaral. Functional cartography of complex metabolic networks. Nature, 433(7028):895--900, 2005.
[18]
R. Guimerà, M. Sales-Pardo, and L. A. N. Amaral. Modularity from fluctuations in random graphs and complex networks. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 70(2), 2004.
[19]
H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In WWW '10: Proceedings of the 19th international conference on World wide web, pages 591--600, New York, NY, USA, 2010. ACM.
[20]
A. Lancichinetti and S. Fortunato. Community detection algorithms: A comparative analysis. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 80(056117), 2009.
[21]
J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, Jun. 2014.
[22]
J. Leskovec, K. J. Lang, and M. Mahoney. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th international conference on World wide web, WWW '10, pages 631--640, New York, NY, USA, 2010. ACM.
[23]
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed graphlab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8):716--727, Apr. 2012.
[24]
M. Ovelgonne. Distributed community detection in web-scale networks. In Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM International Conference on, pages 66--73. IEEE, 2013.
[25]
A. Prat-Pérez, D. Dominguez-Sal, and J.-L. Larriba-Pey. High quality, scalable and parallel community detection for large real graphs. In Proceedings of the 23rd International Conference on World Wide Web, WWW '14, pages 225--236, New York, NY, USA, 2014. ACM.
[26]
T. R. Project. The clueweb12 dataset. http://www.lemurproject.org/clueweb12/index.php/. {Online; accessed March-04-2015}.
[27]
U. N. Raghavan, R. Albert, and S. Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76:036106, Sep 2007.
[28]
E. J. Riedy, H. Meyerhenke, D. Ediger, and D. A. Bader. Parallel community detection for massive graphs. In Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics-Volume Part I, pages 286--296. Springer-Verlag, 2011.
[29]
J. Riedy, D. A. Bader, and H. Meyerhenke. Scalable multi-threaded community detection in social networks. In Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International, pages 1619--1628. IEEE, 2012.
[30]
M. Rosvall, D. Axelsson, and C. T. Bergstrom. The map equation. The European Physical Journal Special Topics, 178(1):13--23, 2009.
[31]
M. Rosvall and C. T. Bergstrom. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PloS one, 6(4):e18209, 2011.
[32]
C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379--423, July 1948.
[33]
C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:623--656, October 1948.
[34]
J. Shi, W. Xue, W. Wang, Y. Zhang, B. Yang, and J. Li. Scalable community detection in massive social networks using mapreduce. IBM Journal of Research and Development, 57(3-4):1:12--1:12, May 2013.
[35]
H. Shiokawa, Y. Fujiwara, and M. Onizuka. Fast algorithm for modularity-based graph clustering. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, pages 1170--1176, 2013.
[36]
S. Yang, B. Wang, H. Zhao, and B. Wu. Efficient dense structure mining using mapreduce. In Data Mining Workshops, 2009. ICDMW'09. IEEE International Conference on, pages 332--337. IEEE, 2009.
[37]
Y. Zhang, J. Wang, Y. Wang, and L. Zhou. Parallel community detection on large networks with propinquity dynamics. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 997--1006. ACM, 2009.

Cited By

View all
  • (2024)Towards a Scalable Parallel Infomap Algorithm for Community Detection2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)10.1109/PDP62718.2024.00023(116-123)Online publication date: 20-Mar-2024
  • (2024)Distributed Multi-GPU Community Detection on Exascale Computing Platforms2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00147(815-824)Online publication date: 27-May-2024
  • (2024)Node Density and Attraction Detection Method (NDAD) for Community Detection in Complex NetworkAdvances in Data-Driven Computing and Intelligent Systems10.1007/978-981-99-9518-9_35(481-492)Online publication date: 20-Mar-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2015
985 pages
ISBN:9781450337236
DOI:10.1145/2807591
  • General Chair:
  • Jackie Kern,
  • Program Chair:
  • Jeffrey S. Vetter
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 November 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. community detection
  2. distributed computing
  3. gossip protocol
  4. large-scale graph analysis

Qualifiers

  • Research-article

Conference

SC15
Sponsor:

Acceptance Rates

SC '15 Paper Acceptance Rate 79 of 358 submissions, 22%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)33
  • Downloads (Last 6 weeks)3
Reflects downloads up to 10 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Towards a Scalable Parallel Infomap Algorithm for Community Detection2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)10.1109/PDP62718.2024.00023(116-123)Online publication date: 20-Mar-2024
  • (2024)Distributed Multi-GPU Community Detection on Exascale Computing Platforms2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW63119.2024.00147(815-824)Online publication date: 27-May-2024
  • (2024)Node Density and Attraction Detection Method (NDAD) for Community Detection in Complex NetworkAdvances in Data-Driven Computing and Intelligent Systems10.1007/978-981-99-9518-9_35(481-492)Online publication date: 20-Mar-2024
  • (2023)Fast Community Detection in Graphs with Infomap Method using Accelerated Sparse Accumulation2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW59300.2023.00103(601-610)Online publication date: May-2023
  • (2023)Unfairness in Distributed Graph Frameworks2023 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM58522.2023.00203(1529-1534)Online publication date: 1-Dec-2023
  • (2022)Fregel: a functional domain-specific language for vertex-centric large-scale graph processingJournal of Functional Programming10.1017/S095679682100027732Online publication date: 20-Jan-2022
  • (2022)Parallel and distributed paradigms for community detection in social networks: A methodological reviewExpert Systems with Applications10.1016/j.eswa.2021.115956187(115956)Online publication date: Jan-2022
  • (2022)Scalable multi‐node multi‐GPU Louvain community detection algorithm for heterogeneous architecturesConcurrency and Computation: Practice and Experience10.1002/cpe.698734:17Online publication date: 11-Apr-2022
  • (2021)HyPC-Map: A Hybrid Parallel Community Detection Algorithm Using Information-Theoretic Approach2021 IEEE High Performance Extreme Computing Conference (HPEC)10.1109/HPEC49654.2021.9622866(1-8)Online publication date: 20-Sep-2021
  • (2021)Artificial Benchmark for Community Detection (ABCD)—Fast random graph model with community structureNetwork Science10.1017/nws.2020.45(1-26)Online publication date: 26-Jan-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media