[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3159652.3159696acmconferencesArticle/Chapter ViewAbstractPublication PageswsdmConference Proceedingsconference-collections
research-article

Shortcutting Label Propagation for Distributed Connected Components

Published: 02 February 2018 Publication History

Abstract

Connected Components is a fundamental graph mining problem that has been studied for the PRAM, MapReduce and BSP models. We present a simple CC algorithm for BSP that does not mutate the graph, converges in O(log n) supersteps and scales to graphs of trillions of edges.

References

[1]
2016. Common Crawl. (2016). http://commoncrawl.org/.
[2]
2016. Twitter Graph. (2016). http://konect.uni-koblenz.de/networks/twitter_mpi.
[3]
Ajit Agrawal, Lena Nekludova, and Willie Lim. 1987. A parallel O (log N) algorithm for finding connected components in planar images. Thinking Machines Corporation.
[4]
Baruch Awerbuch and Yossi Shiloach. 1987. New connectivity and MSF algorithms for shuffle-exchange network and PRAM. IEEE Trans. Comput. 10, C-36 (1987), 1258--1263.
[5]
Baruch Awerbuch and Tripurari Singh. 1983. New Connectivity and MSF Algorithms for Ultracomputer and PRAM. In ICPP, Vol. 83. 175--179.
[6]
David A Bader and Guojing Cong. 2005. A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs). J. Parallel and Distrib. Comput. 65, 9 (2005), 994--1006.
[7]
David A Bader, Guojing Cong, and John Feo. 2005. On the architectural requirements for efficient execution of graph algorithms. In ICPP '05. IEEE, 547--556.
[8]
David A Bader and Joseph JáJá. 1996. Parallel algorithms for image histogramming and connected components with an experimental study. Journal of parallel and distributed computing 35, 2 (1996), 173--190.
[9]
Dip Sankar Banerjee and Kishore Kothapalli. 2011. Hybrid algorithms for list ranking and graph connected components. In HiPC (2011). IEEE, 1--10.
[10]
Guy E Blelloch. 1996. Programming parallel algorithms. Commun. ACM 39, 3 (1996), 85--97.
[11]
Paolo Boldi, Massimo Santini, and Sebastiano Vigna. 2008. A Large Time-Aware Graph. SIGIR Forum 42, 2 (2008), 33--38.
[12]
Robert S Boyer and J Strother Moore. 1991. MJRTY--a fast majority vote algorithm. In Automated Reasoning. Springer, 105--117.
[13]
Libor Buš and Pavel Tvrdík. 2001. A parallel algorithm for connected components on distributed memory machines. In European Parallel Virtual Machine / Message Passing Interface Users Group Meeting. Springer, 280--287.
[14]
Edson Norberto Cáceres, Henrique Mongelli, Christiane Nishibe, and Siang Wun Song. 2010. Experimental results of a coarse-grained parallel algorithm for spanning tree and connected components. In High Performance Computing and Simulation (HPCS), 2010. IEEE, 631--637.
[15]
Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A recursive model for graph mining. In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM, 442--446.
[16]
Francis Y Chin, John Lam, and I-Ngo Chen. 1982. Efficient parallel algorithms for some graph problems. Commun. ACM 25, 9 (1982), 659--665.
[17]
Ka Wong Chong and Tak Wah Lam. 1995. Finding connected components in O(log n log log n) time on the EREW PRAM. J. of Algorithms 18, 3 (1995), 378-- 402.
[18]
Richard Cole, Philip N Klein, and Robert E Tarjan. 1996. Finding minimum spanning forests in logarithmic time and linear work using random sampling. In Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures. ACM, 243--250.
[19]
Richard Cole and Uzi Vishkin. 1991. Approximate parallel scheduling. II. Applications to logarithmic-time optimal parallel graph algorithms. Information and Computation 92, 1 (1991), 1--47.
[20]
Graham Cormode and Marios Hadjieleftheriou. 2008. Finding Frequent Items in Data Streams. Proceedings of the VLDB Endowment 1, 2 (2008), 1530--1541.
[21]
Xing Feng, Lijun Chang, Xuemin Lin, Lu Qin, and Wenjie Zhang. 2016. Computing Connected Components with linear communication cost in pregel-like systems. In International Conference on Data Engineering (ICDE). IEEE, 85--96.
[22]
Hillel Gazit. 1991. An optimal randomized parallel algorithm for finding connected components in a graph. SIAM J. Comput. 20, 6 (1991), 1046--1067.
[23]
Shay Halperin and Uri Zwick. 1996. An optimal randomised logarithmic time connectivity algorithm for the EREW PRAM. J. Comput. System Sci. 53, 3 (1996), 395--416.
[24]
Shay Halperin and Uri Zwick. 2001. Optimal randomized EREW PRAM algorithms for finding spanning forests. Journal of Algorithms 39, 1 (2001), 1--46.
[25]
Yujie Han and Robert A Wagner. 1990. An efficient and fast parallel-connected component algorithm. Journal of the ACM (JACM) 37, 3 (1990), 626--642.
[26]
Andreas Harth. 2009. Billion Triples Challenge data set. Downloaded from http://km.aifb.kit.edu/projects/btc-2009/. (2009).
[27]
Kenneth A Hawick, Arno Leist, and Daniel P Playne. 2010. Parallel graph component labelling with GPUs and CUDA. Parallel Comput. 36, 12 (2010), 655--678.
[28]
Daniel S. Hirschberg, Ashok K. Chandra, and Dilip V. Sarwate. 1979. Computing connected components on parallel computers. Commun. ACM 22, 8 (1979), 461-- 464.
[29]
Tsan-Sheng Hsu, Vijaya Ramachandran, and Nathaniel Dean. 1997. Parallel implementation of algorithms for finding connected components in graphs. Parallel Algorithms: Third DIMACS Implementation Challenge, October 17--19, 1994 30 (1997), 20.
[30]
Kazuo Iwama and Yahiko Kambayashi. 1994. A simpler parallel algorithm for graph connectivity. Journal of Algorithms 16, 2 (1994), 190--217.
[31]
Joseph JáJá. 1992. An introduction to parallel algorithms. Addison-Wesley.
[32]
Donald B Johnson and Panagiotis Metaxas. 1997. Connected components in O(log3/2 n) parallel time for the CREW PRAM. J. Comput. System Sci. 54, 2 (1997), 227--242.
[33]
U Kang, Charalampos E Tsourakakis, and Christos Faloutsos. 2009. Pegasus: A peta-scale graph mining system implementation and observations. In ICDM'09. IEEE, 229--238.
[34]
Hakan Kardes, Siddharth Agrawal, Xin Wang, and Ang Sun. 2014. Ccf: Fast and scalable connected component computation in mapreduce. In ICNC (2014). IEEE, 994--998.
[35]
David R Karger, Noam Nisan, and Michal Parnas. 1992. Fast connected components algorithms for the EREW PRAM. In Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures. ACM, 373--381.
[36]
Raimondas Kiveris, Silvio Lattanzi, Vahab Mirrokni, Vibhor Rastogi, and Sergei Vassilvitskii. 2014. Connected components in mapreduce and beyond. In Proceedings of the ACM Symposium on Cloud Computing. ACM, 1--13.
[37]
Václav Koubek and Jana Kršňáková. 1985. Parallel algorithms for connected components in a graph. In Fund. of Computation Theory. Springer, 208--217.
[38]
Arvind Krishnamurthy, Steven Lumetta, David E Culler, and Katherine Yelick. 1997. Connected components on distributed memory machines. Third DIMACS Implementation Challenge 30 (1997), 1--21.
[39]
Clyde P Kruskal, Larry Rudolph, and Marc Snir. 1990. Efficient parallel algorithms for graph problems. Algorithmica 5, 1 (1990), 43--64.
[40]
Subodh Kumar, Stephen M Goddard, and Jan F Prins. 1995. ConnectedComponents Algorithms For Mesh-Connected Parallel Computers. In 3rd Dimacs Implementation Challenge Workshop.
[41]
Willie Lim, Ajit Agrawal, and Lena Nekludova. 1986. A fast parallel algorithm for labeling connected components in image arrays. Thinking Machines Corporation.
[42]
Alessandro Lulli, Emanuele Carlini, Patrizio Dazzi, Claudio Lucchese, and Laura Ricci. 2017. Fast connected components computation in large graphs by vertex pruning. IEEE Transactions on Parallel and Distributed systems 28, 3 (2017), 760-- 773.
[43]
Dhruva Nath and SN Maheshwari. 1982. Parallel algorithms for the connected components and minimal spanning tree problems. Inform. Process. Lett. 14, 1 (1982), 7--11.
[44]
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A lightweight infrastructure for graph analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 456--471.
[45]
Noam Nisan, Endre Szemeredi, and Avi Wigderson. 1992. Undirected connectivity in O(log1.5 n) space. In Foundations of Computer Science, 1992. IEEE, 24--29.
[46]
Ha-Myung Park, Namyong Park, Sung-Hyon Myaeng, and U Kang. 2016. Partition aware connected component computation in distributed systems. In International Conference on Data Mining (2016). IEEE, 420--429.
[47]
Md Mostofa Ali Patwary, Peder Refsnes, and Fredrik Manne. 2012. Multi-core spanning forest algorithms using the disjoint-set data structure. In International Parallel & Distributed Processing Symposium (2012). IEEE, 827--835.
[48]
Seth Pettie and Vijaya Ramachandran. 2002. A randomized time-work optimal parallel algorithm for finding a minimum spanning forest. SIAM J. Comput. 31, 6 (2002), 1879--1895.
[49]
CA Philips. 1989. Parallel graph contraction. In Proceedings of the first annual ACM symposium on Parallel algorithms and architectures. ACM, 148--157.
[50]
Chung Keung Poon and Vijaya Ramachandran. 1997. A randomized linear work EREW PRAM algorithm to find a minimum spanning forest. In International Symposium on Algorithms and Computation. Springer, 212--222.
[51]
Vibhor Rastogi, Ashwin Machanavajjhala, Laukik Chitnis, and Anish Das Sarma. 2013. Finding connected components in map-reduce in logarithmic rounds. In International Conference on Data Engineering (2013). IEEE, 50--61.
[52]
Yossi Shiloach and Uzi Vishkin. 1982. An O(log n) parallel connectivity algorithm. Journal of Algorithms 3, 1 (1982), 57--67.
[53]
Julian Shun, Laxman Dhulipala, and Guy Blelloch. 2014. A simple and practical linear-work parallel algorithm for connectivity. In Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures. ACM, 143--153.
[54]
George M Slota, Sivasankaran Rajamanickam, and Kamesh Madduri. 2014. BFS and coloring-based parallel algorithms for strongly connected components and related problems. In International Parallel and Distributed Processing Symposium (2014). IEEE, 550--559.
[55]
Jyothish Soman, Kothapalli Kishore, and PJ Narayanan. 2010. A fast GPU algorithm for graph connectivity. In International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW) (2010). IEEE, 1--8.
[56]
Stergios Stergiou, Zygimantas Straznickas, Rolina Wu, and Kostas Tsioutsiouliklis. 2017. Distributed Negative Sampling for Word Embeddings. In AAAI Conference on Artificial Intelligence (2017). 2569--2575.

Cited By

View all
  • (2024)PrismX: A Single-Machine System for Querying Big GraphsProceedings of the VLDB Endowment10.14778/3685800.368590617:12(4485-4488)Online publication date: 8-Nov-2024
  • (2024)Kimbap: A Node-Property Map System for Distributed Graph AnalyticsProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640421(566-581)Online publication date: 27-Apr-2024
  • (2023)MiniGraph: Querying Big Graphs with a Single MachineProceedings of the VLDB Endowment10.14778/3598581.359859016:9(2172-2185)Online publication date: 1-May-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining
February 2018
821 pages
ISBN:9781450355810
DOI:10.1145/3159652
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 February 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. connected components
  2. distributed systems
  3. graph algorithms
  4. label propagation
  5. lpsc

Qualifiers

  • Research-article

Conference

WSDM 2018

Acceptance Rates

WSDM '18 Paper Acceptance Rate 81 of 514 submissions, 16%;
Overall Acceptance Rate 498 of 2,863 submissions, 17%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)34
  • Downloads (Last 6 weeks)5
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)PrismX: A Single-Machine System for Querying Big GraphsProceedings of the VLDB Endowment10.14778/3685800.368590617:12(4485-4488)Online publication date: 8-Nov-2024
  • (2024)Kimbap: A Node-Property Map System for Distributed Graph AnalyticsProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640421(566-581)Online publication date: 27-Apr-2024
  • (2023)MiniGraph: Querying Big Graphs with a Single MachineProceedings of the VLDB Endowment10.14778/3598581.359859016:9(2172-2185)Online publication date: 1-May-2023
  • (2023)Embracing Irregular Parallelism in HPC with YGMProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607103(1-13)Online publication date: 12-Nov-2023
  • (2023)Contour Algorithm for Connectivity2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC)10.1109/HiPC58850.2023.00022(66-75)Online publication date: 18-Dec-2023
  • (2022)BlazeProceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis10.5555/3571885.3571943(1-15)Online publication date: 13-Nov-2022
  • (2022)UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardwarePLOS ONE10.1371/journal.pone.027752717:11(e0277527)Online publication date: 30-Nov-2022
  • (2022)Direction-optimizing Label Propagation Framework for Structure Detection in Graphs: Design, Implementation, and Experimental AnalysisACM Journal of Experimental Algorithmics10.1145/356459327(1-31)Online publication date: 13-Dec-2022
  • (2022)Simple Concurrent Connected Components AlgorithmsACM Transactions on Parallel Computing10.1145/35435469:2(1-26)Online publication date: 1-Sep-2022
  • (2022)Blaze: Fast Graph Processing on Fast SSDsSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00049(1-15)Online publication date: Nov-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media