[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2487575.2487581acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC

Published: 11 August 2013 Publication History

Abstract

Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, the user needs to be skilled at managing and tuning a distributed system in a cluster, which is a nontrivial job for the ordinary user. Furthermore, these distributed systems need many machines in a cluster in order to provide reasonable performance. In order to address this problem, a disk-based parallel graph engine called Graph-Chi, has been recently proposed. Although Graph-Chi significantly outperforms all representative (disk-based) distributed graph engines, we observe that Graph-Chi still has serious performance problems for many important types of graph queries due to 1) limited parallelism and 2) separate steps for I/O processing and CPU processing. In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently by using modern hardware on a single PC. TurboGraph is the first truly parallel graph engine that exploits 1) full parallelism including multi-core parallelism and FlashSSD IO parallelism and 2) full overlap of CPU processing and I/O processing as much as possible. Specifically, we propose a novel parallel execution model, called pin-and-slide. TurboGraph also provides engine-level operators such as BFS which are implemented under the pin-and-slide model. Extensive experimental results with large real datasets show that TurboGraph consistently and significantly outperforms Graph-Chi by up to four orders of magnitude! Our implementation of TurboGraph is available at ``http://wshan.net/turbograph}" as executable files.

References

[1]
Yahoo webscope. yahoo! altavista web page hyperlink connectivity graph. http://webscope.sandbox.yahoo.com.
[2]
L. Addario-Berry, W. S. Kennedy, A. D. King, Z. Li, and B. A. Reed. Finding a maximum-weight induced k-partite subgraph of an i-triangulated graph. Discrete Applied Mathematics, 158(7):765--770, 2010.
[3]
L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD, pages 44--54. ACM, 2006.
[4]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: distributed graph-parallel computation on natural graphs. In OSDI, pages 17--30, 2012.
[5]
W.-S. Han, J. Lee, and J.-H. Lee. TurboISO: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In SIGMOD, 2013.
[6]
W.-S. Han, J. Lee, M.-D. Pham, and J. X. Yu. igraph: a framework for comparisons of disk-based graph indexing techniques. Proc. VLDB Endow., 3(1--2):449--459, Sept. 2010.
[7]
S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-marl: a dsl for easy and efficient graph analysis. In ASPLOS, pages 349--362, 2012.
[8]
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys, pages 59--72, 2007.
[9]
U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos. Gbase: a scalable and general graph management system. In KDD, pages 1091--1099, 2011.
[10]
U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos. gbase: an efficient analysis platform for large graphs. VLDB J., 21(5):637--650, 2012.
[11]
U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system - implementation and observations. In ICDM, pages 229--238, 2009.
[12]
G. Karypis and V. Kumar. Parallel multilevel k-way partitioning for irregular graphs. SIAM Review, 41(2):278--300, 1999.
[13]
H. Kwak, C. Lee, H. Park, and S. B. Moon. What is twitter, a social network or a news media? In WWW, pages 591--600, 2010.
[14]
A. Kyrola, G. Blelloch, and C. Guestrin. Graphchi: large-scale graph computation on just a pc. In OSDI, pages 31--46, 2012.
[15]
J. Lee, W.-S. Han, R. Kasperovics, and J.-H. Lee. An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB, 6(2):133--144, 2012.
[16]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning in the cloud. PVLDB, 5(8):716--727, 2012.
[17]
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, pages 135--146, 2010.
[18]
H. Maserrat and J. Pei. Neighbor query friendly compression of social networks. In KDD, pages 533--542, 2010.
[19]
B. Shao, H. Wang, and Y. Li. The trinity graph engine. Technical Report 161291, Microsoft Research, 2012.
[20]
Y. Shiloach and U. Vishkin. An o(logn) parallel connectivity algorithm. Journal of Algorithms, 3(1):57--67, 1982.
[21]
M. Stonebraker et al. C-store: a column-oriented dbms. In VLDB, pages 553--564, 2005.
[22]
X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004.
[23]
P. Zhao and J. Han. On graph query optimization in large networks. PVLDB, 3(1):340--351, 2010.

Cited By

View all
  • (2024)Graph Crypto-Stego System for Securing Graph Data Using Association SchemesJournal of Applied Mathematics10.1155/2024/20843422024(1-12)Online publication date: 2-Mar-2024
  • (2024)SmartGraph: A Framework for Graph Processing in Computational StorageProceedings of the 2024 ACM Symposium on Cloud Computing10.1145/3698038.3698538(737-754)Online publication date: 20-Nov-2024
  • (2024)CAVE: Concurrency-Aware Graph Processing on SSDsProceedings of the ACM on Management of Data10.1145/36549282:3(1-26)Online publication date: 30-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2013
1534 pages
ISBN:9781450321747
DOI:10.1145/2487575
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 August 2013

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. big data
  2. graph processing
  3. parallelism
  4. pin-and-slide

Qualifiers

  • Research-article

Conference

KDD' 13
Sponsor:

Acceptance Rates

KDD '13 Paper Acceptance Rate 125 of 726 submissions, 17%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)172
  • Downloads (Last 6 weeks)77
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Graph Crypto-Stego System for Securing Graph Data Using Association SchemesJournal of Applied Mathematics10.1155/2024/20843422024(1-12)Online publication date: 2-Mar-2024
  • (2024)SmartGraph: A Framework for Graph Processing in Computational StorageProceedings of the 2024 ACM Symposium on Cloud Computing10.1145/3698038.3698538(737-754)Online publication date: 20-Nov-2024
  • (2024)CAVE: Concurrency-Aware Graph Processing on SSDsProceedings of the ACM on Management of Data10.1145/36549282:3(1-26)Online publication date: 30-May-2024
  • (2024)GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)ACM Transactions on Database Systems10.1145/364384649:3(1-31)Online publication date: 16-May-2024
  • (2024)How to create graphs in hardware-constrained environments? Choosing the best creation approach via machine learning-based predictive modelsInternational Journal of Data Science and Analytics10.1007/s41060-023-00495-5Online publication date: 2-Feb-2024
  • (2024)Secure shortest distance queries over encrypted graph in cloud computingWireless Networks10.1007/s11276-024-03692-730:4(2633-2646)Online publication date: 29-Feb-2024
  • (2023)MITra: A Framework for Multi-Instance Graph TraversalProceedings of the VLDB Endowment10.14778/3603581.360359416:10(2551-2564)Online publication date: 1-Jun-2023
  • (2023)SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix MultiplicationProceedings of the 32nd ACM International Conference on Information and Knowledge Management10.1145/3583780.3615044(923-933)Online publication date: 21-Oct-2023
  • (2023)NosWalker: A Decoupled Architecture for Out-of-Core Random Walk ProcessingProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3582016.3582025(466-482)Online publication date: 25-Mar-2023
  • (2023)LOSC: A locality-optimized subgraph construction scheme for out-of-core graph processingJournal of Parallel and Distributed Computing10.1016/j.jpdc.2022.10.005172(51-68)Online publication date: Feb-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media