[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.5555/3014904.3014999acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

G-store: high-performance graph store for trillion-edge processing

Published: 13 November 2016 Publication History

Abstract

High-performance graph processing brings great benefits to a wide range of scientific applications, e.g., biology networks, recommendation systems, and social networks, where such graphs have grown to terabytes of data with billions of vertices and trillions of edges. Subsequently, storage performance plays a critical role in designing a high-performance computer system for graph analytics. In this paper, we present G-Store, a new graph store that incorporates three techniques to accelerate the I/O and computation of graph algorithms. First, G-Store develops a space-efficient tile format for graph data, which takes advantage of the symmetry present in graphs as well as a new smallest number of bits representation. Second, G-Store utilizes tile-based physical grouping on disks so that multi-core CPUs can achieve high cache and memory performance and fully utilize the throughput from an array of solid-state disks. Third, G-Store employs a novel slide-cache-rewind strategy to pipeline graph I/O and computing. With a modest amount of memory, G-Store utilizes a proactive caching strategy in the system so that all fetched graph data are fully utilized before evicted from memory. We evaluate G-Store on a number of graphs against two state-of-the-art graph engines and show that G-Store achieves 2 to 8X saving in storage and outperforms both by 2 to 32X. G-Store is able to run different algorithms on trillion-edge graphs within tens of minutes, setting a new milestone in semi-external graph processing system.

References

[1]
Friendster Network Dataset - KONECT. http://konect.uni-koblenz.de/networks/friendster.
[2]
Twitter (MPI) Network Dataset - KONECT. http://konect.uni-koblenz.de/networks/twitter_mpi.
[3]
Web Graphs. http://webdatacommons.org/hyperlinkgraph/2012-08/download.html.
[4]
D. A. Bader, G. Cong, and J. Feo. On the Architectural Requirements for Efficient Execution of Graph Algorithms. In Proceedings of the 2005 International Conference on Parallel Processing, 2005.
[5]
S. Beamer, K. Asanovic, and D. Patterson. Direction-Optimizing Breadth-First Search. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2012.
[6]
S. Brin and L. Page. The Anatomy of a Large-scale Hypertextual Web Search Engine. In Proceedings of the Seventh International Conference on World Wide Web 7, Amsterdam, The Netherlands, 1998.
[7]
F. Checconi and F. Petrini. Traversing Trillions of Edges in Real Time: Graph Exploration on Large-Scale Parallel Machines. In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2014.
[8]
R. Chen, J. Shi, Y. Chen, and H. Chen. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. In Proceedings of the Tenth European Conference on Computer Systems, 2015.
[9]
A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, and S. Muthukrishnan. One Trillion Edges: Graph Processing at Facebook-scale. Proceedings of the 41st International Conference on Very Large Data Bases (VLDB), Aug. 2015.
[10]
L. K. Fleischer, B. Hendrickson, and A. Pinar. On identifying strongly connected components in parallel. In Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing. 2000.
[11]
Giraph. http://giraph.apache.org.
[12]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Power-Graph: Distributed Graph-Parallel Computation on Natural Graphs. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2012.
[13]
J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. GraphX: Graph Processing in a Distributed Dataflow Framework. In Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation (OSDI), 2014.
[14]
Graph500. http://www.graph500.org/.
[15]
W.-S. Han, S. Lee, K. Park, J.-H. Lee, M.-S. Kim, J. Kim, and H. Yu. TurboGraph: A Fast Parallel Graph Engine Handling Billion-scale Graphs in a Single PC. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013.
[16]
B. A. Huberman and L. A. Adamic. Internet: Growth dynamics of the World-Wide Web. Nature, 1999.
[17]
Intel Haswell. https://en.wikipedia.org/wiki/Haswell_(microarchitecture).
[18]
H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási. The large-scale organization of metabolic networks. Nature, 2000.
[19]
U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos. GBASE: A Scalable and General Graph Management System. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011.
[20]
A. Kyrola, G. E. Blelloch, and C. Guestrin. GraphChi: Large-Scale Graph Computation on Just a PC. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2012.
[21]
H. Liu and H. H. Huang. Enterprise: Breadth-First Graph Traversal on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2015.
[22]
H. Liu, H. H. Huang, and Y. Hu. iBFS: Concurrent Breadth-First Search on GPUs. In Proceedings of the SIGMOD International Conference on Management of Data, 2016.
[23]
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. Proceedings of the VLDB Endowment (VLDB), 2012.
[24]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-Scale Graph Processing. In Proceedings of the SIGMOD International Conference on Management of data, 2010.
[25]
D. Nguyen, A. Lenharth, and K. Pingali. A Lightweight Infrastructure for Graph Analytics. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2013.
[26]
R. Pearce, M. Gokhale, and N. M. Amato. Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis(SC), 2010.
[27]
A. Roy, L. Bindschaedler, J. Malicevic, and W. Zwaenepoel. Chaos: Scale-out Graph Processing from Secondary Storage. In Proceedings of the 25th Symposium on Operating Systems Principles, 2015.
[28]
A. Roy, I. Mihailovic, and W. Zwaenepoel. X-Stream: Edge-centric Graph Processing using Streaming Partitions. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2013.
[29]
S. M. Rumble, A. Kejriwal, and J. Ousterhout. Log-structured Memory for DRAM-based Storage. In Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST), 2014.
[30]
B. Shao, H. Wang, and Y. Li. Trinity: A Distributed Graph Engine on a Memory Cloud. In Proceedings of the SIGMOD International Conference on Management of Data, 2013.
[31]
Y. Shiloach and U. Vishkin. An O(log n) Parallel Connectivity Algorithm. Journal of Algorithms, 1982.
[32]
J. Shun and G. E. Blelloch. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP), 2013.
[33]
J. Shun, L. Dhulipala, and G. E. Blelloch. Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+. In Proceedings of the 2015 Data Compression Conference (DCC), 2015.
[34]
J. Wang, Q. Xiao, J. Yin, and P. Shang. DRAW: A New Data-gRouping-AWare Data Placement Scheme for Data Intensive Applications With Interest Locality. IEEE Transactions on Magnetics, June 2013.
[35]
K. Wang, G. Xu, Z. Su, and Y. D. Liu. GraphQ: Graph Query Processing with Abstraction Refinement---Scalable and Programmable Analytics over Very Large Graphs on a Single PC. In Proceedings of the Usenix Annual Technical Conference, 2015.
[36]
A. Yoo, E. Chow, K. Henderson, W. McLendon, B. Hendrickson, and U. Catalyurek. A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L. In Proceedings of the 2005 ACM/IEEE conference on Supercomputing, Nov 2005.
[37]
P. Yuan, W. Zhang, C. Xie, H. Jin, L. Liu, and K. Lee. Fast Iterative Graph Computation: A Path Centric Approach. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014.
[38]
Y. Zhang, Q. Gao, L. Gao, and C. Wang. Maiter: An Asynchronous Graph Processing Framework for Delta-Based Accumulative Iterative Computation. IEEE Transactions on Parallel and Distributed Systems, Aug 2014.
[39]
D. Zheng, D. Mhembere, R. Burns, J. Vogelstein, C. E. Priebe, and A. S. Szalay. FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST), 2015.
[40]
X. Zhu, W. Han, and W. Chen. GridGraph: Large-scale Graph Processing on a Single Machine Using 2-level Hierarchical Partitioning. In Proceedings of the USENIX Conference on Usenix Annual Technical Conference, 2015.

Cited By

View all
  • (2019)Pre-select static caching and neighborhood ordering for BFS-like algorithms on disk-based graph enginesProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358846(459-473)Online publication date: 10-Jul-2019
  • (2019)SIMD-XProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358843(411-427)Online publication date: 10-Jul-2019
  • (2019)GRAPHONEProceedings of the 17th USENIX Conference on File and Storage Technologies10.5555/3323298.3323322(249-263)Online publication date: 25-Feb-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2016
1034 pages
ISBN:9781467388153
  • Conference Chair:
  • John West

Sponsors

In-Cooperation

Publisher

IEEE Press

Publication History

Published: 13 November 2016

Check for updates

Qualifiers

  • Research-article

Conference

SC16
Sponsor:

Acceptance Rates

SC '16 Paper Acceptance Rate 81 of 442 submissions, 18%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)1
Reflects downloads up to 15 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2019)Pre-select static caching and neighborhood ordering for BFS-like algorithms on disk-based graph enginesProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358846(459-473)Online publication date: 10-Jul-2019
  • (2019)SIMD-XProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358843(411-427)Online publication date: 10-Jul-2019
  • (2019)GRAPHONEProceedings of the 17th USENIX Conference on File and Storage Technologies10.5555/3323298.3323322(249-263)Online publication date: 25-Feb-2019
  • (2018)iSpanProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.5555/3291656.3291734(1-12)Online publication date: 11-Nov-2018
  • (2018)ShenTuProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.5555/3291656.3291731(1-11)Online publication date: 11-Nov-2018
  • (2018)TriCoreProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.5555/3291656.3291675(1-12)Online publication date: 11-Nov-2018
  • (2018)LA3Proceedings of the VLDB Endowment10.14778/3204028.320403511:8(920-933)Online publication date: 1-Apr-2018
  • (2018)GraPUProceedings of the ACM Symposium on Cloud Computing10.1145/3267809.3267811(301-312)Online publication date: 11-Oct-2018
  • (2018)Log(graph)Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243198(1-13)Online publication date: 1-Nov-2018
  • (2018)iSpanProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC.2018.00061(1-12)Online publication date: 11-Nov-2018
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media