More Web Proxy on the site http://driver.im/

research-article

G-store: high-performance graph store for trillion-edge processing

Authors:

H. Howie HuangAuthors Info & Claims

SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 71, Pages 1 - 12

Published: 13 November 2016 Publication History

Abstract

High-performance graph processing brings great benefits to a wide range of scientific applications, e.g., biology networks, recommendation systems, and social networks, where such graphs have grown to terabytes of data with billions of vertices and trillions of edges. Subsequently, storage performance plays a critical role in designing a high-performance computer system for graph analytics. In this paper, we present G-Store, a new graph store that incorporates three techniques to accelerate the I/O and computation of graph algorithms. First, G-Store develops a space-efficient tile format for graph data, which takes advantage of the symmetry present in graphs as well as a new smallest number of bits representation. Second, G-Store utilizes tile-based physical grouping on disks so that multi-core CPUs can achieve high cache and memory performance and fully utilize the throughput from an array of solid-state disks. Third, G-Store employs a novel slide-cache-rewind strategy to pipeline graph I/O and computing. With a modest amount of memory, G-Store utilizes a proactive caching strategy in the system so that all fetched graph data are fully utilized before evicted from memory. We evaluate G-Store on a number of graphs against two state-of-the-art graph engines and show that G-Store achieves 2 to 8X saving in storage and outperforms both by 2 to 32X. G-Store is able to run different algorithms on trillion-edge graphs within tens of minutes, setting a new milestone in semi-external graph processing system.

References

[1]

Friendster Network Dataset - KONECT. http://konect.uni-koblenz.de/networks/friendster.

[2]

Twitter (MPI) Network Dataset - KONECT. http://konect.uni-koblenz.de/networks/twitter_mpi.

[3]

Web Graphs. http://webdatacommons.org/hyperlinkgraph/2012-08/download.html.

[4]

D. A. Bader, G. Cong, and J. Feo. On the Architectural Requirements for Efficient Execution of Graph Algorithms. In Proceedings of the 2005 International Conference on Parallel Processing, 2005.

Digital Library

[5]

S. Beamer, K. Asanovic, and D. Patterson. Direction-Optimizing Breadth-First Search. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2012.

Digital Library

[6]

S. Brin and L. Page. The Anatomy of a Large-scale Hypertextual Web Search Engine. In Proceedings of the Seventh International Conference on World Wide Web 7, Amsterdam, The Netherlands, 1998.

Digital Library

[7]

F. Checconi and F. Petrini. Traversing Trillions of Edges in Real Time: Graph Exploration on Large-Scale Parallel Machines. In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2014.

Digital Library

[8]

R. Chen, J. Shi, Y. Chen, and H. Chen. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. In Proceedings of the Tenth European Conference on Computer Systems, 2015.

Digital Library

[9]

A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, and S. Muthukrishnan. One Trillion Edges: Graph Processing at Facebook-scale. Proceedings of the 41st International Conference on Very Large Data Bases (VLDB), Aug. 2015.

Digital Library

[10]

L. K. Fleischer, B. Hendrickson, and A. Pinar. On identifying strongly connected components in parallel. In Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing. 2000.

Digital Library

[11]

Giraph. http://giraph.apache.org.

[12]

J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Power-Graph: Distributed Graph-Parallel Computation on Natural Graphs. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2012.

Digital Library

[13]

J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. GraphX: Graph Processing in a Distributed Dataflow Framework. In Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation (OSDI), 2014.

Digital Library

[14]

Graph500. http://www.graph500.org/.

[15]

W.-S. Han, S. Lee, K. Park, J.-H. Lee, M.-S. Kim, J. Kim, and H. Yu. TurboGraph: A Fast Parallel Graph Engine Handling Billion-scale Graphs in a Single PC. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013.

Digital Library

[16]

B. A. Huberman and L. A. Adamic. Internet: Growth dynamics of the World-Wide Web. Nature, 1999.

[17]

Intel Haswell. https://en.wikipedia.org/wiki/Haswell_(microarchitecture).

[18]

H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási. The large-scale organization of metabolic networks. Nature, 2000.

[19]

U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos. GBASE: A Scalable and General Graph Management System. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011.

Digital Library

[20]

A. Kyrola, G. E. Blelloch, and C. Guestrin. GraphChi: Large-Scale Graph Computation on Just a PC. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2012.

Digital Library

[21]

H. Liu and H. H. Huang. Enterprise: Breadth-First Graph Traversal on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2015.

Digital Library

[22]

H. Liu, H. H. Huang, and Y. Hu. iBFS: Concurrent Breadth-First Search on GPUs. In Proceedings of the SIGMOD International Conference on Management of Data, 2016.

Digital Library

[23]

Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. Proceedings of the VLDB Endowment (VLDB), 2012.

Digital Library

[24]

G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-Scale Graph Processing. In Proceedings of the SIGMOD International Conference on Management of data, 2010.

Digital Library

[25]

D. Nguyen, A. Lenharth, and K. Pingali. A Lightweight Infrastructure for Graph Analytics. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2013.

Digital Library

[26]

R. Pearce, M. Gokhale, and N. M. Amato. Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis(SC), 2010.

Digital Library

[27]

A. Roy, L. Bindschaedler, J. Malicevic, and W. Zwaenepoel. Chaos: Scale-out Graph Processing from Secondary Storage. In Proceedings of the 25th Symposium on Operating Systems Principles, 2015.

Digital Library

[28]

A. Roy, I. Mihailovic, and W. Zwaenepoel. X-Stream: Edge-centric Graph Processing using Streaming Partitions. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2013.

Digital Library

[29]

S. M. Rumble, A. Kejriwal, and J. Ousterhout. Log-structured Memory for DRAM-based Storage. In Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST), 2014.

Digital Library

[30]

B. Shao, H. Wang, and Y. Li. Trinity: A Distributed Graph Engine on a Memory Cloud. In Proceedings of the SIGMOD International Conference on Management of Data, 2013.

Digital Library

[31]

Y. Shiloach and U. Vishkin. An O(log n) Parallel Connectivity Algorithm. Journal of Algorithms, 1982.

Digital Library

[32]

J. Shun and G. E. Blelloch. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP), 2013.

Digital Library

[33]

J. Shun, L. Dhulipala, and G. E. Blelloch. Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+. In Proceedings of the 2015 Data Compression Conference (DCC), 2015.

Digital Library

[34]

J. Wang, Q. Xiao, J. Yin, and P. Shang. DRAW: A New Data-gRouping-AWare Data Placement Scheme for Data Intensive Applications With Interest Locality. IEEE Transactions on Magnetics, June 2013.

[35]

K. Wang, G. Xu, Z. Su, and Y. D. Liu. GraphQ: Graph Query Processing with Abstraction Refinement---Scalable and Programmable Analytics over Very Large Graphs on a Single PC. In Proceedings of the Usenix Annual Technical Conference, 2015.

Digital Library

[36]

A. Yoo, E. Chow, K. Henderson, W. McLendon, B. Hendrickson, and U. Catalyurek. A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L. In Proceedings of the 2005 ACM/IEEE conference on Supercomputing, Nov 2005.

Digital Library

[37]

P. Yuan, W. Zhang, C. Xie, H. Jin, L. Liu, and K. Lee. Fast Iterative Graph Computation: A Path Centric Approach. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014.

Digital Library

[38]

Y. Zhang, Q. Gao, L. Gao, and C. Wang. Maiter: An Asynchronous Graph Processing Framework for Delta-Based Accumulative Iterative Computation. IEEE Transactions on Parallel and Distributed Systems, Aug 2014.

[39]

D. Zheng, D. Mhembere, R. Burns, J. Vogelstein, C. E. Priebe, and A. S. Szalay. FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST), 2015.

Digital Library

[40]

X. Zhu, W. Han, and W. Chen. GridGraph: Large-scale Graph Processing on a Single Machine Using 2-level Hierarchical Partitioning. In Proceedings of the USENIX Conference on Usenix Annual Technical Conference, 2015.

Digital Library

Cited By

Lee EKim JLim KNoh SSeo JDan TDahlia M(2019)Pre-select static caching and neighborhood ordering for BFS-like algorithms on disk-based graph enginesProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358846(459-473)Online publication date: 10-Jul-2019
https://dl.acm.org/doi/10.5555/3358807.3358846
Liu HHuang HDan TDahlia M(2019)SIMD-XProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358843(411-427)Online publication date: 10-Jul-2019
https://dl.acm.org/doi/10.5555/3358807.3358843
Kumar PHuang HMerchant AWeatherspoon H(2019)GRAPHONEProceedings of the 17th USENIX Conference on File and Storage Technologies10.5555/3323298.3323322(249-263)Online publication date: 25-Feb-2019
https://dl.acm.org/doi/10.5555/3323298.3323322
Show More Cited By

Recommendations

G-Store: a scalable data store for transactional multi key access in the cloud
SoCC '10: Proceedings of the 1st ACM symposium on Cloud computing

Cloud computing has emerged as a preferred platform for deploying scalable web-applications. With the growing scale of these applications and the data associated with them, scalable data management systems form a crucial part of the cloud ...
The tail at store: a revelation from millions of hours of disk and SSD deployments
FAST'16: Proceedings of the 14th Usenix Conference on File and Storage Technologies

We study storage performance in over 450,000 disks and 4,000 SSDs over 87 days for an overall total of 857 million (disk) and 7 million (SSD) drive hours. We find that storage performance instability is not uncommon: 0.2% of the time, a disk is more ...
LSM-tree managed storage for large-scale key-value store
SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing

Key-value stores are increasingly adopting LSM-trees as their enabling data structure in the backend storage, and persisting their clustered data through a file system. A file system is expected to not only provide file/directory abstraction to organize ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2016

1034 pages

ISBN:9781467388153

Conference Chair:
John West
University of Texas at Austin

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

In-Cooperation

SIGHPC: ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing

Publisher

IEEE Press

Publication History

Published: 13 November 2016

Check for updates

Qualifiers

Research-article

Conference

SC16

Sponsor:

SIGARCH
IEEE-CS

SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 13 - 18, 2016

Utah, Salt Lake City

Acceptance Rates

SC '16 Paper Acceptance Rate 81 of 442 submissions, 18%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

16
Total Citations
View Citations
317
Total Downloads

Downloads (Last 12 months)2
Downloads (Last 6 weeks)1

Reflects downloads up to 15 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Lee EKim JLim KNoh SSeo JDan TDahlia M(2019)Pre-select static caching and neighborhood ordering for BFS-like algorithms on disk-based graph enginesProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358846(459-473)Online publication date: 10-Jul-2019
https://dl.acm.org/doi/10.5555/3358807.3358846
Liu HHuang HDan TDahlia M(2019)SIMD-XProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358843(411-427)Online publication date: 10-Jul-2019
https://dl.acm.org/doi/10.5555/3358807.3358843
Kumar PHuang HMerchant AWeatherspoon H(2019)GRAPHONEProceedings of the 17th USENIX Conference on File and Storage Technologies10.5555/3323298.3323322(249-263)Online publication date: 25-Feb-2019
https://dl.acm.org/doi/10.5555/3323298.3323322
Ji YLiu HHuang H(2018)iSpanProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.5555/3291656.3291734(1-12)Online publication date: 11-Nov-2018
https://dl.acm.org/doi/10.5555/3291656.3291734
Lin HZhu XYu BTang XXue WChen WZhang LHoefler TMa XLiu XZheng WXu J(2018)ShenTuProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.5555/3291656.3291731(1-11)Online publication date: 11-Nov-2018
https://dl.acm.org/doi/10.5555/3291656.3291731
Hu YLiu HHuang H(2018)TriCoreProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.5555/3291656.3291675(1-12)Online publication date: 11-Nov-2018
https://dl.acm.org/doi/10.5555/3291656.3291675
Ahmad YKhattab OMalik AMusleh AHammoud MKutlu MShehata MElsayed T(2018)LA3Proceedings of the VLDB Endowment10.14778/3204028.320403511:8(920-933)Online publication date: 1-Apr-2018
https://dl.acm.org/doi/10.14778/3204028.3204035
Sheng FCao QCai HYao JXie C(2018)GraPUProceedings of the ACM Symposium on Cloud Computing10.1145/3267809.3267811(301-312)Online publication date: 11-Oct-2018
https://dl.acm.org/doi/10.1145/3267809.3267811
Besta MStanojevic DZivic TSingh JHoerold MHoefler TEvripidou SStenström PO'Boyle M(2018)Log(graph)Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243198(1-13)Online publication date: 1-Nov-2018
https://dl.acm.org/doi/10.1145/3243176.3243198
Ji YLiu HHuang H(2018)iSpanProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC.2018.00061(1-12)Online publication date: 11-Nov-2018
https://dl.acm.org/doi/10.1109/SC.2018.00061
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents