More Web Proxy on the site http://driver.im/

research-article

A scalable implementation of a mapreduce-based graph processing algorithm for large-scale heterogeneous supercomputers

Authors:

Koichi Shirahata,

Toyotaro Suzumura,

Satoshi MatsuokaAuthors Info & Claims

CCGRID '13: Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing

Pages 277 - 284

https://doi.org/10.1109/CCGrid.2013.85

Published: 13 May 2013 Publication History

Abstract

Fast processing for extremely large-scale graph is becoming increasingly important in various domains such as health care, social networks, intelligence, system biology, and electric power grids. The GIM-V algorithm based on MapReduce programing model is designed as a general graph processing method for supporting petabyte-scale graph data. On the other hand, recent large-scale data-intensive computing systems tend to employ GPU accelerators to gain good peak performance and high memory bandwidth; however, the validity of acceleration, including optimization techniques, of the GIM-V algorithm using GPUs is an open problem. To address the problem, we implemented a multi-GPU-based GIM-V application with load balance optimization between GPU devices. Our implementation extends the existing MapReduce library for supporting multi-GPU-environments using the MPI library and optimizes load balance between GPU devices by employing task scheduling-based graph partitioning. We conducted our implementation on the TSUBAME2.0 supercomputer using 256 nodes (6144 hyper-threaded CPU cores, 768 GPUs). The results exhibit that our GPU-based implementation performed 87.04 ME/s on 2³⁰ (1.07 billion) vertices and 2³⁴ (17.2 billion) edges, and 1.52 times faster than the CPU-based naive implementation with 2²⁹ vertices and 2³³ edges. We also studied the performance characteristics of our implementation and load balance optimization technique.

References

[1]

"Facebook." {Online}. Available: http://www.facebook.com

[2]

J. A. Ang, B. W. Barrett, K. B. Wheeler, and R. C. Murphy, "Introducing the graph 500," May 2010.

[3]

J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," OSDI '04, Sixth Symposium on Operating System Design and Implementation, pp. 137--150, 2004.

Digital Library

[4]

U. Kang, C. E. Tsourakakis, and C. Faloutsos, "PEGASUS: A Peta-Scale Graph Mining System- Implementation and Observations," 2009.

[5]

A. Bialecki, M. Cordova, D. Cutting, and O. O'Malley, "Hadoop: a framework for running applications on large clusters built of commodity hardware," 2005. {Online}. Available: http://lucene.apache.org/hadoop

[6]

"Amazon elastic compute cloud (amazon ec2)." {Online}. Available: http://aws.amazon.com/ec2/

[7]

K. Shirahata, H. Sato, T. Suzumura, and S. Matsuoka, "A GPU Implementation of Generalized Graph Processing Algorithm GIM-V," in 2012 International Workshop on Parallel Algorithm and Parallel Software (IWPAPS12), September 2012.

Digital Library

[8]

J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani, "Kronecker graphs: An approach to modeling networks," J. Mach. Learn. Res., vol. 11, pp. 985--1042, Mar. 2010. {Online}. Available: http://dl.acm.org/citation.cfm?id=1756006.1756039

Digital Library

[9]

S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," in Seventh International World-Wide Web Conference (WWW 1998), 1998. {Online}. Available: http://ilpubs.stanford.edu:8090/361/

Digital Library

[10]

B. He, W. Fang, Q. Luo, N. K.Govindaraju, and T. Wang, "Mars: A mapreduce framework on graphics processors," Parallel Architectures and Compilation Techniques, pp. 260--269, 2008.

Digital Library

[11]

W. Fang, B. He, Q. Luo, and N. K. Govindaraju, "Mars: Accelerating MapReduce with Graphics Processors," IEEE Transactions on Parallel and Distributed Systems, vol. 22, pp. 608--620, 2011.

Digital Library

[12]

N. Govindaraju, J. Gray, R. Kumar, and D. Manocha, "GPUTeraSort: high performance graphics coprocessor sorting for large database management," SIGMOD, 2006.

Digital Library

[13]

R. L. Graham, "Bounds on multiprocessing anomalies and related packing algorithms," in Proceedings of the May 16--18, 1972, spring joint computer conference, ser. AFIPS '72 (Spring). New York, NY, USA: ACM, 1972, pp. 205--217. {Online}. Available

Digital Library

[14]

J. Bruno, E. G. Coffman, Jr., and R. Sethi, "Scheduling independent tasks to reduce mean finishing time," Commun. ACM, vol. 17, no. 7, pp. 382--387, Jul. 1974. {Online}. Available

Digital Library

[15]

C. Chekuri, S. Khanna, and A. Zhu, "Algorithms for minimizing weighted flow time," in Proceedings of the thirty-third annual ACM symposium on Theory of computing, ser. STOC '01. New York, NY, USA: ACM, 2001, pp. 84--93. {Online}. Available

Digital Library

[16]

D. A. Bader, J. Berry, S. Kahan, R. Murphy, and E. J. Riedy. The graph 500 list. Graph500.org. {Online}. Available: http://www.graph500.org/

[17]

J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani, "Kronecker graphs: An approach to modeling networks," J. Mach. Learn. Res., vol. 11, pp. 985--1042, March 2010. {Online}. Available: http://portal.acm.org/citation.cfm?id=1756006.1756039

Digital Library

[18]

S. Matsuoka, T. Endo, N. Maruyama, H. Sato, and S. Takizawa, "The total picture of tsubame2.0," Tsubame e-Science Journal, vol. 1, pp. 2 -- 4, 2010.

[19]

P. Harish and P. J. Narayanan, "Accelerating large graph algorithms on the GPU using CUDA," in Proceedings of the 14th international conference on High performance computing, ser. HiPC'07. Berlin, Heidelberg: Springer-Verlag, 2007, pp. 197--208. {Online}. Available: http://portal.acm.org/citation.cfm?id=1782174.1782200

Digital Library

[20]

J. A. Stuart and J. D. Owens, "Multi-GPU MapReduce on GPU Clusters," in Proceedings of the 25th IEEE International Parallel and Distributed Processing Symposium. IEEE, May 2011.

Digital Library

[21]

S. J. Plimpton and K. D. Devine, "MapReduce in MPI for Large-scale Graph Algorithms," Parallel Computing, vol. In Press, no. February, pp. 1--39, 2011. {Online}. Available: http://linkinghub.elsevier.com/retrieve/pii/S0167819111000172

Digital Library

[22]

B. Catanzaro, N. Sundaram, and K. Keutzer, "A map reduce framework for programming graphics processors," in In Workshop on Software Tools for MultiCore Systems, 2008.

[23]

G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, "Pregel: a system for large-scale graph processing," in Proceedings of the 28th ACM symposium on Principles ofdistributed computing, ser. PODC '09. New York, NY, USA: ACM, 2009, pp. 6--6.

Digital Library

[24]

J. Chhugani, N. Satish, C. Kim, J. Sewall, and P. Dubey, "Fast and efficient graph traversal algorithm for cpus: Maximizing single-node efficiency," in Parallel Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International, may 2012, pp. 378--389.

Digital Library

[25]

A. Buluç and K. Madduri, "Parallel breadth-first search on distributed memory systems," in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '11. New York, NY, USA: ACM, 2011, pp. 65:1--65:12. {Online}. Available

Digital Library

Cited By

Suzumura TLange KMurphy JBinder WMerseguer J(2014)Extreme big data processing in large-scale graph analytics and billion-scale social simulationProceedings of the 5th ACM/SPEC international conference on Performance engineering10.1145/2568088.2576096(1-2)Online publication date: 22-Mar-2014
https://dl.acm.org/doi/10.1145/2568088.2576096

A scalable implementation of a mapreduce-based graph processing algorithm for large-scale heterogeneous supercomputers

Recommendations

A GPU Implementation of Generalized Graph Processing Algorithm GIM-V
CLUSTERW '12: Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops

Fast processing for extremely large-scale graph, which consists of millions to trillions of vertices and 100 billions to 100 trillions of edges, is becoming increasingly important in various domains such as health care, social networks, intelligence, ...
Towards GPU-Accelerated Large-Scale Graph Processing in the Cloud
CLOUDCOM '13: Proceedings of the 2013 IEEE International Conference on Cloud Computing Technology and Science - Volume 01

Recently, we have witnessed that cloud providers start to offer heterogeneous computing environments. There have been wide interests in both clusters and cloud of adopting graphics processors (GPUs) as accelerators for various applications. On the other ...
A timeenergy performance analysis of MapReduce on heterogeneous systems with GPUs

Motivated by the explosion of Big Data analytics, performance improvements in low-power (wimpy) systems and the increasing energy efficiency of GPUs, this paper presents a timeenergy performance analysis of MapReduce on heterogeneous systems with GPUs. ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

CCGRID '13: Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing

May 2013

710 pages

ISBN:9780768549965

General Chair:
Dick Epema
Delft University of Technology and Eindhoven University of Technology, the Netherlands

Publisher

IEEE Press

Publication History

Published: 13 May 2013

Check for updates

Author Tags

Qualifiers

Research-article

Conference

CCGrid '13

CCGrid '13: 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

May 13 - 16, 2013

Delft, Netherlands

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
11
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 05 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Suzumura TLange KMurphy JBinder WMerseguer J(2014)Extreme big data processing in large-scale graph analytics and billion-scale social simulationProceedings of the 5th ACM/SPEC international conference on Performance engineering10.1145/2568088.2576096(1-2)Online publication date: 22-Mar-2014
https://dl.acm.org/doi/10.1145/2568088.2576096

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents