Article

GPU Cluster for High Performance Computing

Authors:

Suzanne Yoakum-Stover

SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing

Page 47

https://doi.org/10.1109/SC.2004.26

Published: 06 November 2004

Abstract

Inspired by the attractive Flops/dollar ratio and the incredible growth in the speed of modern graphics processing units (GPUs), we propose to use a cluster of GPUs for high performance scientific computing. As an example application, we have developed a parallel flow simulation using the lattice Boltzmann model (LBM) on a GPU cluster and have simulated the dispersion of airborne contaminants in the Times Square area of New York City. Using 30 GPU nodes, our simulation can compute a 480x400x80 LBM lattice in 0.31 seconds per step, which is 4.6 times faster than our CPU cluster implementation. Besides the LBM, we also discuss other potential applications of the GPU cluster, such as cellular automata, PDE solvers, and FEM.
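To put the quoted figures in perspective, the short Python sketch below derives a few quantities from the numbers in the abstract: the lattice size in cells, the aggregate update rate, the average rate per GPU node, and the CPU cluster step time implied by the 4.6x speedup. The function and variable names are illustrative and do not come from the paper, and the derived values assume that the speedup refers to the same 480x400x80 lattice.

```python
# Figures quoted in the abstract.
NX, NY, NZ = 480, 400, 80        # lattice dimensions
GPU_NODES = 30                   # number of GPU nodes used
GPU_SECONDS_PER_STEP = 0.31      # wall-clock time per LBM step on the GPU cluster
SPEEDUP_OVER_CPU = 4.6           # GPU cluster speed relative to the CPU cluster


def lattice_cells(nx: int, ny: int, nz: int) -> int:
    """Number of lattice cells in an nx x ny x nz grid."""
    return nx * ny * nz


def update_rate(cells: int, seconds_per_step: float) -> float:
    """Lattice cell updates per second for one full step."""
    return cells / seconds_per_step


cells = lattice_cells(NX, NY, NZ)
total_rate = update_rate(cells, GPU_SECONDS_PER_STEP)

# Average over nodes; assumes the work is spread evenly (not stated in the abstract).
per_node_rate = total_rate / GPU_NODES

# Implied CPU cluster step time, assuming the speedup is for the same lattice.
cpu_seconds_per_step = GPU_SECONDS_PER_STEP * SPEEDUP_OVER_CPU

print(f"cells per step:          {cells:,}")
print(f"aggregate updates/s:     {total_rate:,.0f}")
print(f"updates/s per GPU node:  {per_node_rate:,.0f}")
print(f"implied CPU s/step:      {cpu_seconds_per_step:.3f}")
```

Under these assumptions the lattice holds about 15.4 million cells, so the cluster sustains roughly 50 million cell updates per second, and the CPU cluster would need about 1.4 seconds per step.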

Information & Contributors

Information

Published In

SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing

November 2004

724 pages

ISBN: 0769521533

General Chair:
Jeff Huskamp

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

SC '04

Sponsor:

SIGARCH
IEEE-CS

SC '04: International Conference for High Performance Computing, Networking, Storage and Analysis

November 6 - 12, 2004

Acceptance Rates

SC '04 Paper Acceptance Rate 60 of 200 submissions, 30%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 59
Total Downloads: 97
Downloads (Last 12 months): 24
Downloads (Last 6 weeks): 2

Reflects downloads up to 08 Mar 2025

Citations

Cited By

Eiling N, Lankes S, Monti A (2023). Checkpoint/Restart for CUDA Kernels. Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pages 1729-1737. doi:10.1145/3624062.3624254. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624254
Song L, Chen F, Li H, Chen Y, Mohror K, Arnold D, Badia R (2023). ReFloat: Low-Cost Floating-Point Processing in ReRAM for Accelerating Iterative Linear Solvers. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-15. doi:10.1145/3581784.3607077. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3581784.3607077
Welton B, Miller B, El-Araby E, El-Ghazawi T, Panda D (2018). Exposing hidden performance opportunities in high performance GPU applications. Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pages 301-310. doi:10.1109/CCGRID.2018.00045. Online publication date: 1-May-2018.
https://dl.acm.org/doi/10.1109/CCGRID.2018.00045
Navarro-Hinojosa O, Ruiz-Loza S, Alencastre-Miranda M (2018). Physically based visual simulation of the Lattice Boltzmann method on the GPU. The Journal of Supercomputing, 74:7, pages 3441-3467. doi:10.1007/s11227-018-2392-8. Online publication date: 1-Jul-2018.
https://dl.acm.org/doi/10.1007/s11227-018-2392-8
Yu L, Yao H, Liao X (2016). A novel GPU resources management and scheduling system based on virtual machines. International Journal of High Performance Computing and Networking, 9:5-6, pages 423-430. doi:10.1504/ijhpcn.2016.080415. Online publication date: 1-Jan-2016.
https://dl.acm.org/doi/10.1504/ijhpcn.2016.080415
Kemal J, Davis R, Owens J (2016). Multidisciplinary simulation acceleration using multiple shared memory graphical processing units. International Journal of High Performance Computing Applications, 30:4, pages 486-508. doi:10.1177/1094342016639114. Online publication date: 1-Nov-2016.
https://dl.acm.org/doi/10.1177/1094342016639114
Liu B, Qiu W, Jiang L, Gong Z (2016). Software pipelining for graphic processing unit acceleration. International Journal of High Performance Computing Applications, 30:2, pages 169-185. doi:10.1177/1094342015585845. Online publication date: 1-May-2016.
https://dl.acm.org/doi/10.1177/1094342015585845
de Carvalho Junior F, Rezende C, de Carvalho Silva J, Guimarães Al-Alam W, Uchoa de Alencar J (2016). Contextual abstraction in a type system for component-based high performance computing platforms. Science of Computer Programming, 132:P1, pages 96-128. doi:10.1016/j.scico.2016.07.005. Online publication date: 15-Dec-2016.
https://dl.acm.org/doi/10.1016/j.scico.2016.07.005
Gray A, Hart A, Henrich O, Stratford K (2015). Scaling soft matter physics to thousands of graphics processing units in parallel. International Journal of High Performance Computing Applications, 29:3, pages 274-283. doi:10.1177/1094342015576848. Online publication date: 1-Aug-2015.
https://dl.acm.org/doi/10.1177/1094342015576848
Kim B, Jung H, Nadimi E, Cerny T, Kim S, Wang W (2015). A case study of data transfer efficiency optimization for GPU- and infiniband-based clusters. Proceedings of the 2015 Conference on research in adaptive and convergent systems, pages 247-250. doi:10.1145/2811411.2811468. Online publication date: 9-Oct-2015.
https://dl.acm.org/doi/10.1145/2811411.2811468
