Originally developed to support video games, graphics processor units (GPUs) are now increasingly used for general-purpose (non-graphics) applications ranging from machine learning to mining of cryptographic currencies. GPUs can achieve improved performance and efficiency versus central processing units (CPUs) by dedicating a larger fraction of hardware resources to computation. In addition, their general-purpose programmability makes contemporary GPUs appealing to software developers in comparison to domain-specific accelerators. This book provides an introduction for those interested in studying the architecture of GPUs that support general-purpose computing. It collects information currently found only across a wide range of disparate sources. The authors led development of the GPGPU-Sim simulator, which is widely used in academic research on GPU architectures. The first chapter of this book describes the basic hardware structure of GPUs and provides a brief overview of their history. Chapter 2 provides a summary of GPU programming models relevant to the rest of the book. Chapter 3 explores the architecture of GPU compute cores. Chapter 4 explores the architecture of the GPU memory system. After describing the architecture of existing systems, Chapters 3 and 4 provide an overview of related research. Chapter 5 summarizes cross-cutting research impacting both the compute core and memory system. This book should provide a valuable resource for those wishing to understand the architecture of GPUs used for acceleration of general-purpose applications and for those who want an introduction to the rapidly growing body of research exploring how to improve the architecture of these GPUs.