Abstract
This chapter investigates the architectural design of a 3D die-stacked Graphics Processing Unit (GPU). It discusses the design space of such a system and presents empirical results that quantify its expected performance gain. The chapter also discusses the cost, power, and thermal aspects of the proposed designs.
Acknowledgment
The work presented in this chapter was supported in part by NSF grants 0903432 and 0702617.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Al Maashri, A., Sun, G., Dong, X., Xie, Y., Vijaykrishnan, N. (2011). Influence of Stacked 3D Memory/Cache Architectures on GPUs. In: Sheibanyrad, A., Pétrot, F., Jantsch, A. (eds) 3D Integration for NoC-based SoC Architectures. Integrated Circuits and Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7618-5_11
DOI: https://doi.org/10.1007/978-1-4419-7618-5_11
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7617-8
Online ISBN: 978-1-4419-7618-5
eBook Packages: Engineering, Engineering (R0)