Influence of Stacked 3D Memory/Cache Architectures on GPUs

Ahmed Al Maashri, Guangyu Sun, Xiangyu Dong, Yuan Xie & Narayanan Vijaykrishnan

Part of the book series: Integrated Circuits and Systems ((ICIR))

Abstract

This chapter investigates the architectural design of a 3D die-stacked Graphics Processing Unit (GPU). It discusses the design space of such a system and presents empirical results that quantify its expected performance gain. The chapter also discusses the cost, power, and thermal aspects of the proposed designs.

Acknowledgment

The work presented in this chapter was supported in part by NSF grants 0903432 and 0702617.

Author information

Authors and Affiliations

The Pennsylvania State University, University Park, PA, USA
Ahmed Al Maashri

Corresponding author

Correspondence to Ahmed Al Maashri.

Editor information

Editors and Affiliations

TIMA Laboratory, 46, Avenue Felix Viallet, Grenoble, 38000, France
Abbas Sheibanyrad
TIMA Laboratory, Avenue Felix Viallet 46, Grenoble, 38000, France
Frédéric Pétrot
Royal Institute of Technology, Forum 120, Kista, SE-16440, Sweden
Axel Jantsch

About this chapter

Cite this chapter

Al Maashri, A., Sun, G., Dong, X., Xie, Y., Vijaykrishnan, N. (2011). Influence of Stacked 3D Memory/Cache Architectures on GPUs. In: Sheibanyrad, A., Pétrot, F., Jantsch, A. (eds) 3D Integration for NoC-based SoC Architectures. Integrated Circuits and Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7618-5_11

DOI: https://doi.org/10.1007/978-1-4419-7618-5_11
Published: 06 November 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7617-8
Online ISBN: 978-1-4419-7618-5
eBook Packages: Engineering, Engineering (R0)
