DOI: 10.1145/3203217.3203280
research-article

Design space exploration for PIM architectures in 3D-stacked memories

Published: 08 May 2018

Abstract

Scaling existing architectures to large-scale data-intensive applications is limited by the energy and performance losses caused by off-chip memory communication and data movement in the cache hierarchy. Processing-in-Memory (PIM) has recently been revisited to address the memory and power walls, mainly due to the maturity of 3D-stacking manufacturing technology and the increasing demand for bandwidth and parallel access in emerging data-centric applications. Recent studies have proposed a wide variety of processing mechanisms to be placed in the logic layer of 3D-stacked memories, not to mention the already available 3D-stacked DRAMs, such as Micron's Hybrid Memory Cube (HMC). Nevertheless, few studies compare PIM accelerators to each other or indicate the trade-offs between power, area, and performance. In this paper, we review different state-of-the-art 3D-stacked in-memory accelerators and analyze them under the important area and power constraints imposed by the critical embedded nature of PIM. Aiming to point in the direction of massively parallel PIM designs, we take the simplest design found in this survey and explore its architectural design space to meet the constraints imposed by HMC. Our results show that the most straightforward approach can provide the highest performance while consuming the least area and power, which makes it the most suitable design found in this survey for an energy-efficient in-memory accelerator, whether for High-Performance Computing or Embedded Systems. For instance, the best point in the design space indicates that a performance density of 320 GBps/mm² and a performance efficiency of 0.6 GBps/mW can be achieved in the best scenario, that is, when a massively parallel application reaches the peak bandwidth.
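To make the two closing figures easier to read together, the following is a minimal sketch of the arithmetic they imply, not a calculation taken from the paper. It assumes both ratios hold at the same design point, and the function name `implied_budget` and the 320 GBps input are hypothetical choices made only for illustration.

```python
# Illustrative arithmetic only. The two constants are the best-case figures
# quoted in the abstract; the bandwidth used below is a hypothetical input.
PERF_DENSITY_GBPS_PER_MM2 = 320.0   # GBps per mm^2 (from the abstract)
PERF_EFFICIENCY_GBPS_PER_MW = 0.6   # GBps per mW (from the abstract)


def implied_budget(bandwidth_gbps: float) -> dict:
    """Return the area and power implied by the abstract's best-case ratios."""
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth_gbps must be positive")
    area_mm2 = bandwidth_gbps / PERF_DENSITY_GBPS_PER_MM2
    power_mw = bandwidth_gbps / PERF_EFFICIENCY_GBPS_PER_MW
    return {
        "area_mm2": area_mm2,
        "power_mw": power_mw,
        # Ratio of the two metrics; independent of the chosen bandwidth.
        "power_density_mw_per_mm2": power_mw / area_mm2,
    }


if __name__ == "__main__":
    budget = implied_budget(320.0)  # hypothetical sustained bandwidth in GBps
    for name, value in budget.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, dividing the density by the efficiency gives an implied power density of roughly 533 mW/mm² regardless of the bandwidth chosen, which is one way to relate the area and power figures in the abstract to each other.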

    Information & Contributors

    Information

    Published In

    CF '18: Proceedings of the 15th ACM International Conference on Computing Frontiers
    May 2018
    401 pages
    ISBN:9781450357616
    DOI:10.1145/3203217
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 3D-stacked memories
    2. energy efficiency
    3. hybrid memory cube
    4. near data processing
    5. processing in memory
    6. vector processing

    Qualifiers

    • Research-article

    Conference

    CF '18
    CF '18: Computing Frontiers Conference
    May 8 - 10, 2018
    Ischia, Italy

    Acceptance Rates

    Overall Acceptance Rate 273 of 785 submissions, 35%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 68
    • Downloads (Last 6 weeks): 7
    Reflects downloads up to 15 Jan 2025

    Citations

    Cited By

    • (2023) Automatic DRAM Subsystem Configuration with irace. Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems, pp. 66-72. DOI: 10.1145/3579170.3579259. Online publication date: 17-Jan-2023
    • (2023) Exploiting Heterogeneity in PIM Architectures for Data-Intensive Applications. Designing Modern Embedded Systems: Software, Hardware, and Applications, pp. 53-64. DOI: 10.1007/978-3-031-34214-1_5. Online publication date: 11-Jun-2023
    • (2022) pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables. 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 900-919. DOI: 10.1109/MICRO56248.2022.00067. Online publication date: Oct-2022
    • (2022) Video Decoder Improvements with Near-Data Speculative Motion Compensation Processing. 2022 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 399-403. DOI: 10.1109/ISCAS48785.2022.9937683. Online publication date: 28-May-2022
    • (2021) Providing Plug N' Play for Processing-in-Memory Accelerators. Proceedings of the 26th Asia and South Pacific Design Automation Conference, pp. 651-656. DOI: 10.1145/3394885.3431527. Online publication date: 18-Jan-2021
    • (2021) DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. IEEE Access 9, pp. 134457-134502. DOI: 10.1109/ACCESS.2021.3110993. Online publication date: 2021
    • (2021) Enabling Near-Data Accelerators Adoption by Through Investigation of Datapath Solutions. International Journal of Parallel Programming. DOI: 10.1007/s10766-020-00674-y. Online publication date: 28-Jan-2021
    • (2020) A Classification of Memory-Centric Computing. ACM Journal on Emerging Technologies in Computing Systems 16:2, pp. 1-26. DOI: 10.1145/3365837. Online publication date: 30-Jan-2020
    • (2019) A Compiler for Automatic Selection of Suitable Processing-in-Memory Instructions. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 564-569. DOI: 10.23919/DATE.2019.8714956. Online publication date: Mar-2019
    • (2019) Near-memory computing. Microprocessors & Microsystems 71:C. DOI: 10.1016/j.micpro.2019.102868. Online publication date: 1-Nov-2019
