research-article
Open access

CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories

Published: 28 June 2017

Abstract

Historically, server designers have opted for simple memory systems by picking one of a few commoditized DDR memory products. We are already witnessing a major upheaval in the off-chip memory hierarchy, with the introduction of many new memory products—buffer-on-board, LRDIMM, HMC, HBM, and NVMs, to name a few. Given the plethora of choices, it is expected that different vendors will adopt different strategies for their high-capacity memory systems, often deviating from DDR standards and/or integrating new functionality within memory systems. These strategies will likely differ in their choice of interconnect and topology, with a significant fraction of memory energy being dissipated in I/O and data movement. To make the case for memory interconnect specialization, this paper makes three contributions.
First, we design a tool that carefully models I/O power in the memory system, explores the design space, and gives the user the ability to define new types of memory interconnects/topologies. The tool is validated against SPICE models, and is integrated into version 7 of the popular CACTI package. Our analysis with the tool shows that several design parameters have a significant impact on I/O power.
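To give a feel for why design parameters such as supply voltage, load capacitance, and signaling frequency matter for I/O power, here is a minimal sketch of the textbook dynamic switching-power relation P = α·C·V²·f. It is not the model used by the tool: the function name, the parameter values, and the 64-line bus are all assumptions chosen for illustration, and the sketch ignores termination and other static components that a full I/O model would also need to account for.

```python
def switching_power(activity: float, cap_farads: float, vdd: float,
                    freq_hz: float, lines: int = 1) -> float:
    """Dynamic switching power in watts: lines * alpha * C * Vdd^2 * f.

    Textbook first-order estimate only; termination and other static
    power components are deliberately left out of this sketch.
    """
    return lines * activity * cap_farads * vdd ** 2 * freq_hz


# Hypothetical operating points (illustrative values, not from the paper):
# 64 data lines, 50% activity, 2 pF load per line, 800 MHz toggle rate.
base = switching_power(0.5, 2e-12, 1.5, 800e6, lines=64)
low_v = switching_power(0.5, 2e-12, 1.2, 800e6, lines=64)

print(f"1.5 V: {base:.4f} W, 1.2 V: {low_v:.4f} W, ratio: {low_v / base:.2f}")
```

Because power scales with the square of the voltage in this relation, dropping from 1.5 V to 1.2 V alone cuts the estimate to 64% of the baseline, which is the kind of sensitivity a design-space exploration would surface.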
We then use the tool to help craft novel specialized memory system channels. We introduce a new relay-on-board chip that partitions a DDR channel into multiple cascaded channels. We show that this simple change to the channel topology can improve performance by 22% and lower cost by up to 65% for DDR DRAM. This new architecture does not require any changes to DIMMs, and it efficiently supports hybrid DRAM/NVM systems.
Finally, as an example of a more disruptive architecture, we design a custom DIMM and parallel bus that moves away from the DDR3/DDR4 standards. To reduce energy and improve performance, the baseline data channel is split into three narrow parallel channels and the on-DIMM interconnects are operated at a lower frequency. In addition, this allows us to design a two-tier error protection strategy that reduces data transfers on the interconnect. This architecture yields a performance improvement of 18% and a memory power reduction of 23%.
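The trade-off behind narrow channels can be illustrated with simple transfer arithmetic. The sketch below counts how many bus beats one cache line needs on a wide channel versus one of three narrow ones. The widths are assumptions, not figures from the paper: a 72-bit baseline (64 data bits plus 8 ECC bits, as on a standard ECC DIMM), an even three-way split into 24-bit channels, and a 64-byte line carried with 8 bytes of ECC.

```python
def beats_per_line(line_bits: int, channel_width_bits: int) -> int:
    """Number of bus beats needed to move one line (ceiling division)."""
    return -(-line_bits // channel_width_bits)


# Assumed sizes (hypothetical, for illustration only):
LINE_BITS = (64 + 8) * 8   # 64-byte cache line plus 8 bytes of ECC = 576 bits
WIDE_WIDTH = 72            # assumed baseline channel width in bits
NARROW_WIDTH = 24          # assumed width of each of the three narrow channels

wide = beats_per_line(LINE_BITS, WIDE_WIDTH)
narrow = beats_per_line(LINE_BITS, NARROW_WIDTH)

print(f"wide channel: {wide} beats, narrow channel: {narrow} beats per line")
```

Under these assumptions a single line occupies a narrow channel three times longer, but three lines can be in flight at once, so aggregate bandwidth at a given signaling rate is unchanged while more requests proceed in parallel. Any scheme that cuts the number of bits sent per access shortens the narrow-channel transfer directly.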
The cascaded channel and narrow channel architectures serve as case studies for the new tool and show the potential for benefit from re-organizing basic memory interconnects.

Supplementary Material

TACO1402-14 (taco1402-14.pdf)
Slide deck associated with this paper


    Information

    Published In

    ACM Transactions on Architecture and Code Optimization, Volume 14, Issue 2
    June 2017
    259 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/3086564
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 June 2017
    Accepted: 01 March 2017
    Revised: 01 February 2017
    Received: 01 October 2016
    Published in TACO Volume 14, Issue 2

    Author Tags

    1. DRAM
    2. Memory
    3. NVM
    4. interconnects
    5. tools

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • NSF
    • Center for Design-Enabled Nanofabrication (C-DEN)
