DOI: 10.1145/3352460.3358267
Research Article | Public Access

Quantifying Memory Underutilization in HPC Systems and Using it to Improve Performance via Architecture Support

Published: 12 October 2019

Abstract

A system's memory size is often dictated by the worst-case workloads with the highest memory requirements; this causes memory to be underutilized in the common case, when the system is not running those worst-case workloads. Cognizant of this memory underutilization problem, many prior works have studied memory utilization and explored how to improve it in the context of cloud systems.
In this paper, we perform the first large-scale study of system-level memory utilization in the context of HPC systems; through seven million machine-hours of measurements across four HPC systems, we find that memory underutilization in HPC systems is much more severe than in the cloud. Subsequently, we also perform the first exploration of architectural techniques to improve memory utilization specifically for HPC systems. We propose exposing each compute node's currently unused memory to its CPU(s) via novel architectural support for the OS. This can enable many new microarchitecture techniques that use the abundant free memory to boost performance transparently, without requiring any user code modification or recompilation; we refer to them as Free-memory-aware Microarchitecture Techniques (FMTs). We then present a detailed example of an FMT, Free-memory-aware Memory Replication (FMR). On average across five HPC benchmark suites, FMR provides a 13% performance improvement and an 8% system-level energy improvement over a highly optimized baseline representative of modern memory systems. To validate the performance benefits reported in simulation, we also emulated FMR on a real system and found close agreement between the simulation and emulation results. The paper ends by discussing other possible FMTs and their applicability to other types of systems.
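
The replication idea behind FMR can be made concrete with a small sketch. The C toy model below is a hypothetical illustration, not the paper's FMR design: the bank count, the hotness threshold, and the bank-selection policy are all assumed for illustration. It shows the core idea of putting free memory to work: when spare DRAM is available, a frequently read line gets a second copy mapped to a different bank, and each read is served from whichever copy's bank is currently idle, so contention on the primary copy no longer stalls the read.

/*
 * Hypothetical sketch (not the authors' implementation) of free-memory-aware
 * replication: replicate hot lines into spare DRAM mapped to a different
 * bank and steer each read to an idle copy. All parameters are assumptions.
 */
#include <stdbool.h>
#include <stdio.h>

#define NUM_BANKS      8     /* assumed banks per channel            */
#define HOT_THRESHOLD  4     /* reads before a line is replicated    */

typedef struct {
    unsigned long addr;      /* line address                          */
    int primary_bank;        /* bank holding the original copy        */
    int replica_bank;        /* bank holding the replica, -1 if none  */
    int read_count;          /* reads observed so far                 */
} line_t;

static bool bank_busy[NUM_BANKS]; /* toy model of per-bank contention */

/* Replicate a hot line into free memory mapped to a different bank. */
static void maybe_replicate(line_t *ln, bool free_memory_available)
{
    if (ln->replica_bank >= 0 || !free_memory_available)
        return;
    if (ln->read_count >= HOT_THRESHOLD)
        ln->replica_bank = (ln->primary_bank + NUM_BANKS / 2) % NUM_BANKS;
}

/* Serve a read from whichever copy sits in an idle bank. */
static int serve_read(line_t *ln, bool free_memory_available)
{
    ln->read_count++;
    maybe_replicate(ln, free_memory_available);

    if (!bank_busy[ln->primary_bank])
        return ln->primary_bank;
    if (ln->replica_bank >= 0 && !bank_busy[ln->replica_bank])
        return ln->replica_bank;
    return ln->primary_bank;           /* both busy: wait on primary  */
}

int main(void)
{
    line_t ln = { .addr = 0x1000, .primary_bank = 2, .replica_bank = -1 };
    bank_busy[2] = true;               /* primary bank is contended   */

    for (int i = 0; i < 6; i++)
        printf("read %d served by bank %d\n", i, serve_read(&ln, true));
    return 0;
}

In the paper's setting, such decisions would be made in hardware using the free-memory information exposed to the CPU by the proposed OS support; the sketch only captures the steering logic in software.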


Published In

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
October 2019
1104 pages
ISBN:9781450369381
DOI:10.1145/3352460
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DRAM
  2. HPC Systems
  3. Memory Architecture
  4. Memory Management
  5. Operating Systems
  6. Supercomputing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MICRO '52

Acceptance Rates

Overall acceptance rate: 484 of 2,242 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 653
  • Downloads (last 6 weeks): 57
Reflects downloads up to 11 Dec 2024

Cited By

  • Disaggregated Memory with SmartNIC Offloading: a Case Study on Graph Processing. 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 159-169, Nov 2024. https://doi.org/10.1109/SBAC-PAD63648.2024.00022
  • Distributed Page Table: Harnessing Physical Memory as an Unbounded Hashed Page Table. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 36-49, Nov 2024. https://doi.org/10.1109/MICRO61859.2024.00013
  • DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 1129-1143, Jun 2024. https://doi.org/10.1109/ISCA59077.2024.00085
  • Software Resource Disaggregation for HPC with Serverless Computing. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 139-156, May 2024. https://doi.org/10.1109/IPDPS57955.2024.00021
  • Agile-DRAM: Agile Trade-Offs in Memory Capacity, Latency, and Energy for Data Centers. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1141-1153, Mar 2024. https://doi.org/10.1109/HPCA57654.2024.00089
  • Towards Improving Resource Allocation for Multi-Tenant HPC Systems: An Exploratory HPC Cluster Utilization Case Study. 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), pp. 66-75, Sep 2024. https://doi.org/10.1109/CLUSTERWorkshops61563.2024.00019
  • Job Scheduling in High Performance Computing Systems with Disaggregated Memory Resources. 2024 IEEE International Conference on Cluster Computing (CLUSTER), pp. 297-309, Sep 2024. https://doi.org/10.1109/CLUSTER59578.2024.00033
  • A Scalable Distributed Computation Framework for Tackling Underutilization and Ad-Hoc Computations in Heterogenous Clusters. Deep Sciences for Computing and Communications, pp. 73-80, Sep 2024. https://doi.org/10.1007/978-3-031-68908-6_6
  • Dynamic Memory Provisioning on Disaggregated HPC Systems. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 973-982, Nov 2023. https://doi.org/10.1145/3624062.3624174
  • Using Local Cache Coherence for Disaggregated Memory Systems. ACM SIGOPS Operating Systems Review 57(1), pp. 21-28, Jun 2023. https://doi.org/10.1145/3606557.3606561
