DOI: 10.1145/3352460.3358267
Research Article | Public Access

Quantifying Memory Underutilization in HPC Systems and Using it to Improve Performance via Architecture Support

Published: 12 October 2019

Abstract

A system's memory size is often dictated by the worst-case workloads with the highest memory requirements; this causes memory to be underutilized in the common case, when the system is not running those worst-case workloads. Cognizant of this memory underutilization problem, many prior works have studied memory utilization and explored how to improve it in the context of cloud systems.
In this paper, we perform the first large-scale study of system-level memory utilization in the context of HPC systems; through seven million machine-hours of measurements across four HPC systems, we find that memory underutilization in HPC systems is much more severe than in the cloud. Subsequently, we also perform the first exploration of architectural techniques to improve memory utilization specifically for HPC systems. We propose exposing each compute node's currently unused memory to its CPU(s) via novel architectural support for the OS. This can enable many new microarchitecture techniques that use the abundant free memory to boost performance transparently, without requiring any user code modification or recompilation; we refer to them as Free-memory-aware Microarchitecture Techniques (FMTs). We then present a detailed example of an FMT, Free-memory-aware Memory Replication (FMR). On average across five HPC benchmark suites, FMR provides a 13% performance improvement and an 8% system-level energy improvement over a highly optimized baseline representative of modern memory systems. To validate the performance benefits reported in simulation, we also emulated FMR on a real system and found close agreement between the simulation and emulation results. The paper ends by discussing other possible FMTs and their applicability to other types of systems.
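
The replication idea behind FMR can be made concrete with a small sketch. The C toy model below is a hypothetical illustration, not the paper's FMR design: the bank count, the hotness threshold, and the bank-selection policy are all assumed for illustration. It shows the core idea of putting free memory to work: when spare DRAM is available, a frequently read line gets a second copy mapped to a different bank, and each read is served from whichever copy's bank is currently idle, so contention on the primary copy no longer stalls the read.

/*
 * Hypothetical sketch (not the authors' implementation) of free-memory-aware
 * replication: replicate hot lines into spare DRAM mapped to a different
 * bank and steer each read to an idle copy. All parameters are assumptions.
 */
#include <stdbool.h>
#include <stdio.h>

#define NUM_BANKS      8     /* assumed banks per channel            */
#define HOT_THRESHOLD  4     /* reads before a line is replicated    */

typedef struct {
    unsigned long addr;      /* line address                          */
    int primary_bank;        /* bank holding the original copy        */
    int replica_bank;        /* bank holding the replica, -1 if none  */
    int read_count;          /* reads observed so far                 */
} line_t;

static bool bank_busy[NUM_BANKS]; /* toy model of per-bank contention */

/* Replicate a hot line into free memory mapped to a different bank. */
static void maybe_replicate(line_t *ln, bool free_memory_available)
{
    if (ln->replica_bank >= 0 || !free_memory_available)
        return;
    if (ln->read_count >= HOT_THRESHOLD)
        ln->replica_bank = (ln->primary_bank + NUM_BANKS / 2) % NUM_BANKS;
}

/* Serve a read from whichever copy sits in an idle bank. */
static int serve_read(line_t *ln, bool free_memory_available)
{
    ln->read_count++;
    maybe_replicate(ln, free_memory_available);

    if (!bank_busy[ln->primary_bank])
        return ln->primary_bank;
    if (ln->replica_bank >= 0 && !bank_busy[ln->replica_bank])
        return ln->replica_bank;
    return ln->primary_bank;           /* both busy: wait on primary  */
}

int main(void)
{
    line_t ln = { .addr = 0x1000, .primary_bank = 2, .replica_bank = -1 };
    bank_busy[2] = true;               /* primary bank is contended   */

    for (int i = 0; i < 6; i++)
        printf("read %d served by bank %d\n", i, serve_read(&ln, true));
    return 0;
}

In the paper's setting, such decisions would be made in hardware using the free-memory information exposed to the CPU by the proposed OS support; the sketch only captures the steering logic in software.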


Published In

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
October 2019
1104 pages
ISBN:9781450369381
DOI:10.1145/3352460
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DRAM
  2. HPC Systems
  3. Memory Architecture
  4. Memory Management
  5. Operating Systems
  6. Supercomputing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MICRO '52

Acceptance Rates

Overall acceptance rate: 484 of 2,242 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 653
  • Downloads (last 6 weeks): 57
Reflects downloads up to 11 Dec 2024

Cited By

  • Disaggregated Memory with SmartNIC Offloading: a Case Study on Graph Processing. 2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 159-169, Nov 2024. https://doi.org/10.1109/SBAC-PAD63648.2024.00022
  • Distributed Page Table: Harnessing Physical Memory as an Unbounded Hashed Page Table. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 36-49, Nov 2024. https://doi.org/10.1109/MICRO61859.2024.00013
  • DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 1129-1143, Jun 2024. https://doi.org/10.1109/ISCA59077.2024.00085
  • Software Resource Disaggregation for HPC with Serverless Computing. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 139-156, May 2024. https://doi.org/10.1109/IPDPS57955.2024.00021
  • Agile-DRAM: Agile Trade-Offs in Memory Capacity, Latency, and Energy for Data Centers. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1141-1153, Mar 2024. https://doi.org/10.1109/HPCA57654.2024.00089
  • Towards Improving Resource Allocation for Multi-Tenant HPC Systems: An Exploratory HPC Cluster Utilization Case Study. 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), pp. 66-75, Sep 2024. https://doi.org/10.1109/CLUSTERWorkshops61563.2024.00019
  • Job Scheduling in High Performance Computing Systems with Disaggregated Memory Resources. 2024 IEEE International Conference on Cluster Computing (CLUSTER), pp. 297-309, Sep 2024. https://doi.org/10.1109/CLUSTER59578.2024.00033
  • A Scalable Distributed Computation Framework for Tackling Underutilization and Ad-Hoc Computations in Heterogenous Clusters. Deep Sciences for Computing and Communications, pp. 73-80, Sep 2024. https://doi.org/10.1007/978-3-031-68908-6_6
  • Dynamic Memory Provisioning on Disaggregated HPC Systems. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 973-982, Nov 2023. https://doi.org/10.1145/3624062.3624174
  • Using Local Cache Coherence for Disaggregated Memory Systems. ACM SIGOPS Operating Systems Review 57(1), pp. 21-28, Jun 2023. https://doi.org/10.1145/3606557.3606561
