DOI: 10.1145/3410463.3414643
research-article

RackMem: A Tailored Caching Layer for Rack Scale Computing

Published: 30 September 2020

Abstract

High-performance computing (HPC) clusters suffer from low overall memory utilization, caused by node-centric memory allocation combined with the variable memory requirements of HPC workloads. The recent provisioning of nodes with terabytes of memory to accommodate workloads with extreme peak memory requirements further exacerbates the problem. Memory disaggregation is viewed as a promising remedy that increases overall resource utilization and enables cost-effective up-scaling and efficient operation of HPC clusters; however, the overhead of demand paging in virtual memory management has so far hindered performant implementations. To overcome these limitations, this work presents RackMem, an efficient implementation of disaggregated memory for rack-scale computing. RackMem addresses the shortcomings of Linux's demand paging algorithm and automatically adapts to the memory access patterns of individual processes to minimize the inherent overhead of remote memory accesses. Evaluated on a cluster with an InfiniBand interconnect, RackMem outperforms the state-of-the-art RDMA implementation and Linux's virtual memory paging by a significant margin. RackMem's custom demand paging implementation achieves a tail latency that is two orders of magnitude lower than that of the Linux kernel. Compared to the state-of-the-art remote paging solution, RackMem achieves 28% higher throughput and 44% lower tail latency on a wide variety of real-world workloads.
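
To make the caching approach sketched in the abstract concrete, the short user-space C program below models the basic fast-path/slow-path split of software demand paging against remote memory: a hit in a small local page cache costs nothing, while a miss writes a victim page back to the remote pool and fetches the requested page into the freed frame. This is an illustrative sketch only; rmem_fetch, rmem_writeback, access_page, and the round-robin victim selection are hypothetical stand-ins for one-sided RDMA reads/writes and for RackMem's adaptive eviction policy, and they do not reproduce the paper's in-kernel implementation.

/*
 * Illustrative sketch only (not RackMem's kernel implementation):
 * a user-space model of a local DRAM page cache backed by a larger
 * "remote" memory pool.  rmem_fetch/rmem_writeback stand in for
 * one-sided RDMA reads/writes; round-robin victim selection stands
 * in for an adaptive eviction policy.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE    4096
#define LOCAL_FRAMES 4            /* small local cache, forces evictions */
#define REMOTE_PAGES 16           /* size of the simulated remote pool   */

static uint8_t remote_pool[REMOTE_PAGES][PAGE_SIZE]; /* "remote" memory  */
static uint8_t local_cache[LOCAL_FRAMES][PAGE_SIZE]; /* local DRAM cache */
static int     frame_owner[LOCAL_FRAMES];  /* remote page held by each frame; -1 = free */
static int     next_victim;                /* round-robin eviction cursor */

/* Stand-ins for one-sided RDMA READ / WRITE of a single page. */
static void rmem_fetch(int page, uint8_t *dst)
{
    memcpy(dst, remote_pool[page], PAGE_SIZE);
}

static void rmem_writeback(int page, const uint8_t *src)
{
    memcpy(remote_pool[page], src, PAGE_SIZE);
}

/* "Page fault" path: return a local frame holding the requested page,
 * writing back a resident victim page first if the cache is full.     */
static uint8_t *access_page(int page)
{
    for (int f = 0; f < LOCAL_FRAMES; f++)
        if (frame_owner[f] == page)
            return local_cache[f];            /* hit: no remote traffic */

    int victim = next_victim;                 /* miss: pick a victim    */
    next_victim = (next_victim + 1) % LOCAL_FRAMES;
    if (frame_owner[victim] >= 0)
        rmem_writeback(frame_owner[victim], local_cache[victim]);
    rmem_fetch(page, local_cache[victim]);
    frame_owner[victim] = page;
    return local_cache[victim];
}

int main(void)
{
    memset(frame_owner, -1, sizeof(frame_owner));

    /* Touch more pages than fit locally, forcing write-backs ... */
    for (int i = 0; i < REMOTE_PAGES; i++)
        access_page(i)[0] = (uint8_t)i;

    /* ... then verify that evicted data survived the round trip. */
    for (int i = 0; i < REMOTE_PAGES; i++)
        printf("page %2d first byte = %d\n", i, access_page(i)[0]);
    return 0;
}

Compiled with any C99 compiler (e.g., cc -o rackmem_sketch rackmem_sketch.c), the program touches more pages than fit locally and checks that evicted data survives the round trip through the simulated remote pool; the real system replaces the memcpy calls with RDMA transfers and the fixed policy with per-process adaptation.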




Published In

PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques
September 2020
505 pages
ISBN:9781450380751
DOI:10.1145/3410463
  • General Chair: Vivek Sarkar
  • Program Chair: Hyesoon Kim
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 September 2020


Author Tags

  1. high-performance computing
  2. remote memory
  3. resource disaggregation
  4. virtualization

Qualifiers

  • Research-article

Funding Sources

  • National Research Foundation of Korea (NRF)
  • BK21 Plus for Pioneers in Innovative Computing (Dept. of Computer Science and Engineering SNU)

Conference

PACT '20

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Article Metrics

  • Downloads (Last 12 months): 83
  • Downloads (Last 6 weeks): 14
Reflects downloads up to 11 Dec 2024


Cited By

  • (2023) Dynamic Memory Provisioning on Disaggregated HPC Systems. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 973-982. https://doi.org/10.1145/3624062.3624174. Online publication date: 12-Nov-2023
  • (2023) CXL over Ethernet: A Novel FPGA-based Memory Disaggregation Design in Data Centers. 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 75-82. https://doi.org/10.1109/FCCM57271.2023.00017. Online publication date: May-2023
  • (2023) DEHype: Retrofitting Hypervisors for a Resource-Disaggregated Environment. 2023 IEEE International Conference on Cluster Computing (CLUSTER), 37-48. https://doi.org/10.1109/CLUSTER52292.2023.00011. Online publication date: 31-Oct-2023
  • (2021) RapidSwap: a Hierarchical Far Memory. Economics of Grids, Clouds, Systems, and Services, 143-151. https://doi.org/10.1007/978-3-030-92916-9_12. Online publication date: 9-Dec-2021
