research-article

Contention-aware application performance prediction for disaggregated memory systems

Authors:

Felippe Vieira Zacarias,

Rajiv Nishtala,

Paul Carpenter

CF '20: Proceedings of the 17th ACM International Conference on Computing Frontiers

Pages 49 - 59

https://doi.org/10.1145/3387902.3392625

Published: 23 May 2020

Abstract

Disaggregated memory has recently been proposed as a way to allow flexible and fine-grained allocation of memory capacity to compute jobs. This paper makes an important step towards effective resource allocation on disaggregated memory systems. Specifically, we propose a generic approach to predict the performance degradation due to sharing of disaggregated memory. In contrast to prior work, cache capacity is not shared among multiple applications, which removes a major contributor to application performance. For this reason, our analysis is driven by the demand for memory bandwidth, which has been shown to have an important effect on application performance. We show that profiling the application slowdown often involves significant experimental error and noise, and we therefore improve the accuracy by linear smoothing of the sensitivity curves. We also show that contention is sensitive to the ratio between read and write memory accesses, and we address this sensitivity by building a family of sensitivity curves, one for each read/write ratio.

Our results show that the methodology predicts the slowdown in application performance subject to memory contention with an average error of 1.19% and a maximum error of 14.6%. Compared with the state of the art, the relative improvements are almost 24% on average and 33% in the worst case.
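
The two ideas named in the abstract, smoothing noisy sensitivity curves and keeping one curve per read/write ratio, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "linear smoothing" can be approximated by an ordinary least-squares line through the measured slowdowns, that the read/write ratio is expressed as a read fraction between 0 and 1, and that the closest profiled ratio is used at prediction time. The function names (`fit_line`, `build_curve_family`, `predict_slowdown`) and all data values are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def build_curve_family(profiles):
    """Fit one smoothed sensitivity curve per read fraction.

    profiles maps a read fraction (assumed encoding of the read/write
    ratio) to a list of (contending_bandwidth, measured_slowdown) samples.
    """
    family = {}
    for read_fraction, samples in profiles.items():
        xs = [bandwidth for bandwidth, _ in samples]
        ys = [slowdown for _, slowdown in samples]
        family[read_fraction] = fit_line(xs, ys)
    return family


def predict_slowdown(family, contending_bandwidth, read_fraction):
    """Evaluate the curve whose read fraction is closest to the request."""
    nearest = min(family, key=lambda r: abs(r - read_fraction))
    slope, intercept = family[nearest]
    return slope * contending_bandwidth + intercept


# Hypothetical, noisy profiling samples (illustrative values only).
profiles = {
    1.0: [(0.0, 1.00), (10.0, 1.12), (20.0, 1.18), (30.0, 1.33)],
    0.5: [(0.0, 1.02), (10.0, 1.19), (20.0, 1.43), (30.0, 1.60)],
}
family = build_curve_family(profiles)
print(predict_slowdown(family, 15.0, 0.6))
```

Picking the nearest profiled ratio keeps the sketch short; interpolating between neighbouring curves would be an equally plausible choice, and the abstract does not say which one the paper uses.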

Index Terms

Contention-aware application performance prediction for disaggregated memory systems
1. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies

Information & Contributors

Information

Published In

CF '20: Proceedings of the 17th ACM International Conference on Computing Frontiers

May 2020

298 pages

ISBN:9781450379564

DOI:10.1145/3387902

General Chairs: Maurizio Palesi (University of Catania, IT), Gianluca Palermo (Politecnico di Milano, IT)

Program Chairs: Cat Graves (Hewlett Packard Labs), Eishi Arima (ITC University of Tokyo, JP)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Artifacts Available

Qualifiers

Research-article

Funding Sources

European Union's Horizon 2020 Framework Programme

Conference

CF '20

Sponsor:

SIGMICRO

CF '20: Computing Frontiers Conference

May 11 - 13, 2020

Catania, Sicily, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 7

Total Downloads: 326

Downloads (Last 12 months): 55

Downloads (Last 6 weeks): 4

Reflects downloads up to 24 Dec 2024

Citations

Cited By

Salimi Beni M, Hunold S, Cosenza B (2024). Analysis and prediction of performance variability in large-scale computing systems. The Journal of Supercomputing. Online publication date: 28-Mar-2024.
https://doi.org/10.1007/s11227-024-06040-w
Zacarias F, Carpenter P, Petrucci V (2023). Dynamic Memory Provisioning on Disaggregated HPC Systems. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 973-982. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624174
Wahlgren J, Schieffer G, Gokhale M, Peng I, Mohror K, Arnold D, Badia R (2023). A Quantitative Approach for Adopting Disaggregated Memory in HPC Systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-14. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3581784.3607108
Katsaragakis M, Masouros D, Papadopoulos L, Catthoor F, Soudris D (2023). On the Implications of Heterogeneous Memory Tiering on Spark In-Memory Analytics. 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 945-952. Online publication date: May-2023.
https://doi.org/10.1109/IPDPSW59300.2023.00157
Masouros D, Pinto C, Gazzetti M, Xydis S, Soudris D (2023). Adrias: Interference-Aware Memory Orchestration for Disaggregated Cloud Infrastructures. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 855-869. Online publication date: Feb-2023.
https://doi.org/10.1109/HPCA56546.2023.10070939
Zacarias F, Carpenter P, Petrucci V (2021). Memory Demands in Disaggregated HPC: How Accurate Do We Need to Be? 2021 International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 1-6. Online publication date: Nov-2021.
https://doi.org/10.1109/PMBS54543.2021.00006
Zacarias F, Carpenter P, Petrucci V (2021). Improving HPC System Throughput and Response Time using Memory Disaggregation. 2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS), 283-290. Online publication date: Dec-2021.
https://doi.org/10.1109/ICPADS53394.2021.00041
