DOI: 10.1145/3387902.3392625

Contention-aware application performance prediction for disaggregated memory systems

Published: 23 May 2020

Abstract

Disaggregated memory has recently been proposed as a way to allocate memory capacity to compute jobs flexibly and at fine granularity. This paper takes an important step towards effective resource allocation on disaggregated memory systems. Specifically, we propose a generic approach to predict the performance degradation caused by sharing disaggregated memory. In contrast to prior work, cache capacity is not shared among multiple applications, which removes a major contributor to application performance degradation. For this reason, our analysis is driven by the demand for memory bandwidth, which has been shown to have an important effect on application performance. We show that profiling the application slowdown often involves significant experimental error and noise; to address this, we improve accuracy by linear smoothing of the sensitivity curves. We also show that contention is sensitive to the ratio between read and write memory accesses, and we address this sensitivity by building a family of sensitivity curves according to the read/write ratios.
Our results show that the methodology predicts the slowdown in application performance under memory contention with an average error of 1.19% and a maximum error of 14.6%. Compared with the state of the art, the relative improvements are almost 24% on average and 33% in the worst case.
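
The abstract does not spell out how the sensitivity curves are smoothed or how the family of curves is queried, so the following is only a minimal sketch under stated assumptions. It takes "linear smoothing" to mean a least-squares straight-line fit, defines slowdown as contended runtime divided by solo runtime, and selects the curve with the nearest read ratio. All function names, the bandwidth units, and the sample numbers are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def smooth_curve(points):
    """Replace noisy (bandwidth, slowdown) samples with their fitted line.

    Assumption: this is one possible reading of "linear smoothing".
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    intercept, slope = fit_line(xs, ys)
    return [(x, intercept + slope * x) for x in xs]


def interpolate(curve, x):
    """Piecewise-linear lookup on a curve sorted by x, clamped at both ends."""
    xs = [p[0] for p in curve]
    if x <= xs[0]:
        return curve[0][1]
    if x >= xs[-1]:
        return curve[-1][1]
    i = bisect_left(xs, x)
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def predict_slowdown(family, read_ratio, bandwidth):
    """Pick the curve with the nearest read ratio, then look up the slowdown."""
    nearest = min(family, key=lambda r: abs(r - read_ratio))
    return interpolate(family[nearest], bandwidth)


# Hypothetical profiling samples: (contending bandwidth in GB/s, slowdown).
# The dictionary key is the fraction of memory accesses that are reads.
family = {
    0.5: smooth_curve([(0, 1.00), (10, 1.12), (20, 1.18), (30, 1.30)]),
    1.0: smooth_curve([(0, 1.00), (10, 1.05), (20, 1.10), (30, 1.15)]),
}

# Predicted slowdown for a co-runner with 90% reads at 15 GB/s.
predicted = predict_slowdown(family, read_ratio=0.9, bandwidth=15)
```

Choosing the nearest curve keeps the lookup simple; interpolating between the two neighbouring read ratios would be a natural refinement, but whether the paper does so is not stated in the abstract.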

    Published In

    CF '20: Proceedings of the 17th ACM International Conference on Computing Frontiers
    May 2020
    298 pages
    ISBN:9781450379564
    DOI:10.1145/3387902
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. memory bandwidth
    2. memory subsystem
    3. performance counters
    4. performance degradation
    5. performance prediction

    Qualifiers

    • Research-article

    Funding Sources

    • European Union's Horizon 2020 Framework Programme

    Conference

    CF '20: Computing Frontiers Conference
    May 11 - 13, 2020
    Catania, Sicily, Italy

    Acceptance Rates

    Overall Acceptance Rate 273 of 785 submissions, 35%

    Cited By

    • (2024) Analysis and prediction of performance variability in large-scale computing systems. The Journal of Supercomputing. DOI: 10.1007/s11227-024-06040-w. Online publication date: 28-Mar-2024
    • (2023) Dynamic Memory Provisioning on Disaggregated HPC Systems. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 973-982. DOI: 10.1145/3624062.3624174. Online publication date: 12-Nov-2023
    • (2023) A Quantitative Approach for Adopting Disaggregated Memory in HPC Systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-14. DOI: 10.1145/3581784.3607108. Online publication date: 12-Nov-2023
    • (2023) On the Implications of Heterogeneous Memory Tiering on Spark In-Memory Analytics. 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 945-952. DOI: 10.1109/IPDPSW59300.2023.00157. Online publication date: May-2023
    • (2023) Adrias: Interference-Aware Memory Orchestration for Disaggregated Cloud Infrastructures. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 855-869. DOI: 10.1109/HPCA56546.2023.10070939. Online publication date: Feb-2023
    • (2021) Memory Demands in Disaggregated HPC: How Accurate Do We Need to Be? 2021 International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 1-6. DOI: 10.1109/PMBS54543.2021.00006. Online publication date: Nov-2021
    • (2021) Improving HPC System Throughput and Response Time using Memory Disaggregation. 2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS), 283-290. DOI: 10.1109/ICPADS53394.2021.00041. Online publication date: Dec-2021