DOI: 10.1145/3243176.3243211
research-article

HyPart: A Hybrid Technique for Practical Memory Bandwidth Partitioning on Commodity Servers

Published: 01 November 2018

Abstract

Memory bandwidth is a performance-critical shared resource in modern computer systems. To prevent contention for memory bandwidth among collocated workloads, prior work has investigated memory bandwidth partitioning techniques. Despite this extensive body of work, two questions remain unexplored: how the widely used memory bandwidth partitioning techniques compare across various metrics, and whether a hybrid technique that employs multiple memory bandwidth partitioning techniques can improve overall efficiency.
To bridge this gap, we first present an in-depth characterization of three widely used memory bandwidth partitioning techniques (i.e., thread packing, clock modulation, and Intel's Memory Bandwidth Allocation (MBA)) in terms of dynamic range, granularity, and efficiency. Guided by the characterization results, we propose HyPart, a hybrid technique for practical memory bandwidth partitioning on commodity servers. HyPart composes the three memory bandwidth partitioning techniques in a constructive manner and dynamically performs optimizations based on application characteristics, without requiring any offline profiling. Our experimental results show that HyPart provides a wider dynamic range and finer-grained control of memory bandwidth, and achieves significantly higher efficiency, than the conventional memory bandwidth partitioning techniques.
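
As a rough illustration of why composing the three knobs can widen the dynamic range and refine the granularity of bandwidth control, the Python sketch below enumerates the bandwidth levels reachable by each knob alone and by all three together. Everything in it is an assumption made for illustration: the knob settings (`THREAD_COUNTS`, `DUTY_CYCLES_PCT`, `MBA_LEVELS_PCT`), the core count `MAX_THREADS`, and the toy model in which each knob scales bandwidth linearly and independently. It is not HyPart's algorithm and does not reproduce the paper's measurements.

```python
from fractions import Fraction
from itertools import product

# Hypothetical knob settings (placeholders, not taken from the paper).
MAX_THREADS = 8
THREAD_COUNTS = [1, 2, 4, 8]                # thread packing: active threads
DUTY_CYCLES_PCT = [25, 50, 75, 100]         # clock modulation: duty cycle (%)
MBA_LEVELS_PCT = list(range(10, 101, 10))   # MBA: throttling level (%)


def relative_bandwidth(threads, duty_pct, mba_pct):
    """Toy model (assumption): each knob scales bandwidth linearly and independently."""
    return (Fraction(threads, MAX_THREADS)
            * Fraction(duty_pct, 100)
            * Fraction(mba_pct, 100))


def reachable_levels(configs):
    """Sorted distinct relative-bandwidth levels for the given configurations."""
    return sorted({relative_bandwidth(*cfg) for cfg in configs})


def summarize(levels):
    """Return (dynamic range as max/min, number of distinct levels)."""
    return levels[-1] / levels[0], len(levels)


def knob_spaces():
    """Configuration spaces for each single knob and for the hybrid of all three."""
    return {
        "thread packing": [(t, 100, 100) for t in THREAD_COUNTS],
        "clock modulation": [(MAX_THREADS, d, 100) for d in DUTY_CYCLES_PCT],
        "MBA": [(MAX_THREADS, 100, m) for m in MBA_LEVELS_PCT],
        "hybrid": list(product(THREAD_COUNTS, DUTY_CYCLES_PCT, MBA_LEVELS_PCT)),
    }


if __name__ == "__main__":
    for name, configs in knob_spaces().items():
        dyn_range, n_levels = summarize(reachable_levels(configs))
        print(f"{name:17s} range={float(dyn_range):6.1f}x  levels={n_levels}")
```

Under this toy model the hybrid space reaches a lower minimum bandwidth and more distinct levels than any single knob. Real hardware need not scale this cleanly, which is why the paper characterizes the three techniques empirically before composing them.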

    Information & Contributors

    Information

    Published In

    PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques
    November 2018
    494 pages
    ISBN:9781450359863
    DOI:10.1145/3243176

    Sponsors

    In-Cooperation

    • IFIP WG 10.3
    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. hybrid technique
    2. memory bandwidth partitioning

    Qualifiers

    • Research-article

    Funding Sources

    • Korea government (MSIP)

    Conference

    PACT '18

    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 64
    • Downloads (last 6 weeks): 10
    Reflects downloads up to 03 Jan 2025

    Citations

    Cited By

    • (2024) Exploiting Elasticity via OS-Runtime Cooperation to Improve CPU Utilization in Multicore Systems. 2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 35-43. DOI: 10.1109/PDP62718.2024.00014. Online publication date: 20-Mar-2024
    • (2024) Reinforcement Learning-Driven Co-Scheduling and Diverse Resource Assignments on NUMA Systems. 2024 IEEE 42nd International Conference on Computer Design (ICCD), pp. 170-178. DOI: 10.1109/ICCD63220.2024.00034. Online publication date: 18-Nov-2024
    • (2023) Orchid: An Online Learning Based Resource Partitioning Framework for Job Colocation With Multiple Objectives. IEEE Transactions on Computers 72:12, pp. 3443-3457. DOI: 10.1109/TC.2023.3303959. Online publication date: Dec-2023
    • (2023) Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach. 2023 IEEE International Conference on Cluster Computing (CLUSTER), pp. 185-196. DOI: 10.1109/CLUSTER52292.2023.00023. Online publication date: 31-Oct-2023
    • (2023) COSMOS: Coordinated Management of Cores, Memory, and Compressed Memory Swap for QoS-Aware and Efficient Workload Consolidation for Memory-Intensive Applications. IEEE Access 11, pp. 133199-133214. DOI: 10.1109/ACCESS.2023.3336685. Online publication date: 2023
    • (2023) BALANCER: bandwidth allocation and cache partitioning for multicore processors. The Journal of Supercomputing 79:9, pp. 10252-10276. DOI: 10.1007/s11227-023-05070-0. Online publication date: 4-Feb-2023
    • (2023) GCNPart: Interference-Aware Resource Partitioning Framework with Graph Convolutional Neural Networks and Deep Reinforcement Learning. Algorithms and Architectures for Parallel Processing, pp. 568-589. DOI: 10.1007/978-3-031-22677-9_30. Online publication date: 11-Jan-2023
    • (2022) Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis. Proceedings of the 51st International Conference on Parallel Processing, pp. 1-11. DOI: 10.1145/3545008.3545026. Online publication date: 29-Aug-2022
    • (2022) MISO. Proceedings of the 13th Symposium on Cloud Computing, pp. 173-189. DOI: 10.1145/3542929.3563510. Online publication date: 7-Nov-2022
    • (2022) Reinforcement Learning-Based Resource Partitioning for Improving Responsiveness in Cloud Gaming. IEEE Transactions on Computers 71:5, pp. 1049-1062. DOI: 10.1109/TC.2021.3070879. Online publication date: 1-May-2022
