research-article

HyPart: a hybrid technique for practical memory bandwidth partitioning on commodity servers

Authors: Seongbeom Park, Myeonggyun Han, Woongki Baek

PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques

Article No.: 5, Pages 1 - 14

https://doi.org/10.1145/3243176.3243211

Published: 01 November 2018

Abstract

Memory bandwidth is a highly performance-critical shared resource on modern computer systems. To prevent contention for memory bandwidth among collocated workloads, prior work has investigated memory bandwidth partitioning techniques. Despite this extensive prior work, two questions remain unexplored: how the widely-used memory bandwidth partitioning techniques compare across various metrics, and whether a hybrid technique that employs multiple memory bandwidth partitioning techniques can improve overall efficiency.

To bridge this gap, we first present an in-depth characterization of three widely-used memory bandwidth partitioning techniques (i.e., thread packing, clock modulation, and Intel's Memory Bandwidth Allocation (MBA)) in terms of dynamic range, granularity, and efficiency. Guided by the characterization results, we propose HyPart, a hybrid technique for practical memory bandwidth partitioning on commodity servers. HyPart composes the three memory bandwidth partitioning techniques in a constructive manner and dynamically performs optimizations based on the application characteristics without requiring any offline profiling. Our experimental results demonstrate the effectiveness of HyPart: it provides a wider dynamic range and finer-grained control of memory bandwidth, and achieves significantly higher efficiency than the conventional memory bandwidth partitioning techniques.
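The intuition that composing several knobs widens the dynamic range and refines the granularity can be illustrated with a toy model. The sketch below is not HyPart's algorithm. It assumes, purely for illustration, that each technique scales the available bandwidth independently and linearly, which real hardware does not guarantee. The knob levels, the 8-thread maximum, and the function names `estimated_share`, `reachable_shares`, and `closest_config` are all hypothetical.

```python
from fractions import Fraction
from itertools import product

# All knob levels below are illustrative assumptions, not values from the paper.
THREAD_SHARES = [Fraction(n, 8) for n in (1, 2, 4, 8)]  # thread packing: active threads / 8
DUTY_CYCLES = [Fraction(n, 4) for n in (1, 2, 3, 4)]    # clock modulation: active-cycle fraction
MBA_LEVELS = [Fraction(n, 10) for n in range(1, 11)]    # MBA: throttling level as a fraction


def estimated_share(threads, duty, mba):
    """Toy model (assumption): each knob scales bandwidth independently and linearly."""
    return threads * duty * mba


def reachable_shares():
    """Distinct bandwidth shares reachable when all three knobs are combined."""
    return {estimated_share(t, d, m)
            for t, d, m in product(THREAD_SHARES, DUTY_CYCLES, MBA_LEVELS)}


def closest_config(target):
    """Return the (threads, duty, mba) triple whose modeled share is nearest to target."""
    return min(product(THREAD_SHARES, DUTY_CYCLES, MBA_LEVELS),
               key=lambda cfg: abs(estimated_share(*cfg) - target))


if __name__ == "__main__":
    combined = reachable_shares()
    print("single-knob settings:", len(THREAD_SHARES), len(DUTY_CYCLES), len(MBA_LEVELS))
    print("distinct combined shares:", len(combined))
    print("smallest combined share:", min(combined))
    print("closest to 1/3:", closest_config(Fraction(1, 3)))
```

Under this model the combined configuration space reaches a smaller minimum share than any single knob and offers more distinct settings than any single knob. A practical controller would replace `estimated_share` with online measurements, since the actual response of each technique depends on the application.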

Index Terms

HyPart: a hybrid technique for practical memory bandwidth partitioning on commodity servers
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems

Information & Contributors

Information

Published In

PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques

November 2018

494 pages

ISBN:9781450359863

DOI:10.1145/3243176

General Chair: Skevos Evripidou (University of Cyprus, Cyprus)
Program Chairs: Per Stenström (Chalmers University of Technology, Sweden), Michael O'Boyle (University of Edinburgh, UK)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

IFIP WG 10.3
IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Korea government (MSIP)

Conference

PACT '18

Sponsor:

SIGARCH

PACT '18: International conference on Parallel Architectures and Compilation Techniques

November 1 - 4, 2018

Limassol, Cyprus

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Article Metrics

Total Citations: 20
Total Downloads: 702
Downloads (Last 12 months): 67
Downloads (Last 6 weeks): 5

Reflects downloads up to 05 Mar 2025

Cited By

J. Rubio, C. Bilbao, J. Saez, and M. Prieto-Matias. 2024. Exploiting Elasticity via OS-Runtime Cooperation to Improve CPU Utilization in Multicore Systems. In 2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). 35-43. Online publication date: 20-Mar-2024. https://doi.org/10.1109/PDP62718.2024.00014

U. Saroliya, E. Arima, D. Liu, and M. Schulz. 2024. Reinforcement Learning-Driven Co-Scheduling and Diverse Resource Assignments on NUMA Systems. In 2024 IEEE 42nd International Conference on Computer Design (ICCD). 170-178. Online publication date: 18-Nov-2024. https://doi.org/10.1109/ICCD63220.2024.00034

R. Chen, W. Peng, Y. Li, X. Liu, and G. Wang. 2023. Orchid: An Online Learning Based Resource Partitioning Framework for Job Colocation With Multiple Objectives. IEEE Transactions on Computers 72:12, 3443-3457. Online publication date: Dec-2023. https://doi.org/10.1109/TC.2023.3303959

U. Saroliya, E. Arima, D. Liu, and M. Schulz. 2023. Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach. In 2023 IEEE International Conference on Cluster Computing (CLUSTER). 185-196. Online publication date: 31-Oct-2023. https://doi.org/10.1109/CLUSTER52292.2023.00023

M. Han, E. Park, Y. Shin, D. Oh, Y. Cho, and W. Baek. 2023. COSMOS: Coordinated Management of Cores, Memory, and Compressed Memory Swap for QoS-Aware and Efficient Workload Consolidation for Memory-Intensive Applications. IEEE Access 11, 133199-133214. Online publication date: 2023. https://doi.org/10.1109/ACCESS.2023.3336685

A. Navarro-Torres, J. Alastruey-Benedé, P. Ibáñez, and V. Viñals-Yúfera. 2023. BALANCER: bandwidth allocation and cache partitioning for multicore processors. The Journal of Supercomputing 79:9, 10252-10276. Online publication date: 4-Feb-2023. https://doi.org/10.1007/s11227-023-05070-0

R. Chen, H. Shi, J. Wu, Y. Li, X. Liu, and G. Wang. 2023. GCNPart: Interference-Aware Resource Partitioning Framework with Graph Convolutional Neural Networks and Deep Reinforcement Learning. In Algorithms and Architectures for Parallel Processing. 568-589. Online publication date: 11-Jan-2023. https://doi.org/10.1007/978-3-031-22677-9_30

K. Wang, Y. Li, C. Wang, T. Jia, K. Chow, Y. Wen, Y. Dou, G. Xu, C. Hou, J. Yao, and L. Zhang. 2022. Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis. In Proceedings of the 51st International Conference on Parallel Processing. 1-11. Online publication date: 29-Aug-2022. https://dl.acm.org/doi/10.1145/3545008.3545026

B. Li, T. Patel, S. Samsi, V. Gadepally, D. Tiwari, A. Gavrilovska, D. Altınbüken, and C. Binnig. 2022. MISO. In Proceedings of the 13th Symposium on Cloud Computing. 173-189. Online publication date: 7-Nov-2022. https://dl.acm.org/doi/10.1145/3542929.3563510

Y. Li, X. Wang, H. Liu, L. Pu, S. Tang, G. Wang, and X. Liu. 2022. Reinforcement Learning-Based Resource Partitioning for Improving Responsiveness in Cloud Gaming. IEEE Transactions on Computers 71:5, 1049-1062. Online publication date: 1-May-2022. https://doi.org/10.1109/TC.2021.3070879
