research-article

A software memory partition approach for eliminating bank-level interference in multicore systems

Authors:

Chengyong Wu

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Pages 367 - 376

https://doi.org/10.1145/2370816.2370869

Published: 19 September 2012

Abstract

The main memory system is a shared resource in modern multicore machines, and contention for it causes serious interference that degrades performance in the form of reduced throughput and unfairness. Numerous memory scheduling algorithms have been proposed to address the interference problem. However, these algorithms usually employ complex scheduling logic and require hardware modifications to memory controllers; as a result, industrial vendors seem hesitant to adopt them.

This paper presents a practical software approach that effectively eliminates the interference without hardware modification. The key idea is to modify the OS memory management subsystem to adopt a page-coloring-based bank-level partition mechanism (BPM), which allocates specific DRAM banks to specific cores (threads). With BPM, memory controllers can passively schedule memory requests in a core-cluster (or thread-cluster) way.
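The page-coloring idea can be illustrated with a small user-space sketch. This is not the paper's kernel implementation: the bank-selecting address bits (`BANK_BITS`), the 4 KiB page size, the `BankPartitionedAllocator` class, and the core-to-bank assignment are all hypothetical choices made for illustration, and real address mappings are specific to each memory controller.

```python
from collections import defaultdict

PAGE_SHIFT = 12  # assumed 4 KiB pages
# Hypothetical mapping: physical address bits 13-15 select the DRAM bank.
# Real mappings are controller-specific and must be determined per platform.
BANK_BITS = (13, 14, 15)


def bank_color(pfn: int) -> int:
    """Return the bank index (the page's "color") for a page frame number."""
    phys_addr = pfn << PAGE_SHIFT
    color = 0
    for i, bit in enumerate(BANK_BITS):
        color |= ((phys_addr >> bit) & 1) << i
    return color


class BankPartitionedAllocator:
    """Toy page allocator that keeps one free list per bank color."""

    def __init__(self, free_pfns, core_colors):
        self.core_colors = core_colors        # core id -> set of allowed colors
        self.free_lists = defaultdict(list)   # color -> free page frames
        for pfn in free_pfns:
            self.free_lists[bank_color(pfn)].append(pfn)

    def alloc_page(self, core: int) -> int:
        """Allocate a page frame only from the banks assigned to `core`."""
        for color in sorted(self.core_colors[core]):
            if self.free_lists[color]:
                return self.free_lists[color].pop()
        raise MemoryError(f"no free page in banks assigned to core {core}")


# Example: 8 banks split evenly between 4 cores (an illustrative assignment).
core_colors = {0: {0, 1}, 1: {2, 3}, 2: {4, 5}, 3: {6, 7}}
allocator = BankPartitionedAllocator(range(1024), core_colors)
pfn = allocator.alloc_page(core=2)
print(pfn, bank_color(pfn))  # the color is always one of core 2's banks
```

Because every page handed to a core comes from that core's own banks, requests from different cores never compete for the same bank in this sketch, which is the effect the partition mechanism aims for.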

We implement BPM in the Linux 2.6.32.15 kernel and evaluate it on real 4-core and 8-core machines by running 20 randomly generated multi-programmed workloads (each containing 4 or 8 benchmarks) and a multi-threaded benchmark. Experimental results show that BPM improves overall system throughput by 4.7% on average (up to 8.6%) and reduces the maximum slowdown by 4.5% on average (up to 15.8%). Moreover, BPM saves 5.2% of the memory system's energy consumption.
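The abstract does not define its throughput and fairness metrics. As a hedged illustration, the sketch below uses two definitions that are common in the memory-scheduling literature: weighted speedup for throughput and maximum slowdown for unfairness. Whether the paper uses exactly these formulas is an assumption, and the IPC values are made-up inputs, not results from the paper.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum of per-program speedups relative to running alone (higher is better)."""
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))


def maximum_slowdown(ipc_shared, ipc_alone):
    """Slowdown of the most-penalized program (lower is fairer)."""
    return max(a / s for s, a in zip(ipc_shared, ipc_alone))


# Illustrative IPC values for a 4-program workload (not measured data).
ipc_alone = [2.0, 1.0, 1.0, 0.5]    # each program running by itself
ipc_shared = [1.0, 0.5, 0.8, 0.25]  # the same programs running together

print(weighted_speedup(ipc_shared, ipc_alone))
print(maximum_slowdown(ipc_shared, ipc_alone))
```

Under these definitions, an improvement such as the one reported would show up as a higher weighted speedup and a lower maximum slowdown when the same workload is run with the partitioning enabled.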


Information

Published In

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques

September 2012

512 pages

ISBN: 9781450311823

DOI: 10.1145/2370816

General Chairs: Pen-Chung Yew (University of Minnesota), Sangyeun Cho (University of Pittsburgh)

Program Chairs: Luiz DeRose (Cray, Inc.), David J. Lilja (University of Minnesota)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IFIP WG 10.3: IFIP WG 10.3
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS TCPP: IEEE Computer Society Technical Committee on Parallel Processing
IEEE CS TCAA: IEEE CS technical committee on architectural acoustics

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PACT '12

PACT '12: International Conference on Parallel Architectures and Compilation Techniques

September 19 - 23, 2012

Minneapolis, Minnesota, USA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%


Bibliometrics

Total Citations: 210
Total Downloads: 1,049
Downloads (Last 12 months): 45
Downloads (Last 6 weeks): 12

Reflects downloads up to 14 Dec 2024

Citations

Cited By

Gomez I, Díaz de Cerio U, Parra J, Rivas J, Gutiérrez J (2023). Uso de GPUs en aplicaciones de tiempo real: Una revisión de técnicas para el análisis y optimización de parámetros temporales. Revista Iberoamericana de Automática e Informática industrial 21:1 (1-16). Online publication date: 7-Nov-2023
https://doi.org/10.4995/riai.2023.20321
Seals E, Bechtel M, Yun H (2023). BandWatch: A System-Wide Memory Bandwidth Regulation System for Heterogeneous Multicore. 2023 IEEE 29th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA) (38-46). Online publication date: 30-Aug-2023
https://doi.org/10.1109/RTCSA58653.2023.00014
Yang H, Xu S, Chen Y, Liu G, Zhou R, Zhou Q, Li K (2023). A shared libraries aware and bank partitioning-based mechanism for multicore architecture. Soft Computing 27:13 (8775-8787). Online publication date: 24-Apr-2023
https://doi.org/10.1007/s00500-023-08020-3
Zhai J, Jin Y, Chen W, Zheng W (2023). Informed Memory Access Monitoring. Performance Analysis of Parallel Applications for HPC (73-97). Online publication date: 19-Jun-2023
https://doi.org/10.1007/978-981-99-4366-1_4
Zhang C, Wang S, Yu Z, Wang H, Xu Y, Cai L, Tang D, Sun N, Bao Y (2022). A Labeled Architecture for Low-Entropy Clouds: Theory, Practice, and Lessons. Intelligent Computing 2022. Online publication date: Jan-2022
https://doi.org/10.34133/2022/9795476
Zhang J, Swift M, Li J, Falsafi B, Ferdman M, Lu S, Wenisch T (2022). Software-defined address mapping: a case on 3D memory. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (70-83). Online publication date: 28-Feb-2022
https://dl.acm.org/doi/10.1145/3503222.3507774
Lugo T, Lozano S, Fernandez J, Carretero J (2022). A Survey of Techniques for Reducing Interference in Real-Time Applications on Multicore Platforms. IEEE Access 10 (21853-21882). Online publication date: 2022
https://doi.org/10.1109/ACCESS.2022.3151891
Wei R, Li C, Chen C, Sun G, He M (2021). Memory Access Optimization of a Neural Network Accelerator Based on Memory Controller. Electronics 10:4 (438). Online publication date: 10-Feb-2021
https://doi.org/10.3390/electronics10040438
Bechtel M, Yun H (2021). Memory-Aware Denial-of-Service Attacks on Shared Cache in Multicore Real-Time Systems. IEEE Transactions on Computers (1-1). Online publication date: 2021
https://doi.org/10.1109/TC.2021.3108044
Pan X, Mueller F (2021). NUMA-aware memory coloring for multicore real-time systems. Journal of Systems Architecture 118 (102188). Online publication date: Sep-2021
https://doi.org/10.1016/j.sysarc.2021.102188
