[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2370816.2370869acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
research-article

A software memory partition approach for eliminating bank-level interference in multicore systems

Published: 19 September 2012 Publication History

Abstract

Main memory system is a shared resource in modern multicore machines, resulting in serious interference, which causes performance degradation in terms of throughput slowdown and unfairness. Numerous new memory scheduling algorithms have been proposed to address the interference problem. However, these algorithms usually employ complex scheduling logic and need hardware modification to memory controllers, as a result, industrial venders seem to have some hesitation in adopting them.
This paper presents a practical software approach to effectively eliminate the interference without hardware modification. The key idea is to modify the OS memory management subsystem to adopt a page-coloring based bank-level partition mechanism (BPM), which allocates specific DRAM banks to specific cores (threads). By using BPM, memory controllers can passively schedule memory requests in a core-cluster (or thread-cluster) way.
We implement BPM in Linux 2.6.32.15 kernel and evaluate BPM on 4-core and 8-core real machines by running randomly generated 20 multi-programmed workloads (each contains 4/8 benchmarks) and multi-threaded benchmark. Experimental results show that BPM can improve the overall system throughput by 4.7% on average (up to 8.6%), and reduce the maximum slowdown by 4.5% on average (up to 15.8%). Moreover, BPM also saves 5.2% of the energy consumption of memory system.

References

[1]
Hewlett-Packed Development Company. Perfmon project. http: //www.hpl.hp.com/research/linux/ perfmon.
[2]
Standard Performance Evaluation Corporation. http://www.spec.org/cpu2006/CINT2006/.
[3]
N. Aggarwal et al. Power Efficient DRAM Speculation. In HPCA-14, 2008.
[4]
R. Azimi, D. K. Tam, L. Soares, and M. Stumm. Enhancing Operating System Support for Multicore Processors by Using Hardware Performance Monitoring. In ACM SIGOPS Operating Systems Review 43(2): 56--65, 2009.
[5]
Y. Bao et al. HMTT: A Platform Independent Full-System Memory Trace Monitoring System. In SIGMETRICS-08, 2008
[6]
S. Beamer et al. Re-Architecting DRAM Memory Systems with Monolithically Integrated Silicon Photonics. In ISCA-37, 2010.
[7]
C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. Technical Report TR-811-08, Princeton Univ., Jan. 2008.
[8]
S. Cho, and L. Jin. Managing Distributed, Shared L2 Caches through OS-Level page Allocation. In MICRO-39, 2006.
[9]
J. Carter, IBM Power Aware Systems. Personal Correspondence, 2011.
[10]
Z.Cui, Y.Zhu, Y.Bao and M.Chen. A Fine-grained Component-level power measurement method. In PMP,2011.
[11]
G. Dhiman, G. Marchetti, and T. Rosing. vGreen: a System for Energy Efficient Computing in Virtualized Environments. In Proceedings of International Symposium on Low Power Electronics and Design. In ISLPED-2009.
[12]
J. Demme et al, Rapid Identification of Architectural Bottlenecks via Precise Event Counting. In ISCA, 2011
[13]
G. E. Suh, S. Devadas, and L. Rudolph. A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning. In HPCA-8, 2002.
[14]
G. E. Suh, L. Rudolph, and S. Devadas. Dynamic partitioning of shared cache memory. In Journal of Supercomputing, 28(1), 2004.
[15]
I. Hur and C. Lin. Memory scheduling for modern microprocessors. ACM Transactions on Computer Systems, 25(4), December 2007.
[16]
R. Iyer et al, QoS policy and architecture for cache/memory in CMP platforms. In SIGMETRICS-07, 2007.
[17]
C. J. Lee et al. Improving memory bank-level parallelism in the presence of prefetching. In MICRO-42, 2009.
[18]
Y. Kim, M. Papamicheal and O. Mutlu. Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior. In MICRO-43, 2010.
[19]
Y . Kim et al. A TLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers. In HPCA-16, 2010.
[20]
D. Kaseridis, J. Stuecheli, and L. K. John. Minimalist Open- page: A DRAM Page-mode Scheduling Policy for the many- core Era. In MICRO-44, 2011.
[21]
R. Knauerhase, P. Brett, B. Hohlt, T. Li, and S. Hahn. Using OS Observations to Improve Performance in Multicore Systems. In Micro-41, 2008.
[22]
G. L. Yuan et al. Complexity effective memory access scheduling for many-core accelerator architectures. In MICRO-42, 2009.
[23]
J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems. In HPCA-14, 2008.
[24]
J. Liedtke, H. Haertig, and M. Hohmuth. OS-Controlled Cache Predictability for Real-Time Systems. In RTAS-3, 1997.
[25]
O. Mutlu and T. Moscibroda. Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems. In ISCA-35, 2008.
[26]
T. Moscibroda and O. Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In USENIX Security, 2007.
[27]
O. Mutlu and T. Moscibroda. Stall-time fair memory access scheduling for chip multiprocessors. In MICRO-40, 2007.
[28]
W. Mi, X. Feng, J. Xue, and Y. Jia. Software-hardware cooperative DRAM bank partitioning for chip multiprocessors. In Proc. the 2010 IFIP Int'l Conf. Network and Parallel Computing (NPC), Sep. 2010.
[29]
C. Natarajan, B. Christenson, and F. Briggs. A Study of Performance Impact of Memory Controller Features in Multi- Processor Environment. In Proceedings of WMPI, 2004.
[30]
S. Prashanth et al. Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning. In Micro-44, 2011.
[31]
M. K. Qureshi, and Y . N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO-39, 2006.
[32]
M. K. Jeong, D. H. Yoon et al. Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems. In HPCA- 18, 2012.
[33]
S. Rixner, W. J. Dally, U. J. Kapasi, P. R. Mattson, and J. D. Owens. Memory access scheduling. In ISCA-27, 2000.
[34]
B. Rogers et al. Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scaling. In ISCA-42, 2009.
[35]
K. Sudan, N. Chatterjee, D. Nellans, M. Awasthi, R. Balasubramonian, and A. Davis. Micro-Pages: Increasing DRAM Efficiency with Locality-Aware. In ASPLOS-2010.
[36]
H. S. Stone, J. Turek, and J. L. Wolf. Optimal Partitioning of Cache Memory. In IEEE Transactions on Computers, 41(9), 1992.
[37]
A. Udipi et al. Rethinking DRAM design and organization for energy-constrained multi-cores. ISCA, June 2010.
[38]
Z. Zhang, Z. Zhu, and X. Zhang. A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality. In MICRO-33, 2000.
[39]
S. Zhuravlev, S. Blagodurov, and A. Fedorova. Addressing shared resource contention in multicore processors via scheduling. In ASPLOS-XV, 2010.

Cited By

View all
  • (2023)Uso de GPUs en aplicaciones de tiempo real: Una revisión de técnicas para el análisis y optimización de parámetros temporalesRevista Iberoamericana de Automática e Informática industrial10.4995/riai.2023.2032121:1(1-16)Online publication date: 7-Nov-2023
  • (2023)BandWatch: A System-Wide Memory Bandwidth Regulation System for Heterogeneous Multicore2023 IEEE 29th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA)10.1109/RTCSA58653.2023.00014(38-46)Online publication date: 30-Aug-2023
  • (2023)A shared libraries aware and bank partitioning-based mechanism for multicore architectureSoft Computing10.1007/s00500-023-08020-327:13(8775-8787)Online publication date: 24-Apr-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
September 2012
512 pages
ISBN:9781450311823
DOI:10.1145/2370816
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 September 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. bank
  2. data allocation
  3. interference
  4. main memory
  5. memory scheduling
  6. multicore
  7. partition

Qualifiers

  • Research-article

Conference

PACT '12
Sponsor:
  • IFIP WG 10.3
  • SIGARCH
  • IEEE CS TCPP
  • IEEE CS TCAA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)45
  • Downloads (Last 6 weeks)12
Reflects downloads up to 14 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Uso de GPUs en aplicaciones de tiempo real: Una revisión de técnicas para el análisis y optimización de parámetros temporalesRevista Iberoamericana de Automática e Informática industrial10.4995/riai.2023.2032121:1(1-16)Online publication date: 7-Nov-2023
  • (2023)BandWatch: A System-Wide Memory Bandwidth Regulation System for Heterogeneous Multicore2023 IEEE 29th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA)10.1109/RTCSA58653.2023.00014(38-46)Online publication date: 30-Aug-2023
  • (2023)A shared libraries aware and bank partitioning-based mechanism for multicore architectureSoft Computing10.1007/s00500-023-08020-327:13(8775-8787)Online publication date: 24-Apr-2023
  • (2023)Informed Memory Access MonitoringPerformance Analysis of Parallel Applications for HPC10.1007/978-981-99-4366-1_4(73-97)Online publication date: 19-Jun-2023
  • (2022) A Labeled Architecture for Low-Entropy Clouds: Theory, Practice, and Lessons Intelligent Computing10.34133/2022/97954762022Online publication date: Jan-2022
  • (2022)Software-defined address mapping: a case on 3D memoryProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507774(70-83)Online publication date: 28-Feb-2022
  • (2022)A Survey of Techniques for Reducing Interference in Real-Time Applications on Multicore PlatformsIEEE Access10.1109/ACCESS.2022.315189110(21853-21882)Online publication date: 2022
  • (2021)Memory Access Optimization of a Neural Network Accelerator Based on Memory ControllerElectronics10.3390/electronics1004043810:4(438)Online publication date: 10-Feb-2021
  • (2021)Memory-Aware Denial-of-Service Attacks on Shared Cache in Multicore Real-Time SystemsIEEE Transactions on Computers10.1109/TC.2021.3108044(1-1)Online publication date: 2021
  • (2021)NUMA-aware memory coloring for multicore real-time systemsJournal of Systems Architecture10.1016/j.sysarc.2021.102188118(102188)Online publication date: Sep-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media