Last level cache size heterogeneity in embedded systems

Authors:

Mario D. Marino,

Kuan-Ching Li

The Journal of Supercomputing, Volume 72, Issue 2

Pages 503 - 544

https://doi.org/10.1007/s11227-015-1576-8

Published: 01 February 2016

Abstract

In typical multicore processors, the last level cache is formed by distributed clusters of memory banks of the same size, referred to as homogeneous clusters. Shutting down part of these clusters to save power across generations of multicore processors gives rise to clusters with non-uniform cache sizes, referred to as heterogeneous clusters. Because heterogeneous clusters are typically smaller than homogeneous ones, they exhibit higher miss rates, which are likely to degrade performance. In this investigation, we study the impact of heterogeneous caches in embedded microprocessors that contain an arbitrary mix of homogeneous and heterogeneous clusters. Specifically, we evaluate the architectural implications of these heterogeneous caches and present a flexible algorithm that can be used to explore them. Experimental benchmarking with scientific applications shows that microprocessors with heterogeneous clusters exhibit a maximum performance degradation of about 10 % and a maximum performance improvement of 16 %, while the miss/hit rate is reduced or improved by up to 10 %. In addition, coherence activity decreases by 10 %, with a maximum energy utilization of up to 50 % and a maximum energy reduction of 15 %.
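As a rough illustration of the relationship between cluster size and miss rate described in the abstract, the sketch below compares a homogeneous and a heterogeneous last level cache on a synthetic trace. It is not the algorithm proposed in the paper. The cluster capacities, the modulo mapping of addresses to clusters, the LRU replacement policy, and the uniform random trace are all assumptions chosen for illustration, and the names `simulate` and `make_trace` are hypothetical.

```python
from collections import OrderedDict
import random


def simulate(cluster_lines, trace):
    """Return the overall miss rate of an LLC built from LRU clusters.

    cluster_lines: capacity of each cluster in cache lines (assumed units).
    trace: sequence of line addresses; addr % number_of_clusters selects
           the cluster (an assumed, simplified interleaving).
    """
    clusters = [OrderedDict() for _ in cluster_lines]
    misses = 0
    total = 0
    for addr in trace:
        total += 1
        idx = addr % len(clusters)
        cache, capacity = clusters[idx], cluster_lines[idx]
        if addr in cache:
            cache.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / total if total else 0.0


def make_trace(n, working_set, seed=0):
    """Generate n uniformly random line addresses in [0, working_set)."""
    rng = random.Random(seed)
    return [rng.randrange(working_set) for _ in range(n)]


trace = make_trace(20000, working_set=512)
# Homogeneous LLC: four clusters of equal size.
homogeneous = simulate([128, 128, 128, 128], trace)
# Heterogeneous LLC: two clusters partially shut down (smaller capacity).
heterogeneous = simulate([128, 64, 128, 32], trace)
print(f"homogeneous miss rate:   {homogeneous:.3f}")
print(f"heterogeneous miss rate: {heterogeneous:.3f}")
```

Because LRU is a stack algorithm, shrinking a cluster while keeping the address mapping fixed can never lower that cluster's miss count, so this toy model can only reproduce the degradation side of the trade-off. The performance improvements and energy figures reported in the abstract depend on effects that are not modelled here.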

Published In

The Journal of Supercomputing, Volume 72, Issue 2

February 2016

442 pages

ISSN: 0920-8542

Copyright © 2016 Springer Science+Business Media New York.

Publisher

Kluwer Academic Publishers

United States
