
A workload independent energy reduction strategy for D-NUCA caches

Published: 01 April 2014

Abstract

Wire delays and leakage energy consumption are both growing problems in the design of large on-chip caches built in deep submicron technologies. D-NUCA (Dynamic Non-Uniform Cache Architecture) caches exploit aggressive sub-banking of the cache and a migration mechanism to reduce the access latency of frequently accessed data, limiting the effect of wire delays on performance. Way Adaptable D-NUCA is a leakage power reduction technique specifically suited to D-NUCA caches. It dynamically varies the portion of the cache area that is powered on based on the caching needs of the running workload, but it relies on application-dependent parameters that must be evaluated off-line. This limits the effectiveness of Way Adaptable D-NUCA in general-purpose, multiprogrammed environments. In this paper, we propose a new power reduction technique for D-NUCA caches that still adapts the powered-on cache area to the needs of the running workload but does not rely on application-dependent parameters. Results show that our proposal saves around 49% of total cache energy consumption in a single-core environment and 44% in a CMP environment. With the addition of a timer, it performs similarly to previously proposed leakage power reduction techniques, and it outperforms them when they are applied in a workload-independent manner.
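To make the idea of "application-dependent parameters" concrete, the following is a minimal sketch of a generic threshold-driven way-adaptation policy. It is not the algorithm of this paper, and it is not claimed to match the published Way Adaptable D-NUCA mechanism exactly. The class name `WayController`, the per-way hit counters, and the two thresholds are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WayController:
    """Toy controller deciding how many cache ways stay powered on.

    All names and thresholds here are illustrative assumptions,
    not taken from the paper.
    """
    total_ways: int
    active_ways: int
    low_threshold: float   # hypothetical: below this, gate the farthest way
    high_threshold: float  # hypothetical: above this, power on one more way

    def update(self, hits_per_way: Sequence[int]) -> int:
        """Adjust active_ways from hit counts gathered over one interval.

        hits_per_way[i] is the hit count of way i; way 0 is assumed to be
        the fastest way (closest to the cache controller).
        """
        first = hits_per_way[0]
        last = hits_per_way[self.active_ways - 1]
        # Hits in the farthest powered-on way relative to the fastest way.
        ratio = last / max(first, 1)
        if ratio < self.low_threshold and self.active_ways > 1:
            self.active_ways -= 1   # farthest way is barely used: power it off
        elif ratio > self.high_threshold and self.active_ways < self.total_ways:
            self.active_ways += 1   # farthest way is busy: power on one more
        return self.active_ways


controller = WayController(total_ways=8, active_ways=8,
                           low_threshold=0.05, high_threshold=0.5)
# Hits concentrate in the fast ways, so the farthest way gets gated.
print(controller.update([1000, 500, 200, 100, 50, 20, 10, 1]))
```

In a scheme of this shape, good values for `low_threshold` and `high_threshold` depend on the access pattern of each application, which is the kind of off-line tuning the abstract identifies as a limitation in multiprogrammed settings. The abstract does not say how the proposed technique removes that dependence, so the sketch does not attempt to show it.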

Published In

The Journal of Supercomputing, Volume 68, Issue 1
April 2014
507 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Cache memories
  2. Leakage
  3. NUCA
  4. Power consumption
  5. Wire delay
