WiDGET: Wisconsin decoupled grid execution tiles

Authors:

Yasuko Watanabe,

David A. Wood

ACM SIGARCH Computer Architecture News, Volume 38, Issue 3

Pages 2 - 13

https://doi.org/10.1145/1816038.1815965

Published: 19 June 2010

Abstract

The recent paradigm shift to multi-core systems results in high system throughput within a specified power budget. However, future systems still require good single-thread performance--no longer the predominant design priority--to mitigate sequential bottlenecks and/or to guarantee service-level agreements. Unfortunately, near saturation in voltage scaling necessitates a long-term alternative to dynamic voltage and frequency scaling.

We propose an energy-proportional computing infrastructure, called WiDGET, that decouples thread context management from a sea of simple execution units (EUs). WiDGET's decoupled design provides flexibility to alter resource allocation for a particular power-performance target while turning off unallocated resources. In other words, WiDGET enables dynamic customization of different combinations of small and/or powerful cores on a single chip, consuming power in proportion to the delivered performance.

Over all SPEC CPU2006 benchmarks, WiDGET provides average per-thread performance that is 26% better than a Xeon-like processor while using 8% less power. WiDGET can also scale down to a level comparable to an Atom-like processor, turning off resources to reduce average power by 58%. WiDGET achieves high power efficiency (BIPS³/W), exceeding Xeon-like and Atom-like processors by up to 2x and 21x, respectively.
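
The power-efficiency metric quoted above, BIPS³/W, divides the cube of throughput (billions of instructions per second) by power, so it rewards performance much more strongly than power savings; it is the inverse of the energy-delay-squared product per instruction. The following is a minimal sketch of how such a ratio could be computed. The function names and the operating points are hypothetical illustrations, not measurements or code from the paper.

```python
def bips3_per_watt(bips: float, watts: float) -> float:
    """Return BIPS^3/W for one operating point.

    bips  -- throughput in billions of instructions per second
    watts -- average power drawn at that throughput
    """
    if watts <= 0:
        raise ValueError("watts must be positive")
    return bips ** 3 / watts


def relative_efficiency(perf_ratio: float, power_ratio: float) -> float:
    """BIPS^3/W of design A relative to design B.

    perf_ratio  -- throughput of A divided by throughput of B
    power_ratio -- power of A divided by power of B
    """
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return perf_ratio ** 3 / power_ratio


if __name__ == "__main__":
    # Hypothetical operating points (assumed values, not from the paper).
    baseline = bips3_per_watt(bips=2.0, watts=10.0)
    faster = bips3_per_watt(bips=2.2, watts=10.0)   # 10% faster, same power
    leaner = bips3_per_watt(bips=2.0, watts=9.0)    # same speed, 10% less power

    print(f"10% more performance: {faster / baseline:.3f}x BIPS^3/W")
    print(f"10% less power:       {leaner / baseline:.3f}x BIPS^3/W")
```

Because throughput enters cubed, a 10% performance gain at equal power improves the metric by roughly a third, while a 10% power reduction at equal performance improves it by only about 11%. Ratios of benchmark averages need not match the per-benchmark figures reported in the abstract.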

Index Terms

WiDGET: Wisconsin decoupled grid execution tiles
1. Computer systems organization
  1. Architectures
    1. Distributed architectures

Information & Contributors

Information

Published In

ACM SIGARCH Computer Architecture News Volume 38, Issue 3

ISCA '10

June 2010

508 pages

ISSN:0163-5964

DOI:10.1145/1816038

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture
June 2010
520 pages
ISBN:9781450300537
DOI:10.1145/1815961
General Chair: André Seznec (INRIA Rennes)
Program Chairs: Uri Weiser (Technion), Ronny Ronen (Intel)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 June 2010

Published in SIGARCH Volume 38, Issue 3

Qualifiers

Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 67
Total Downloads: 944
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 3

Reflects downloads up to 17 Dec 2024

Citations

Cited By

Diavastos A, Carlson T (2022). Efficient Instruction Scheduling Using Real-time Load Delay Tracking. ACM Transactions on Computer Systems 40:1-4 (1-21). DOI: 10.1145/3548681. Online publication date: 24-Nov-2022
https://dl.acm.org/doi/10.1145/3548681
Venkataramani V, Pathania A, Shafique M, Mitra T, Henkel J (2018). Scalable Dynamic Task Scheduling on Adaptive Many-Core. 2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) (168-175). DOI: 10.1109/MCSoC2018.2018.00037. Online publication date: Sep-2018
https://doi.org/10.1109/MCSoC2018.2018.00037
Mohammadi M, Aamodt T, Dally W (2017). CG-OoO. ACM Transactions on Architecture and Code Optimization 14:4 (1-26). DOI: 10.1145/3151034. Online publication date: 5-Dec-2017
https://dl.acm.org/doi/10.1145/3151034
Zhou Y, Hoffmann H, Wentzlaff D (2016). CASH. ACM SIGARCH Computer Architecture News 44:3 (682-694). DOI: 10.1145/3007787.3001209. Online publication date: 18-Jun-2016
https://dl.acm.org/doi/10.1145/3007787.3001209
Nobre R, Martins L, Cardoso J (2016). A graph-based iterative compiler pass selection and phase ordering approach. ACM SIGPLAN Notices 51:5 (21-30). DOI: 10.1145/2980930.2907959. Online publication date: 13-Jun-2016
https://dl.acm.org/doi/10.1145/2980930.2907959
Sui Y, Fan X, Zhou H, Xue J (2016). Loop-oriented array- and field-sensitive pointer analysis for automatic SIMD vectorization. ACM SIGPLAN Notices 51:5 (41-51). DOI: 10.1145/2980930.2907957. Online publication date: 13-Jun-2016
https://dl.acm.org/doi/10.1145/2980930.2907957
Spink T, Wagstaff H, Franke B (2016). Efficient asynchronous interrupt handling in a full-system instruction set simulator. ACM SIGPLAN Notices 51:5 (1-10). DOI: 10.1145/2980930.2907953. Online publication date: 13-Jun-2016
https://dl.acm.org/doi/10.1145/2980930.2907953
Micolet P, Smith A, Dubach C (2016). A machine learning approach to mapping streaming workloads to dynamic multicore processors. ACM SIGPLAN Notices 51:5 (113-122). DOI: 10.1145/2980930.2907951. Online publication date: 13-Jun-2016
https://dl.acm.org/doi/10.1145/2980930.2907951
Zhu H, Erez M (2016). Dirigent. ACM SIGPLAN Notices 51:4 (33-47). DOI: 10.1145/2954679.2872394. Online publication date: 25-Mar-2016
https://dl.acm.org/doi/10.1145/2954679.2872394
Kim W, Kim J, Baek W, Nam B, Won Y (2016). NVWAL. ACM SIGPLAN Notices 51:4 (385-398). DOI: 10.1145/2954679.2872392. Online publication date: 25-Mar-2016
https://dl.acm.org/doi/10.1145/2954679.2872392
