
WiDGET: Wisconsin decoupled grid execution tiles

Published: 19 June 2010

Abstract

The recent paradigm shift to multi-core systems results in high system throughput within a specified power budget. However, future systems still require good single-thread performance, even though it is no longer the predominant design priority, to mitigate sequential bottlenecks and/or to guarantee service-level agreements. Unfortunately, near saturation in voltage scaling necessitates a long-term alternative to dynamic voltage and frequency scaling.
We propose an energy-proportional computing infrastructure, called WiDGET, that decouples thread context management from a sea of simple execution units (EUs). WiDGET's decoupled design provides flexibility to alter resource allocation for a particular power-performance target while turning off unallocated resources. In other words, WiDGET enables dynamic customization of different combinations of small and/or powerful cores on a single chip, consuming power in proportion to the delivered performance.
Over all SPEC CPU2006 benchmarks, WiDGET provides average per-thread performance that is 26% better than a Xeon-like processor while using 8% less power. WiDGET can also scale down to a level comparable to an Atom-like processor, turning off resources to reduce average power by 58%. WiDGET achieves high power efficiency (BIPS³/W), exceeding Xeon-like and Atom-like processors by up to 2x and 21x, respectively.
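The power-efficiency metric named above, BIPS³/W, cubes throughput (billions of instructions per second) and divides by power in watts, so it weights performance much more heavily than power. The sketch below is illustrative only: the function names are hypothetical, and feeding the abstract's average ratios (26% higher performance, 8% lower power) into the formula is an assumption made for the arithmetic, not a reproduction of the paper's per-benchmark results or of its "up to 2x" best-case figure.

```python
def bips3_per_watt(bips: float, watts: float) -> float:
    """Return the BIPS^3/W power-efficiency metric.

    bips  -- throughput in billions of instructions per second
    watts -- power in watts (must be positive)
    """
    if watts <= 0:
        raise ValueError("watts must be positive")
    return bips ** 3 / watts


def relative_efficiency(perf_ratio: float, power_ratio: float) -> float:
    """Return the BIPS^3/W of design A relative to design B.

    perf_ratio  -- A's throughput divided by B's (hypothetical input)
    power_ratio -- A's power divided by B's (hypothetical input)
    """
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return perf_ratio ** 3 / power_ratio


# Assumed inputs taken from the abstract's averages:
# 26% better performance -> 1.26, 8% less power -> 0.92.
ratio = relative_efficiency(1.26, 0.92)
print(f"relative BIPS^3/W: {ratio:.2f}")
```

Because throughput is cubed, a 26% performance gain alone roughly doubles the metric (1.26³ ≈ 2.0) before the power saving is taken into account.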

Cited By

  • (2022) Efficient Instruction Scheduling Using Real-time Load Delay Tracking. ACM Transactions on Computer Systems, 40:1-4 (1-21). DOI: 10.1145/3548681. Online publication date: 24-Nov-2022
  • (2018) Scalable Dynamic Task Scheduling on Adaptive Many-Core. 2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) (168-175). DOI: 10.1109/MCSoC2018.2018.00037. Online publication date: Sep-2018
  • (2017) CG-OoO. ACM Transactions on Architecture and Code Optimization, 14:4 (1-26). DOI: 10.1145/3151034. Online publication date: 5-Dec-2017

    Published In

    ACM SIGARCH Computer Architecture News, Volume 38, Issue 3
    ISCA '10
    June 2010
    508 pages
    ISSN: 0163-5964
    DOI: 10.1145/1816038
    • ACM Conferences
      ISCA '10: Proceedings of the 37th Annual International Symposium on Computer Architecture
      June 2010
      520 pages
      ISBN: 9781450300537
      DOI: 10.1145/1815961
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. hardware
    2. instruction steering
    3. performance
    4. power efficiency
    5. power proportional computing

    Qualifiers

    • Research-article
