DOI: 10.1145/1454115.1454149
research-article

Meeting points: using thread criticality to adapt multicore hardware to parallel regions

Published: 25 October 2008

Abstract

We present a novel mechanism, called meeting point thread characterization, to dynamically detect critical threads in a parallel region. We define the critical thread as the one with the longest completion time in the parallel region. Knowing the criticality of each thread has many potential applications. In this work, we propose two applications: thread delaying for multi-core systems and thread balancing for simultaneous multi-threaded (SMT) cores. Thread delaying saves energy by running the core containing the critical thread at maximum frequency while scaling down the frequency and voltage of the cores containing non-critical threads. Thread balancing improves overall performance by giving higher priority to the critical thread in the issue queue of an SMT core. Our experiments on a detailed microprocessor simulator with the Recognition, Mining, and Synthesis applications from Intel's research laboratory reveal that thread delaying can achieve energy savings of up to more than 40% with negligible performance loss. Thread balancing can improve performance by 1% to 20%.
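The two applications named in the abstract can be illustrated with a small software sketch. This is not the paper's hardware mechanism: the abstract does not describe how meeting points are counted, so the sketch assumes that per-thread completion times for a parallel region are already known, and that execution time scales inversely with clock frequency. The names `ThreadSample`, `critical_thread`, `delay_frequencies`, and `issue_order` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class ThreadSample:
    thread_id: int
    # Assumption: time to finish the parallel region when running at f_max.
    completion_time: float


def critical_thread(samples):
    """Return the thread with the longest completion time (the critical thread)."""
    return max(samples, key=lambda s: s.completion_time)


def delay_frequencies(samples, f_max, f_min):
    """Hypothetical thread-delaying policy.

    The critical thread keeps f_max. Every other thread is slowed so that,
    under the assumed inverse time/frequency relation, it would finish no
    later than the critical thread. Results are clamped to [f_min, f_max].
    """
    crit = critical_thread(samples)
    if crit.completion_time <= 0:
        # Degenerate input: nothing to scale against, keep everyone at f_max.
        return {s.thread_id: f_max for s in samples}
    freqs = {}
    for s in samples:
        scaled = f_max * s.completion_time / crit.completion_time
        freqs[s.thread_id] = max(f_min, min(f_max, scaled))
    return freqs


def issue_order(ready, critical_id):
    """Hypothetical thread-balancing policy for an SMT issue queue.

    `ready` is a list of (thread_id, instruction) pairs in age order.
    Entries of the critical thread move to the front; because sorted() is
    stable, age order is preserved within each group.
    """
    return sorted(ready, key=lambda entry: entry[0] != critical_id)


if __name__ == "__main__":
    samples = [ThreadSample(0, 10.0), ThreadSample(1, 5.0), ThreadSample(2, 8.0)]
    crit = critical_thread(samples)
    print("critical thread:", crit.thread_id)
    print("frequencies:", delay_frequencies(samples, f_max=3.0, f_min=1.0))
    print("issue order:", issue_order([(1, "add"), (0, "mul"), (1, "ld")], crit.thread_id))
```

Under these assumptions, a thread that needs half the time of the critical thread can run at half the maximum frequency and still reach the end of the region on time, which is the intuition behind saving energy with negligible performance loss.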

    Information & Contributors

    Information

    Published In

    PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques
    October 2008
    328 pages
    ISBN:9781605582825
    DOI:10.1145/1454115
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. critical threads
    2. energy-aware
    3. low-power
    4. meeting point thread characterization
    5. microarchitecture
    6. multi-threaded application
    7. thread balancing
    8. thread delaying

    Qualifiers

    • Research-article

    Conference

    PACT '08

    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)6
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 02 Mar 2025

    Cited By

    • (2022) Criticality-aware priority to accelerate GPU memory access. The Journal of Supercomputing 79:1 (188-213). DOI: 10.1007/s11227-022-04657-3. Online publication date: 6-Jul-2022
    • (2021) Intelligent Adaptation of Hardware Knobs for Improving Performance and Power Consumption. IEEE Transactions on Computers 70:1 (1-16). DOI: 10.1109/TC.2020.2980230. Online publication date: 1-Jan-2021
    • (2021) Multi-Core Power Management through Deep Reinforcement Learning. 2021 IEEE International Symposium on Circuits and Systems (ISCAS) (1-5). DOI: 10.1109/ISCAS51556.2021.9401447. Online publication date: May-2021
    • (2021) PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy. Euro-Par 2021: Parallel Processing (599-615). DOI: 10.1007/978-3-030-85665-6_37. Online publication date: 25-Aug-2021
    • (2020) Advance Virtual Channel Reservation. IEEE Transactions on Computers 69:9 (1320-1334). DOI: 10.1109/TC.2020.2971982. Online publication date: 1-Sep-2020
    • (2020) Pursuing Extreme Power Efficiency With PPCC Guided NoC DVFS. IEEE Transactions on Computers 69:3 (410-426). DOI: 10.1109/TC.2019.2949807. Online publication date: 1-Mar-2020
    • (2019) Advance Virtual Channel Reservation. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) (1178-1183). DOI: 10.23919/DATE.2019.8715104. Online publication date: Mar-2019
    • (2017) Energy-Efficient Compilation of Irregular Task-Parallel Loops. ACM Transactions on Architecture and Code Optimization 14:4 (1-29). DOI: 10.1145/3136063. Online publication date: 14-Nov-2017
    • (2017) Mirage cores. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (745-758). DOI: 10.1145/3123939.3123969. Online publication date: 14-Oct-2017
    • (2017) Thread Criticality Assisted Replication and Migration for Chip Multiprocessor Caches. IEEE Transactions on Computers 66:10 (1747-1762). DOI: 10.1109/TC.2017.2705678. Online publication date: 1-Oct-2017
