research-article

Compiler-Guided Parallelism Adaption Based on Application Partition for Power-Gated ILP Processor

Authors:

Haowen Luo

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 25, Issue 4

Pages 1329 - 1341

https://doi.org/10.1109/TVLSI.2016.2636419

Published: 01 April 2017

Abstract

Instruction-level parallelism (ILP) processors have been widely used to improve speed for several decades. However, the parallelism required varies between applications, and even within a single application. A fixed high degree of parallelism can result in poor utilization and extra leakage energy. Designing energy-efficient ILP processors that trade off power against speed has become a critical research issue. In this paper, a compiler-guided parallelism adaption scheme based on an application partition algorithm is proposed to adapt parallelism while applications run on ILP processors. The aim is to minimize energy consumption without degrading the execution time. The main idea is as follows: 1) partition the application into several power-gating regions (PGRs); 2) assign an adapted parallelism to each region by analyzing its resource requirements and energy efficiency; and 3) reschedule each region with its own parallelism and insert power-gating instructions into the application to switch hardware ON/OFF. Experimental results on the CoreMarkPro benchmark suite show the expected savings in leakage energy. Our algorithm could reduce the leakage energy in register files by 30.46% and 64.06% for applications with high variance in software-inherent parallelism. Furthermore, the energy overhead caused by state transitions is much lower than that of Tabkhi’s algorithm.
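
As a rough illustration of the three steps listed in the abstract, the sketch below groups consecutive code blocks into regions, gives each region the widest issue width any of its blocks needs, and emits a gating pseudo-instruction wherever the width changes. It is not the authors' algorithm. The names `Block`, `Region`, `partition`, `insert_gating`, and `SET_WIDTH`, the linear energy model (leakage proportional to active width times cycles, plus a fixed cost per width change), and the greedy merging heuristic are all assumptions made for illustration. Rescheduling of code inside a region is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical code block: profiled cycle count and the issue width it can use."""
    name: str
    cycles: int
    needed_width: int


@dataclass
class Region:
    """Hypothetical power-gating region: consecutive blocks run at one width."""
    blocks: List[Block]

    @property
    def width(self) -> int:
        # Wide enough for the most demanding block, so no block is slowed down.
        return max(b.needed_width for b in self.blocks)

    @property
    def cycles(self) -> int:
        return sum(b.cycles for b in self.blocks)


def energy(regions: List[Region], leak_per_lane_cycle: int, switch_cost: int) -> int:
    """Assumed model: leakage of powered lanes plus a fixed cost per width change."""
    leakage = sum(r.width * r.cycles * leak_per_lane_cycle for r in regions)
    switches = sum(1 for a, b in zip(regions, regions[1:]) if a.width != b.width)
    return leakage + switches * switch_cost


def partition(blocks: List[Block], leak_per_lane_cycle: int, switch_cost: int) -> List[Region]:
    """Greedy heuristic: start with one region per block, then repeatedly apply
    the adjacent merge that lowers total energy the most, until none helps."""
    regions = [Region([b]) for b in blocks]
    while len(regions) > 1:
        best = None
        best_energy = energy(regions, leak_per_lane_cycle, switch_cost)
        for i in range(len(regions) - 1):
            merged = Region(regions[i].blocks + regions[i + 1].blocks)
            candidate = regions[:i] + [merged] + regions[i + 2:]
            e = energy(candidate, leak_per_lane_cycle, switch_cost)
            if e < best_energy:
                best, best_energy = candidate, e
        if best is None:
            break
        regions = best
    return regions


def insert_gating(regions: List[Region]) -> List[str]:
    """Emit block names in order, with a SET_WIDTH pseudo-instruction
    wherever the active width changes."""
    out: List[str] = []
    current = None
    for r in regions:
        if r.width != current:
            out.append(f"SET_WIDTH {r.width}")
            current = r.width
        out.extend(b.name for b in r.blocks)
    return out
```

Under this toy model, a short low-parallelism block sandwiched between two wide blocks is absorbed into its neighbor, because two width changes cost more than leaving the extra lanes powered for a few cycles. A long low-parallelism block keeps its own region and is preceded by a `SET_WIDTH` that narrows the machine.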

References

[1]

H. Esmaeilzadeh, E. Blem, R. S. Amant, K. Sankaralingam, and D. Burger, “Dark silicon and the end of multicore scaling,” IEEE Micro, vol. 32, no. 3, pp. 122–134, 2012.

[2]

Y. Shin, J. Seomun, K.-M. Choi, and T. Sakurai, “Power gating: Circuits, design methodologies, and best practice for standard-cell VLSI designs,” ACM Trans. Design Autom. Elect. Syst., vol. 15, no. 4, 2010.

[3]

A. Lambrechts et al., “Power breakdown analysis for a heterogeneous NoC platform running a video application,” in Proc. IEEE ASAP, Jul. 2005, pp. 179–184.


[4]

V. Zyuban and P. Kogge, “The energy complexity of register files,” in Proc. Int. Symp. Low Power Electron. Design (ISLPED), 1998, pp. 305–310.


[5]

S. Rixner, W. J. Dally, B. Khailany, P. Mattson, U. J. Kapasi, and J. D. Owens, “Register organization for media processing,” in Proc. 6th Int. Symp. High-Perform. Comput. Archit. (HPCA), 2000, pp. 375–386.

[6]

Z. Liang, W. Zhang, and Y.-C. Ma, “Deadline-constrained clustered scheduling for VLIW architectures using power-gated register files,” ACM Trans. Archit. Code Optim., vol. 11, no. 2, 2014.

[7]

EEMBC Industry-Standard Benchmarks for Embedded Systems, accessed on 2016. [Online]. Available: http://www.eembc.org/coremark/index.php?b=pro.htm

[8]

(2011). 40nm Technology. [Online]. Available: http://www.tsmc.com/english/dedicatedfoundry/technology/40nm.htm

[9]

A. B. Kahng, S. Kang, R. Kumar, and J. Sartori, “Enhancing the efficiency of energy-constrained DVFS designs,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 10, pp. 1769–1782, 2013.

[10]

K. K. Rangan, G.-Y. Wei, and D. Brooks, “Thread motion: Fine-grained power management for multi-core systems,” ACM SIGARCH Comput. Archit. News, vol. 37, no. 3, pp. 302–313, 2009.

[11]

Q. Cai, J. González, G. Magklis, P. Chaparro, and A. González, “Thread shuffling: Combining DVFS and thread migration to reduce energy consumptions for multi-core systems,” in Proc. Int. Symp. Low Power Electron. Design (ISLPED), Aug. 2011, pp. 379–384.


[12]

Y.-F. Tsai, A. H. Ankadi, N. Vijaykrishnan, M. J. Irwin, and T. Theocharides, “ChipPower: An architecture-level leakage simulator,” in Proc. IEEE Int. SoC Conf., Sep. 2004, pp. 395–398.

[13]

S. Roy, N. Ranganathan, and S. Katkoori, “A framework for power-gating functional units in embedded microprocessors,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 11, pp. 1640–1649, 2009.

[14]

M. Wang, Y. Wang, D. Liu, Z. Qin, and Z. Shao, “Compiler-assisted leakage-aware loop scheduling for embedded VLIW DSP processors,” J. Syst. Softw., vol. 83, no. 5, pp. 772–785, 2010.

[15]

R. Nagpal and Y. N. Srikant, “Compiler-assisted power optimization for clustered VLIW architectures,” Parallel Comput., vol. 37, no. 1, pp. 42–59, 2011.

[16]

H. S. Kim, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, “Adapting instruction level parallelism for optimizing leakage in VLIW architectures,” in Proc. LCTES, 2013, pp. 275–283.


[17]

M. B. Henry and L. Nazhandali, “NEMS-based functional unit powergating: Design, analysis, and optimization,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 2, pp. 290–302, 2013.

[18]

Y.-P. You, C.-W. Huang, and J. K. Lee, “Compilation for compact power-gating controls,” ACM Trans. Design Autom. Elect. Syst., vol. 12, no. 4, 2007.

[19]

M. Kondo et al., “Design and evaluation of fine-grained power-gating for embedded microprocessors,” in Proc. Conf. Design, Autom. Test Eur. (DATE), 2014.

[20]

D. Patti, M. Palesi, and V. Catania, Merging Compilation and Microarchitectural Configuration Spaces for Performance/Power Optimization in VLIW-Based Systems, vol. 285. New York, NY, USA: Springer, 2014, pp. 203–212.

[21]

S. Cao, Z. Li, F. Wang, and S. Wei, “Compiler-assisted leakage- and temperature-aware instruction-level VLIW scheduling,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 6, pp. 1416–1428, 2014.

[22]

N. Goel, A. Kumar, and P. R. Panda, “Shared-port register file architecture for low-energy VLIW processors,” ACM Trans. Archit. Code Optim., vol. 11, no. 1, 2014.

[23]

V. Porpodas and M. Cintra, “CAeSaR: Unified cluster-assignment scheduling and communication reuse for clustered VLIW processors,” in Proc. Int. Conf. Compilers, Archit. Synthesis Embedded Syst. (CASES), Sep. 2013, pp. 1–10.


[24]

Q. Cai, J. M. Codina, J. Gonzalez, and A. Gonzalez, “A software-hardware hybrid steering mechanism for clustered microarchitectures,” in Proc. IEEE Int. Symp. Parallel Distrib. Process. (IPDPS), Apr. 2008, pp. 1–12.

[25]

J. M. Codina, J. Sanchez, and A. Gonzalez, “Virtual cluster scheduling through the scheduling graph,” in Proc. CGO, 2007, pp. 89–101.


[26]

S. Roy, N. Ranganathan, and S. Katkoori, “State-retentive power gating of register files in multicore processors featuring multithreaded in-order cores,” IEEE Trans. Comput., vol. 60, no. 11, pp. 1547–1560, 2011.

[27]

H. Tabkhi and G. Schirner, “Application-guided power gating reducing register file static power,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 12, pp. 2513–2526, 2014.

[28]

C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis & transformation,” in Proc. CGO, Mar. 2004, pp. 75–86.


Published In

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Volume 25, Issue 4

April 2017

400 pages

ISSN: 1063-8210

Copyright © 2017.

Publisher

IEEE Educational Activities Department

United States
