research-article

Compiler-Guided Parallelism Adaption Based on Application Partition for Power-Gated ILP Processor

Published: 01 April 2017

Abstract

Instruction-level parallelism (ILP) processors have been widely used to improve speed for several decades. However, the parallelism required varies between applications, and even within a single application. A fixed, high degree of parallelism can therefore result in poor utilization and wasted leakage energy. Designing energy-efficient ILP processors that trade off power against speed has become a critical issue in current research. In this paper, a compiler-guided parallelism adaption scheme based on an application partition algorithm is proposed to adapt parallelism while applications run on ILP processors. The aim is to minimize energy consumption without degrading execution time. The main idea is as follows: 1) partition the application into several power-gating regions (PGRs); 2) assign an adapted parallelism to each region by analyzing its resource requirements and energy efficiency; and 3) reschedule each region with its own parallelism and insert power-gating instructions into the application to switch hardware ON/OFF. Experimental results on the CoreMarkPro benchmark suite show the expected savings in leakage energy. Our algorithm could reduce the leakage energy in register files by 30.46% and 64.06% for applications with high variance in software-inherent parallelism. Furthermore, the energy overhead of state transitions is much lower than that of Tabkhi’s algorithm.
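
The three steps in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm: it assumes each basic block has already been annotated with a hypothetical parallelism demand (the issue width it can use), groups consecutive blocks into regions, folds regions shorter than `min_len` into their predecessor (taking the larger width so execution time is not degraded), and emits a hypothetical `PG_SET` instruction wherever the active width changes. The names `Region`, `partition`, `insert_gating`, `leakage_saving`, and `PG_SET` are all illustrative assumptions, and rescheduling within a region is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    start: int  # index of the first basic block in the region
    end: int    # index one past the last basic block
    width: int  # parallelism (active issue width) assigned to the region


def partition(demand: List[int], min_len: int) -> List[Region]:
    """Steps 1 and 2: form regions and assign each a width (toy heuristic)."""
    # Group consecutive blocks that have the same parallelism demand.
    regions: List[Region] = []
    for i, d in enumerate(demand):
        if regions and regions[-1].width == d:
            regions[-1].end = i + 1
        else:
            regions.append(Region(i, i + 1, d))
    # Fold short regions into their predecessor to avoid frequent state
    # transitions; the merged region keeps the larger width.
    merged: List[Region] = []
    for r in regions:
        if merged and (r.end - r.start) < min_len:
            prev = merged[-1]
            prev.end = r.end
            prev.width = max(prev.width, r.width)
        else:
            merged.append(r)
    return merged


def insert_gating(regions: List[Region], max_width: int) -> List[Tuple[int, str]]:
    """Step 3 (partial): place a gating instruction where the width changes."""
    ops: List[Tuple[int, str]] = []
    current = max_width  # assume all hardware is powered on at entry
    for r in regions:
        if r.width != current:
            ops.append((r.start, f"PG_SET {r.width}"))  # hypothetical opcode
            current = r.width
    return ops


def leakage_saving(regions: List[Region], max_width: int) -> float:
    """Fraction of width-block leakage avoided versus a fixed max_width."""
    total = sum((r.end - r.start) * max_width for r in regions)
    active = sum((r.end - r.start) * r.width for r in regions)
    return 1 - active / total
```

For example, calling `partition([4, 4, 4, 1, 1, 2, 1, 1, 4, 4], min_len=2)` folds the single width-2 block into the preceding low-demand region, and `insert_gating` then emits one instruction per width change. The model ignores transition energy and wake-up latency, which a real compiler pass would have to weigh against the leakage saved.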

References

[1] H. Esmaeilzadeh, E. Blem, R. S. Amant, K. Sankaralingam, and D. Burger, “Dark silicon and the end of multicore scaling,” IEEE Micro, vol. 32, no. 3, pp. 122–134, 2012.
[2] Y. Shin, J. Seomun, K.-M. Choi, and T. Sakurai, “Power gating: Circuits, design methodologies, and best practice for standard-cell VLSI designs,” ACM Trans. Design Autom. Elect. Syst., vol. 15, no. 4, 2010, Art. no. .
[3] A. Lambrechts et al., “Power breakdown analysis for a heterogeneous NoC platform running a video application,” in Proc. IEEE ASAP, Jul. 2005, pp. 179–184.
[4] V. Zyuban and P. Kogge, “The energy complexity of register files,” in Proc. Int. Symp. Low Power Electron. Design (ISLPED), 1998, pp. 305–310.
[5] S. Rixner, W. J. Dally, B. Khailany, P. Mattson, U. J. Kapasi, and J. D. Owens, “Register organization for media processing,” in Proc. 6th Int. Symp. High-Perform. Comput. Archit. (HPCA), 2000, pp. 375–386.
[6] Z. Liang, W. Zhang, and Y.-C. Ma, “Deadline-constrained clustered scheduling for VLIW architectures using power-gated register files,” ACM Trans. Archit. Code Optim., vol. 11, no. 2, 2014, Art. no. .
[7] EEMBC Industry-Standard Benchmarks for Embedded Systems, accessed on 2016. [Online]. Available: http://www.eembc.org/coremark/index.php?b=pro.htm
[8] (2011). 40nm Technology. [Online]. Available: http://www.tsmc.com/english/dedicatedfoundry/technology/40nm.htm
[9] A. B. Kahng, S. Kang, R. Kumar, and J. Sartori, “Enhancing the efficiency of energy-constrained DVFS designs,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 10, pp. 1769–1782, 2013.
[10] K. K. Rangan, G.-Y. Wei, and D. Brooks, “Thread motion: Fine-grained power management for multi-core systems,” ACM SIGARCH Comput. Archit. News, vol. 37, no. 3, pp. 302–313, 2009.
[11] Q. Cai, J. González, G. Magklis, P. Chaparro, and A. González, “Thread shuffling: Combining DVFS and thread migration to reduce energy consumptions for multi-core systems,” in Proc. Int. Symp. Low Power Electron. Design (ISLPED), Aug. 2011, pp. 379–384.
[12] Y.-F. Tsai, A. H. Ankadi, N. Vijaykrishnan, M. J. Irwin, and T. Theocharides, “ChipPower: An architecture-level leakage simulator,” in Proc. IEEE Int. SoC Conf., Sep. 2004, pp. 395–398.
[13] S. Roy, N. Ranganathan, and S. Katkoori, “A framework for power-gating functional units in embedded microprocessors,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 11, pp. 1640–1649, 2009.
[14] M. Wang, Y. Wang, D. Liu, Z. Qin, and Z. Shao, “Compiler-assisted leakage-aware loop scheduling for embedded VLIW DSP processors,” J. Syst. Softw., vol. 83, no. 5, pp. 772–785, 2010.
[15] R. Nagpal and Y. N. Srikant, “Compiler-assisted power optimization for clustered VLIW architectures,” Parallel Comput., vol. 37, no. 1, pp. 42–59, 2011.
[16] H. S. Kim, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, “Adapting instruction level parallelism for optimizing leakage in VLIW architectures,” in Proc. LCTES, 2013, pp. 275–283.
[17] M. B. Henry and L. Nazhandali, “NEMS-based functional unit powergating: Design, analysis, and optimization,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 2, pp. 290–302, 2013.
[18] Y.-P. You, C.-W. Huang, and J. K. Lee, “Compilation for compact power-gating controls,” ACM Trans. Design Autom. Elect. Syst., vol. 12, no. 4, 2007, Art. no. .
[19] M. Kondo et al., “Design and evaluation of fine-grained power-gating for embedded microprocessors,” in Proc. Conf. Design, Autom. Test Eur. (DATE), 2014, Art. no. .
[20] D. Patti, M. Palesi, and V. Catania, Merging Compilation and Microarchitectural Configuration Spaces for Performance/Power Optimization in VLIW-Based Systems, vol. 285. New York, NY, USA: Springer, 2014, pp. 203–212.
[21] S. Cao, Z. Li, F. Wang, and S. Wei, “Compiler-assisted leakage- and temperature-aware instruction-level VLIW scheduling,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 6, pp. 1416–1428, 2014.
[22] N. Goel, A. Kumar, and P. R. Panda, “Shared-port register file architecture for low-energy VLIW processors,” ACM Trans. Archit. Code Optim., vol. 11, no. 1, 2014, Art. no. .
[23] V. Porpodas and M. Cintra, “CAeSaR: Unified cluster-assignment scheduling and communication reuse for clustered VLIW processors,” in Proc. Int. Conf. Compilers, Archit. Synthesis Embedded Syst. (CASES), Sep. 2013, pp. 1–10.
[24] Q. Cai, J. M. Codina, J. Gonzalez, and A. Gonzalez, “A software-hardware hybrid steering mechanism for clustered microarchitectures,” in Proc. IEEE Int. Symp. Parallel Distrib. Process. (IPDPS), Apr. 2008, pp. 1–12.
[25] J. M. Codina, J. Sanchez, and A. Gonzalez, “Virtual cluster scheduling through the scheduling graph,” in Proc. CGO, 2007, pp. 89–101.
[26] S. Roy, N. Ranganathan, and S. Katkoori, “State-retentive power gating of register files in multicore processors featuring multithreaded in-order cores,” IEEE Trans. Comput., vol. 60, no. 11, pp. 1547–1560, 2011.
[27] H. Tabkhi and G. Schirner, “Application-guided power gating reducing register file static power,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 12, pp. 2513–2526, 2014.
[28] C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis & transformation,” in Proc. CGO, Mar. 2004, pp. 75–86.

Published In

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 25, Issue 4
April 2017
400 pages

Publisher

IEEE Educational Activities Department

United States
