research-article

enDebug

Published: 01 October 2016

Abstract

Energy consumption by software applications is one of the critical issues that determine the future of multicore software development. Inefficient software has often been cited as a major reason for wasteful energy consumption in computing systems. Without adequate tools, programmers and compilers are often left to guess which regions of code to optimize, which results in frustrating and unfruitful attempts at improving application energy. In this paper, we propose enDebug, a framework that aims to automate the process of energy debugging. It first profiles the application code for high energy consumption using a hardware-software cooperative approach. Based on the observed application energy profile, an automated recommendation system that uses artificial selection genetic programming generates energy-optimizing program mutants while preserving functional accuracy. We demonstrate the usefulness of our framework using several Splash-2, PARSEC-1.0 and SPEC CPU2006 benchmarks, where we achieve up to 7% energy savings beyond the highest compiler optimization settings (including profile-guided optimization) on real-world Intel Core i7 processors.

We explore the design of a hardware-software cooperative energy profiler. We design an automated recommendation system that uses a guided genetic algorithm to explore energy optimizations in the program code. Our guided genetic algorithm can substantially reduce program energy on top of the highest GNU C compiler settings.
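The profile-then-mutate loop described in the abstract can be illustrated with a small sketch. The abstract does not specify the program representation, the mutation operators, or the fitness function, so everything below is an assumption made for illustration: a toy straight-line instruction set, a static cost table standing in for the hardware-software energy profiler, and the hypothetical helper names `run`, `energy`, `passes_tests`, `mutate` and `optimize`. It is not the enDebug implementation.

```python
import random

# Assumed per-operation energy costs in arbitrary units (not measured values).
ENERGY_COST = {"mul": 3.0, "shl": 1.0, "add": 1.0, "nop": 0.2}


def run(program, x):
    """Interpret a toy straight-line program on the integer input x."""
    for op, arg in program:
        if op == "mul":
            x = x * arg
        elif op == "shl":
            x = x << arg
        elif op == "add":
            x = x + arg
        # "nop" leaves x unchanged
    return x


def energy(program):
    """Stand-in for the energy profiler: sum the assumed cost of each operation."""
    return sum(ENERGY_COST[op] for op, _ in program)


def passes_tests(program, reference, inputs):
    """Functional-accuracy check: the mutant must match the reference on every test input."""
    return all(run(program, x) == run(reference, x) for x in inputs)


def mutate(program, rng):
    """Apply one random edit: turn a multiply by a power of two into a shift, or delete an instruction."""
    mutant = list(program)
    i = rng.randrange(len(mutant))
    op, arg = mutant[i]
    if op == "mul" and arg > 0 and arg & (arg - 1) == 0:
        mutant[i] = ("shl", arg.bit_length() - 1)
    elif len(mutant) > 1:
        del mutant[i]
    return mutant


def optimize(reference, inputs, generations=30, population_size=8, seed=0):
    """Keep the lowest-energy mutants that still behave like the reference program."""
    rng = random.Random(seed)
    population = [list(reference)]
    for _ in range(generations):
        candidates = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        # Discard mutants that change observable behaviour on the test inputs.
        survivors = [m for m in candidates if passes_tests(m, reference, inputs)]
        # Selection: rank by the (assumed) energy cost and keep the best few.
        population = sorted(population + survivors, key=energy)[:population_size]
    return population[0]


if __name__ == "__main__":
    reference = [("mul", 8), ("add", 0), ("nop", 0), ("add", 5)]
    test_inputs = list(range(-4, 5))
    best = optimize(reference, test_inputs)
    print("reference energy:", energy(reference))
    print("best mutant:", best, "energy:", energy(best))
```

In this sketch the test inputs play the role of the functional-accuracy check, so a mutant is only as trustworthy as the tests that admit it. A real system would replace `energy` with measurements from the profiler.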

Published In

Journal of Parallel and Distributed Computing, Volume 96, Issue C
October 2016
218 pages

Publisher

Academic Press, Inc.

United States

Author Tags

  1. Energy optimization
  2. Energy profiling
  3. Genetic programming
