
Energy-Awareness and Performance Management with Parallel Dataflow Applications

Published: 01 April 2017

Abstract

Applications have traditionally been executed as fast as possible (Race-to-Idle) and mapped to as many cores as possible (Fair scheduling) to minimize energy consumption. On modern hardware, this approach has become inefficient because of the platforms' power characteristics. Instead, applications should use an optimal combination of clock frequency and number of cores to balance dynamic and static power. Such approaches have been difficult to realize because resource allocation is based only on CPU utilization: resources are allocated to prevent over-utilization rather than to meet software performance requirements. By adjusting the clock frequency directly according to software requirements and activating CPU cores according to the application's parallelism, significant energy can be saved by lowering the average power dissipation. To enforce these recommendations, this paper provides means of expressing performance and parallelism in applications, enabling tighter integration with the power management to balance execution speed and mapping on multi-core systems. An interface between the applications and the hardware resources is provided in combination with a novel power management runtime system called Bricktop. A signal processing case study demonstrates real-world energy savings of up to 50% without performance degradation.
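The frequency/core trade-off described in the abstract can be illustrated with a toy model. The sketch below is not Bricktop and not the paper's method. It assumes a hypothetical platform (the frequency steps, the linear voltage curve, and all power constants are invented for illustration), models application parallelism with Amdahl's law, and brute-forces every (frequency, cores) pair to find the lowest-energy configuration that still finishes within a periodic deadline. It then compares that choice with a race-to-idle baseline that runs every core at the highest frequency and idles for the rest of the period.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical platform parameters: invented for illustration, not measured
# values from the paper or from any real chip.
FREQS_GHZ = [0.6, 0.8, 1.0, 1.2, 1.4, 1.6]  # assumed DVFS steps
MAX_CORES = 4
C_EFF = 1.0      # assumed effective switched capacitance (arbitrary units)
P_STATIC = 0.3   # assumed leakage power per powered-on core (W)
P_IDLE = 0.05    # assumed extra power per powered-on core while idling (W)


def voltage(f_ghz: float) -> float:
    """Assumed linear voltage/frequency relation (a simplification)."""
    return 0.7 + 0.3 * f_ghz


def dynamic_power(f_ghz: float) -> float:
    """Per-core dynamic power, P_dyn = C * V^2 * f."""
    return C_EFF * voltage(f_ghz) ** 2 * f_ghz


@dataclass
class Config:
    freq_ghz: float
    cores: int
    busy_s: float
    energy_j: float


def evaluate(work_gcycles: float, parallel_frac: float, period_s: float,
             f: float, n: int) -> Config:
    """Energy over one period when running at frequency f on n powered-on cores."""
    # Amdahl-style split: serial part on one core, parallel part on n cores.
    t_serial = work_gcycles * (1 - parallel_frac) / f
    t_parallel = work_gcycles * parallel_frac / (f * n)
    busy = t_serial + t_parallel
    p_dyn = dynamic_power(f)
    # All n cores leak while powered on; only working cores add dynamic power.
    e_busy = t_serial * (p_dyn + n * P_STATIC) + t_parallel * n * (p_dyn + P_STATIC)
    # After finishing early, the powered-on cores idle until the period ends.
    e_idle = max(period_s - busy, 0.0) * n * (P_STATIC + P_IDLE)
    return Config(f, n, busy, e_busy + e_idle)


def best_config(work_gcycles: float, parallel_frac: float, period_s: float) -> Config:
    """Lowest-energy (frequency, cores) pair that still meets the deadline."""
    candidates = [
        evaluate(work_gcycles, parallel_frac, period_s, f, n)
        for f, n in product(FREQS_GHZ, range(1, MAX_CORES + 1))
    ]
    feasible = [c for c in candidates if c.busy_s <= period_s]
    if not feasible:
        raise ValueError("no configuration meets the deadline")
    return min(feasible, key=lambda c: c.energy_j)


def race_to_idle(work_gcycles: float, parallel_frac: float, period_s: float) -> Config:
    """Baseline: highest frequency on every core, then idle."""
    return evaluate(work_gcycles, parallel_frac, period_s, max(FREQS_GHZ), MAX_CORES)


if __name__ == "__main__":
    # Hypothetical workload: 1 Gcycle per period, 90% parallel, 1.5 s period.
    work, p, period = 1.0, 0.9, 1.5
    best = best_config(work, p, period)
    base = race_to_idle(work, p, period)
    print(f"race-to-idle: f={base.freq_ghz} GHz, n={base.cores}, E={base.energy_j:.3f} J")
    print(f"best:         f={best.freq_ghz} GHz, n={best.cores}, E={best.energy_j:.3f} J")
```

With these invented constants the search settles below the race-to-idle baseline, because the high-frequency run pays a large V^2 f dynamic cost and the powered-on cores keep leaking while they wait for the next period. Different constants (for example, cores that can be power-gated when idle) shift the balance, which is the kind of trade-off the abstract argues should be decided from application requirements rather than from CPU utilization alone.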


Cited By

  • (2018) Comparing Three Clustering-based Scheduling Methods for Energy-Aware Rapid Design of MP2SoCs. Journal of Signal Processing Systems, 90(4), 537-570. DOI: 10.5555/3200212.3200222. Online publication date: 1 April 2018.


    Published In

    Journal of Signal Processing Systems, Volume 87, Issue 1
    April 2017
    172 pages
    ISSN: 1939-8018
    EISSN: 1939-8115

    Publisher

    Kluwer Academic Publishers

    United States

    Publication History

    Published: 01 April 2017

    Author Tags

    1. Dataflow
    2. Multi-core
    3. Parallelism
    4. Power management

    Qualifiers

    • Article
