survey

Energy Efficient Computing Systems: Architectures, Abstractions and Modeling to Techniques and Standards

Published: 09 September 2022

Abstract

Computing systems have undergone tremendous change over the last few decades, with several inflexion points. While Moore’s law guided the semiconductor industry to cram more and more transistors and logic into the same volume, the limits of instruction-level parallelism (ILP) and the end of Dennard scaling drove the industry towards multi-core chips. More recently, we have entered the era of domain-specific architectures (DSA) and chips for new workloads like artificial intelligence (AI) and machine learning (ML). These trends continue, arguably with other limits, along with challenges imposed by tighter integration, extreme form factors and increasingly diverse workloads, making systems more complex to architect, design, implement and optimize from an energy efficiency perspective. Energy efficiency has now become a first-order design parameter and constraint across the entire spectrum of computing devices.
Many research surveys have covered different aspects of energy efficiency techniques implemented in hardware and microarchitecture across devices, servers, HPC/cloud and data center systems, along with improved software, algorithms, frameworks, and energy/thermal modeling. Somewhat in parallel, the semiconductor industry has developed techniques and standards around specification, modeling/simulation, benchmarking and verification of complex chips; these areas have not been addressed in detail by previous research surveys. This survey aims to bring these domains together holistically, present the latest in each of these areas, highlight potential gaps and challenges, and discuss opportunities for the next generation of energy efficient systems. The survey is organized as a systematic categorization of key aspects of building energy efficient systems: (1) specification, the ability to precisely specify the power intent, attributes or properties at different layers; (2) modeling and simulation of the entire system or subsystem (hardware, software or both), to enable experimentation with possible options and what-if analysis; (3) techniques used for implementing energy efficiency at different levels of the stack; (4) verification techniques used to provide guarantees that the functionality of complex designs is preserved; and (5) energy efficiency benchmarks, standards and consortia that aim to standardize different aspects of energy efficiency, including cross-layer optimizations.

References

[1]
[3]
Filipp Akopyan, Jun Sawada, Andrew Cassidy, Rodrigo Alvarez-Icaza, John Arthur, Paul Merolla, Nabil Imam, Yutaka Nakamura, Pallab Datta, Gi-Joon Nam, et al. 2015. Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 34, 10 (2015), 1537–1557.
[4]
AMD. 2019. Workload tuning guide for AMD EPYC™ 7002 series processor based servers. (2019). Retrieved from https://developer.amd.com/resources/epyc-resources/epyc-tuning-guides/.
[5]
G. M. Amdahl. 1967. Validity of the single-processor approach to achieving large scale computing capabilities. In Proceedings of the AFIPS Conference Proceedings. Vol. 30. AFIPS Press, Reston, VA, 483–485.
[6]
Anandtech. 2019. Examining Intel’s ice lake processors: Taking a bite of the sunny cove microarchitecture. (2019). Retrieved from https://www.anandtech.com/show/14514/examining-intels-ice-lake-microarchitecture-and-sunny-cove.
[7]
Anandtech. 2019. The Intel optane memory SSD review. (2019). Retrieved from https://www.anandtech.com/show/11210/the-intel-optane-memory-ssd-review-32gb-of-kaby-lake-caching.
[8]
AnandTech. 2020. Amazon’s ARM-based graviton2 against AMD and intel: Comparing cloud compute. (2020). Retrieved from https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd.
[9]
AnandTech. 2020. Apple announces the Apple M1 silicon. (2020). Retrieved from https://www.anandtech.com/print/16226/apple-silicon-m1-a14-deep-dive.
[10]
Anandtech. 2020. Intel’s 11th gen core tiger lake SOC detailed: Super Fin, Willow Cove and Xe-LP. (2020). Retrieved from https://www.anandtech.com/show/15971/intels-11th-gen-core-tiger-lake-soc-detailed-superfin-willow-cove-and-xelp.
[12]
Sonu Arora, Dan Bouvier, and Chris Weaver. 2020. AMD next generation 7NM Ryzen™ 4000 APU “Renoir”. In IEEE Hot Chips 32 Symposium, HCS 2020, Palo Alto, CA, USA, August 16-18, 2020. IEEE, 1–30.
[13]
IEEE Standards Association. 2016. IEEE P2415 - Standard for power modeling to enable system level analysis. (2016). Retrieved from https://standards.ieee.org/project/2415.html.
[14]
IEEE Standards Association. 2019. IEEE P2416 - Standard for power modeling to enable system level analysis. (2019). Retrieved from https://standards.ieee.org/project/2416.html.
[15]
Grant Ayers, Nayana Prasad Nagendra, David I. August, Hyoun Kyu Cho, Svilen Kanev, Christos Kozyrakis, Trivikram Krishnamurthy, Heiner Litz, Tipp Moseley, and Parthasarathy Ranganathan. 2019. AsmDB: Understanding and mitigating front-end stalls in warehouse-scale computers. In Proceedings of the 46th International Symposium on Computer Architecture. Association for Computing Machinery, New York, NY.
[16]
Luiz Andre Barroso and Urs Hoelzle. 2009. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines (1st ed.). Morgan and Claypool Publishers.
[17]
Luiz André Barroso and Urs Hölzle. 2007. The case for energy-proportional computing. Computer 40, 12 (2007), 33–37.
[18]
Koen Bertels, Aritra Sarkar, A. Mouedenne, Thomas Hubregtsen, A. Yadav, Anneriet Krol, and Imran Ashraf. 2019. Quantum computer architecture: Towards full-stack quantum accelerators. In Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition.
[19]
W. L. Bircher and S. Naffziger. 2014. AMD SOC power management: Improving performance/watt using run-time feedback. In Proceedings of the IEEE 2014 Custom Integrated Circuits Conference. 1–4.
[20]
Shekhar Y. Borkar. 2010. The exascale challenge. In Proceedings of the 2010 International Symposium on VLSI Design, Automation and Test.2–3.
[21]
Kevin K. Chang, A. Giray Yağlıkçı, Saugata Ghose, Aditya Agrawal, Niladrish Chatterjee, Abhijith Kashyap, Donghyuk Lee, Mike O’Connor, Hasan Hassan, and Onur Mutlu. 2017. Understanding reduced-voltage operation in modern DRAM devices: Experimental characterization, analysis, and mechanisms. In Proceedings of the ACM on Measurement and Analysis of Computing Systems.
[22]
Vincent Chau, Xiaowen Chu, Hai Liu, and Yiu-Wing Leung. 2017. Energy efficient job scheduling with DVFS for CPU-GPU heterogeneous systems. In Proceedings of the 8th International Conference on Future Energy Systems. Association for Computing Machinery, New York, NY.
[23]
Samit Chaudhuri and Asmus Hetzel. 2017. SAT-based compilation to a non-von neumann processor. In Proceedings of the 36th International Conference on Computer-Aided Design. IEEE Press, 675–682.
[24]
Yu-Hsin Chen, Joel Emer, and Vivienne Sze. 2016. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks. In Proceedings of the ACM SIGARCH Computer Architecture News, Vol. 44. IEEE, 367–379.
[25]
Compute Express Link Consortium. 2020. Compute express link. (2020). Retrieved from https://www.computeexpresslink.org/.
[26]
Tom Conte, Erik DeBenedictis, Natesh Ganesh, Todd Hylton, Susanne Still, John William Strachan, Stan Williams, Alexander Alemi, Lee Altenberg, Gavin Crooks, James Crutchfield, Lidia Rio, Josh Deutsch, Michael DeWeese, Khari Douglas, Massimiliano Esposito, Michael Frank, Robert Fry, Peter Harsha, and Yan Yufik. 2019. Thermodynamic computing: A report based on a computing community consortium (CCC) workshop. https://cra.org/ccc/wpcontent/uploads/sites/2/2019/10/CCC-Thermodynamic-Computing-Reportv3.pdf.
[27]
Intel Corporation. 2017. Intel® MovidiusTM MyriadTM VPU 2: A Class-Defining Processor. Retrieved from https://www.movidius.com/myriad2.
[28]
Intel Corporation. 2017. Intel® MovidiusTM SHAVE v2.0 - Microarchitectures - Movidius. Retrieved from https://en.wikichip.org/wiki/movidius/microarchitectures/shave_v2.0.
[29]
Intel Corporation. 2018. Intel and AMD working on Power Sharing Across CPU and GPU for Optimal Performance. Retrieved from https://www.spokenbyyou.com/intel-amd-working-power-sharing-across-cpu-gpu-optimal-performance/.
[30]
Intel Corporation. 2018. Intel’s Exascale Dataflow Engine drops x86 and von Neumann. Retrieved from https://www.nextplatform.com/2018/08/30/intels-exascale-dataflow-engine-drops-x86-and-von-neuman/.
[31]
Intel Corporation. 2019. Lakefield: Hybrid CPU with Foveros Technology. Retrieved from https://newsroom.intel.com/press-kits/lakefield/.
[32]
Intel Corporation. 2020. Intel 64 and IA-32 architectures software developer’s manual. (2020). Retrieved from https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf.
[33]
[34]
David E. Culler. 1986. Dataflow architectures. Annual Review of Computer Science 1, 1 (1986), 225–253.
[35]
Howard David, Chris Fallin, Eugene Gorbatov, Ulf R. Hanebutte, and Onur Mutlu. 2011. Memory power management via dynamic voltage/frequency scaling. In Proceedings of the 8th ACM International Conference on Autonomic Computing. Association for Computing Machinery, New York, NY.
[36]
Mike Davies, Narayan Srinivasa, Tsung-Han Lin, Gautham Chinya, Yongqiang Cao, Sri Harsha Choday, Georgios Dimou, Prasad Joshi, Nabil Imam, Shweta Jain, et al. 2018. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro 38, 1 (2018), 82–99.
[37]
Mike Davies, Andreas Wild, Garrick Orchard, Yulia Sandamirskaya, Gabriel A. Fonseca Guerra, Prasad Joshi, Philipp Plank, and Sumedh R. Risbud. 2021. Advancing neuromorphic computing with Loihi: A survey of results and outlook. Proceedings of the IEEE 109, 5 (2021), 911–934.
[38]
Q. Deng, D. Meisner, A. Bhattacharjee, T. F. Wenisch, and R. Bianchini. 2012. CoScale: Coordinating CPU and memory system DVFS in server systems. In Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[39]
Qingyuan Deng, David Meisner, Luiz Ramos, Thomas F. Wenisch, and Ricardo Bianchini. 2011. MemScale: Active low-power modes for main memory. ACM SIGPLAN Notices 46, 3 (2011), 225–238. DOI:
[40]
Linux Kernel Documentation. 2020. Power capping framework. (2020). Retrieved from https://www.kernel.org/doc/Documentation/power/powercap/powercap.txt.
[41]
Linux Kernel Documentation. 2021. CPU frequency scaling. (2021). Retrieved from https://wiki.archlinux.org/index.php/CPU_frequency_scaling.
[42]
Linux Kernel Documentation. 2021. Device frequency scaling. (2021). Retrieved from https://www.kernel.org/doc/html/latest/driver-api/devfreq.html.
[43]
Linux Kernel Documentation. 2021. Linux voltage and current regulator framework. (2021). Retrieved from https://www.kernel.org/doc/Documentation/power/regulator/overview.txt.
[44]
Linux Kernel Documentation. 2021. NO_HZ: Reducing scheduling-clock ticks. (2021). Retrieved from https://www.kernel.org/doc/Documentation/timers/no_hz.rst.
[45]
Linux Kernel Documentation. 2021. PM quality of service interface. (2021). Retrieved from https://www.kernel.org/doc/Documentation/power/pm_qos_interface.txt.
[46]
Linux Kernel Documentation. 2021. Runtime power management framework for I/O devices. (2021). Retrieved from https://www.kernel.org/doc/Documentation/power/runtime_pm.txt.
[47]
Linux Kernel Documentation. 2021. System power management sleep states. (2021). Retrieved from https://www.kernel.org/doc/Documentation/power/states.txt.
[48]
J. Doweck, W. Kao, A. K. Lu, J. Mandelblat, A. Rahatekar, L. Rappoport, E. Rotem, A. Yasin, and A. Yoaz. 2017. Inside 6th-generation Intel core: New microarchitecture code-named Skylake. IEEE Micro 37, 2 (2017), 52–62.
[49]
Jonathan Eastep, Steve Sylvester, Christopher Cantalupo, Brad Geltz, Federico Ardanaz, Asma Al-Rawi, Kelly Livingston, Fuat Keceli, Matthias Maiterth, and Siddhartha Jana. 2017. Global extensible open power manager: A vehicle for HPC community collaboration on co-designed energy management solutions. In Proceedings of the High Performance Computing. Julian M. Kunkel, Rio Yokota, Pavan Balaji, and David Keyes (Eds.), Springer International Publishing, Cham, 394–412.
[50]
R. Efraim, R. Ginosar, C. Weiser, and A. Mendelson. 2014. Energy aware race to halt: A down to EARtH approach for platform energy management. IEEE Computer Architecture Letters 13, 1 (2014), 25–28.
[51]
Semiconductor Engineering. 2020. Software Defined Hardware gains ground again. Retrieved from https://semiengineering.com/software-defined-hardware-gains-ground-again/.
[52]
Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2012. Power limitations and dark silicon are challenging the future of multicore. ACM Transcations on Computer Systems 30, 3 (2012), 1–27.
[53]
Daniel Etiemble. 2018. 45-year CPU evolution: one law and two equations. arXiv:1803.00254. Retrieved from https://arxiv.org/abs/1803.00254.
[54]
Giorgos Fagas, John P. Gallagher, Luca Gammaitoni, and Douglas J. Paul. 2017. Energy challenges for ICT. In Proceedings of the ICT - Energy Concepts for Energy Efficiency and Sustainability. Giorgos Fagas, Luca Gammaitoni, John P. Gallagher, and Douglas J. Paul (Eds.), IntechOpen, Rijeka, Chapter 1.
[55]
Clément Farabet, Berin Martini, Benoit Corda, Polina Akselrod, Eugenio Culurciello, and Yann LeCun. 2011. Neuflow: A runtime reconfigurable dataflow processor for vision. In Proceedings of the Computer Vision and Pattern Recognition Workshops. IEEE, 109–116.
[56]
Richard Phillips Feynman, Anthony J. Hey, and Robin W. Allen. 2000. Feynman Lectures on Computation. Perseus Books, USA.
[57]
J. A. Fisher. 1981. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers 30, 7 (1981), 478–490.
[58]
Sagi Fisher, Adam Teman, Dmitry Vaysman, Alexander Gertsman, Orly Yadid-Pecht, and Alexander Fish. 2008. Digital subthreshold logic design - motivation and challenges. In Proceedings of the 2008 IEEE 25th Convention of Electrical and Electronics Engineers in Israel. 702–706.
[59]
Adi Fuchs and David Wentzlaff. 2019. The accelerator wall: Limits of chip specialization. In Proceedings of the 2019 IEEE International Symposium on High Performance Computer Architecture. 1–14.
[60]
Steve Furber. 2014. SpinNNaker: The world’s biggest NoC. In Proceedings of the 2014 8th IEEE/ACM International Symposium on Networks-on-Chip. IEEE, ii–ii.
[61]
Yu Gan and Christina Delimitrou. 2018. The architectural implications of microservices in the cloud. arXiv:1805.10351. Retrieved from https://arxiv.org/abs/1805.10351.
[62]
Yu Gan, Yanqi Zhang, Dailun Cheng, Ankitha Shetty, Priyal Rathi, Nayan Katarki, Ariana Bruno, Justin Hu, Brian Ritchken, Brendon Jackson, Kelvin Hu, Meghna Pancholi, Yuan He, Brett Clancy, Chris Colen, Fukang Wen, Catherine Leung, Siyuan Wang, Leon Zaruvinsky, Mateo Espinosa, Rick Lin, Zhongling Liu, Jake Padilla, and Christina Delimitrou. 2019. An open-source benchmark suite for microservices and their hardware-software implications for cloud and edge systems. In Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems. Association for Computing Machinery, New York, NY.
[63]
Antara Ganguly, Rajeev Muralidhar, and Virendra Singh. 2019. Towards energy efficient non-von Neumann architectures for deep learning. In Proceedings of the 20th International Symposium on Quality Electronic Design.335–342.
[64]
Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. 2017. Tetris: Scalable and efficient neural network acceleration with 3d memory. In Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 751–764.
[65]
Sukhpal Singh Gill and Rajkumar Buyya. 2018. A taxonomy and future directions for sustainable cloud computing: 360 degree view. ACM Computing Surveys 51, 5, Article 104 (2018), 33 pages.
[66]
O. Golonzka, J. Alzate, U. Arslan, M. Bohr, P. Bai, J. Brockman, B. Buford, C. Connor, N. Das, B. Doyle, T. Ghani, F. Hamzaoglu, P. Heil, P. Hentges, R. Jahan, D. Kencke, B. Lin, M. Lu, M. Mainuddin, M. Meterelliyoz, P. Nguyen, D. Nikonov, K. O’brien, J. O. Donnell, K. Oguz, D. Ouellette, J. Park, J. Pellegren, C. Puls, P. Quintero, T. Rahman, A. Romang, M. Sekhar, A. Selarka, M. Seth, A. J. Smith, A. K. Smith, L. Wei, C. Wiegand, Z. Zhang, and K. Fischer. 2018. MRAM as embedded non-volatile memory solution for 22FFL FinFET technology. In Proceedings of the 2018 IEEE International Electron Devices Meeting.18.1.1–18.1.4.
[67]
The Green Grid. 2020. Retrieved from https://www.thegreengrid.org/.
[68]
The Energy Efficient High Performance Computing (EEHPC) Group. 2019. Retrieved from https://eehpcwg.llnl.gov/index.html.
[69]
John Gurd, Wim Bohm, and Yong Meng Teo. 1987. Performance issues in dataflow machines. Future Generation Computer Systems 3, 4 (1987), 285–297.
[70]
Laszlo Gyongyosi and Sandor Imre. 2019. A survey on quantum computing technology. Computer Science Review 31 (2019), 51–71.
[71]
Heonjae Ha. 2018. Understanding and improving the energy efficiency of DRAM, PhD Thesis, Stanford university. (2018). Retrieved from https://searchworks.stanford.edu/view/12819402.
[72]
Heonjae Ha, Ardavan Pedram, Stephen Richardson, Shahar Kvatinsky, and Mark Horowitz. 2016. Improving energy efficiency of DRAM by exploiting half page row access. In Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE.
[73]
D. Hackenberg, R. Schöne, T. Ilsche, D. Molka, J. Schuchart, and R. Geyer. 2015. An energy efficiency feature survey of the Intel Haswell processor. In Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop. 896–904.
[74]
Jawad Haj-Yahya, Mohammed Alser, Jeremie Kim, A. Giray Yağlıkçı, Nandita Vijaykumar, Efraim Rotem, and Onur Mutlu. 2020. SysScale: Exploiting multi-domain dynamic voltage and frequency scaling for energy efficient mobile processors. In Proceedings of the 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture. IEEE.
[75]
J. Haj-Yahya, E. Rotem, A. Mendelson, and A. Chattopadhyay. 2019. A comprehensive evaluation of power delivery schemes for modern microprocessors. In Proceedings of the 20th International Symposium on Quality Electronic Design.123–130.
[76]
J. Haj-Yahya, Y. Sazeides, M. Alser, E. Rotem, and O. Mutlu. 2020. Techniques for reducing the connected-standby energy consumption of mobile devices. In Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture.623–636.
[77]
Rehan Hameed, Wajahat Qadeer, Megan Wachs, Omid Azizi, Alex Solomatnikov, Benjamin C. Lee, Stephen Richardson, Christos Kozyrakis, and Mark Horowitz. 2010. Understanding sources of inefficiency in general-purpose chips. In Proceedings of the 37th Annual International Symposium on Computer Architecture. Association for Computing Machinery, New York, NY.
[78]
James Hamilton. 2017. Data center power and water consumption. (2017). Retrieved from https://perspectives.mvdirona.com/2015/06/data-center-power-water-consumption/.
[79]
John L. Hennessy and David A. Patterson. 2019. A new golden age for computer architecture. Communications of the ACM 62, 2 (2019), 48–60.
[80]
D. S. Henry, B. C. Kuszmaul, and V. Viswanath. 1999. The ultrascalar processor-an asymptotically scalable superscalar microarchitecture. In Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI. 256–273.
[81]
M. D. Hill and M. R. Marty. 2008. Amdahl’s law in the multicore era. Computer 41, 7 (2008), 33–38.
[82]
IEEE. 2020. IEEE Rebooting Computing Initiative. Retrieved from https://rebootingcomputing.ieee.org/.
[83]
Intel. 2007. From a Few Cores to Many: A Tera-scale Computing Research Overview. Retrieved from https://www.intel.com/content/dam/www/public/us/en/documents/technology-briefs/intel-labs-tera-scale-research-paper.pdf/.
[84]
Intel. 2020. Current and power limit throttling indicators in Intel processors. (2020). Retrieved from https://www.intel.in/content/www/in/en/support/articles/000039154/processors/intel-core-processors.html.
[85]
Intel. 2020. The Intel running average power limiter. (2020). Retrieved from https://01.org/blogs/2014/running-average-power-limit.
[86]
Intel. 2020. What Intel is planning for the future of quantum computing: Hot qubits, cold control chips, and rapid testing. (2020). Retrieved from https://spectrum.ieee.org/tech-talk/computing/hardware/intels-quantum-computing-plans-hot-qubits-cold-control-chips-and-rapid-testing.
[87]
Qualcomm Hexagon 685 DSP is a Boon for Machine Learning. 2017. Retrieved from https://www.xda-developers.com/qualcomm-snapdragon-845-hexagon-685-dsp/.
[88]
Q. Jiang, Y. C. Lee, and A. Y. Zomaya. 2020. The power of ARM64 in public clouds. In Proceedings of the 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing.459–468.
[89]
Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture. IEEE, 1–12.
[90]
Himanshu Kaul, Mark Anders, Steven Hsu, Amit Agarwal, Ram Krishnamurthy, and Shekhar Borkar. 2012. Near-threshold voltage (NTV) design: Opportunities and challenges. In Proceedings of the 49th Annual Design Automation Conference.Association for Computing Machinery, New York, NY,1153–1158.
[91]
Stefanos Kaxiras and Margaret Martonosi. 2008. Computer Architecture Techniques for Power-Efficiency (1st ed.). Morgan and Claypool Publishers.
[92]
Linux Kernel. 2021. The Intel p-state driver. (2021). Retrieved from https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt.
[93]
M. Krstic, E. Grass, F. K. Gürkaynak, and P. Vivet. 2007. Globally asynchronous, locally synchronous circuits: Overview and outlook. IEEE Design Test of Computers 24, 5 (2007), 430–441.
[94]
E. Kültürsay, M. Kandemir, A. Sivasubramaniam, and O. Mutlu. 2013. Evaluating STT-RAM as an energy-efficient main memory alternative. In Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems and Software. 256–267.
[95]
Rolf Landauer. 2000. Irreversibility and heat generation in the computing process. IBM Journal of Research and Development 44 (2000), 261–269.
[96]
Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger. 2010. Phase change memory architecture and the quest for scalability. Communications of the ACM 53, 7 (2010), 99–106.
[97]
Dong Uk Lee, Kyung Whan Kim, Kwan Weon Kim, Kang Seol Lee, Sang Jin Byeon, Jae Hwan Kim, Jin Hee Cho, Jaejin Lee, and Jun Hyun Chun. 2015. A 1.2 V 8 Gb 8-channel 128 GB/s high-bandwidth memory (HBM) stacked DRAM with effective I/O test circuits. IEEE Journal of Solid-State Circuits 50, 1 (2015), 191–203.
[98]
Sukhan Lee, Yuhwan Ro, Young Hoon Son, Hyunyoon Cho, Nam Sung Kim, and Jung Ho Ahn. 2017. Understanding power-performance relationship of energy-efficient modern DRAM devices. In Proceedings of the 2017 IEEE International Symposium on Workload Characterization.110–111.
[99]
Woojoo Lee. 2016. Tutorial: Design and optimization of power delivery networks. IEIE Transactions on Smart Processing and Computing 5 (2016), 349–357.
[100]
Jiajun Li, Guihai Yan, Wenyan Lu, Shuhao Jiang, Shijun Gong, Jingya Wu, and Xiaowei Li. 2018. SmartShuttle: Optimizing off-chip memory accesses for deep learning accelerators. In Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition. IEEE, 343–348.
[101]
P. Li and Y. Luo. 2016. P4GPU: Accelerate packet processing of a P4 program with a CPU-GPU heterogeneous architecture. In Proceedings of the 2016 ACM/IEEE Symposium on Architectures for Networking and Communications Systems.125–126.
[102]
W. Liang, S. Chen, Y. Chang, and J. Fang. 2008. Memory-aware dynamic voltage and frequency prediction for portable devices. In Proceedings of the 2008 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications. 229–236.
[103]
Sung Kyu Lim. 2010. 3D circuit design with through-silicon-via: Challenges and opportunities. In Proceedings of the IEEE Electronic Design Processes Symposium Workshop.
[104]
Linux. 2020. Linux Kernel Documentation. Retrieved from https://www.kernel.org/doc/html/latest/.
[105]
Jamie Liu, Ben Jaiyen, Richard Veras, and Onur Mutlu. 2012. RAIDR: Retention-aware intelligent DRAM refresh. ACM SIGARCH Computer Architecture News 40, 3 (2012), 1–12.
[106]
C. A. Mack. 2011. Fifty years of Moore’s law. IEEE Transactions on Semiconductor Manufacturing 24, 2 (2011), 202–207.
[107]
J. A. Mandelman, R. H. Dennard, G. B. Bronner, J. K. DeBrosse, R. Divakaruni, Y. Li, and C. J. Radens. 2002. Challenges and future directions for the scaling of dynamic random-access memory (DRAM). IBM Journal of Research and Development 46, 2.3 (2002), 187–212.
[108]
ARM Developer Manuals. 2020. Energy aware scheduling and multi cluster PM. (2020). Retreived from https://developer.arm.com/tools-and-software/open-source-software/linux-kernel/energy-aware-scheduling.
[109]
Igor L. Markov. 2014. Limits on fundamental limits to computation. Nature 512, 7513 (2014), 147–154.
[110]
Toni Mastelic, Ariel Oleksiak, Holger Claussen, Ivona Brandic, Jean-Marc Pierson, and Athanasios Vasilakos. 2015. Cloud computing: Survey on energy efficiency. Computing Surveys 47 (01 2015), 36.
[111]
Paul A. Merolla, John V. Arthur, Rodrigo Alvarez-Icaza, Andrew S. Cassidy, Jun Sawada, Filipp Akopyan, Bryan L. Jackson, Nabil Imam, Chen Guo, Yutaka Nakamura, et al. 2014. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, 6197 (2014), 668–673.
[112]
Amirhossein Mirhosseini, Akshitha Sriraman, and Thomas F. Wenisch. 2019. Enhancing server efficiency in the face of killer microseconds. In Proceedings of the 25th International Symposium on High-Performance Computer Architecture. IEEE, 185–198.
[113]
Sparsh Mittal and Jeffrey S. Vetter. 2014. A survey of methods for analyzing and improving GPU energy efficiency. ACM Computing Surveys 47, 2, Article 19 (2014), 23 pages.
[114]
Sparsh Mittal and Jeffrey S. Vetter. 2015. A survey of CPU-GPU heterogeneous computing techniques. ACM Computing Surveys 47, 4 (2015), 1–35.
[115]
Jugdutt Singh Mohsen Radfar, Kriyang Shah. 2012. Recent subthreshold design techniques. In Proceedings of the Active and Passive Electronic Components.
[116]
Gordon E. Moore. 1965. Cramming more components onto integrated circuits. Electronics 38, 8 (April 1965).
[117]
G. W. K. Moore. 2003. No exponential is forever: But “Forever” can be delayed! [semiconductor industry]. In Proceedings of the 2003 IEEE International Solid-State Circuits Conference, Vol. 1. 20–23.
[118]
Onur Mutlu. 2013. Memory scaling: A systems architecture perspective. In Proceedings of the 2013 5th IEEE International Memory Workshop.
[119]
Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, and Rachata Ausavarungnirun. 2020. A modern primer on processing in memory. arXiv:2012.03112. Retrieved from https://arxiv.org/abs/2012.03112.
[120]
Chris Nicol. 2017. A coarse grain reconfigurable array (CGRA) for statically scheduled data flow computing. Wave Computing White Paper.https://www.eenewsanalog.com/en/white_papers/wave-computing-a-coarse-grainreconfigurable-array-cgra-for-statically-scheduled-data-ow-computing/.
[121]
Tony Nowatzki, Vinay Gangadhar, and Karthikeyan Sankaralingam. 2015. Exploring the potential of heterogeneous Von Neumann/dataflow execution models. In Proceedings of the 42nd Annual International Symposium on Computer Architecture.ACM, New York, NY, 298–310.
[122]
Internet of Things Agenda. 2020. Startups target subthreshold to solve IoT power consumption challenge. Retrieved from https://internetofthingsagenda.techtarget.com/feature/Startups-target-subthreshold-to-solve-IoT-power-consumption-challenge.
[123]
International Standards Organization. 2016. Data centres facilities and infrastructure: Power usage effectiveness. (2016). Retrieved from https://www.en-standard.eu/csn-en-50600-4-2-information-technology-data-centre-facilities-and-infrastructures-part-4-2-power-usage-effectiveness/.
[124]
International Standards Organization. 2016. Data centres key performance indicators: Power usage effectiveness. (2016). Retrieved from https://www.iso.org/standard/63451.html.
[125]
S. Palacharla, N. P. Jouppi, and J. E. Smith. 1997. Complexity-effective superscalar processors. In Proceedings of the 24th Annual International Symposium on Computer Architecture. 206–218.
[126]
Venkatesh Pallipadi. 2007. cpuidle - Do nothing, efficiently... Retrieved from http://ols.108.redhat.com/2007/Reprints/pallipadi-Reprint.pdf.
[127]
Venkatesh Pallipadi and Alexey Starikovskiy. 2006. The Ondemand governor: Past, present and future. (2006). Retrieved from https://www.kernel.org/doc/ols/2006/ols2006v2-pages-223-238.pdf.
[128]
David Patterson and Andrew Waterman. 2017. The RISC-V Reader: An Open Architecture Atlas. Strawberry Canyon.
[129]
Peripheral Component Interconnect Special Interest Group (PCI-SIG). 2020. PCI specifications. (2020). Retrieved from https://pcisig.com/.
[130]
Thomas Ernst Peide D. Ye and Mukesh V. Khare. 2019. The nanosheet transistor is the next (and maybe last) step in Moore’s law. (2019). Retrieved from https://spectrum.ieee.org/semiconductors/devices/the-nanosheet-transistor-is-the-next-and-maybe-last-step-in-moores-law.
[131]
Luis A. Plana, Steve B. Furber, Steve Temple, Mukaram Khan, Yebin Shi, Jian Wu, and Shufan Yang. 2007. A GALS infrastructure for a massively parallel multiprocessor. IEEE Design & Test of Computers 24, 5 (2007).
[132]
Next Platform. 2021. Amazon goes wide and deep with Graviton3 server chip. (2021). Retrieved from https://www.nextplatform.com/2021/12/02/aws-goes-wide-and-deep-with-graviton3-server-chip/.
[133]
IBM Research. 2020. IEDM 2020: Advances in memory, analog AI and interconnects point to the future of hybrid cloud and AI. (2020). Retrieved from https://www.ibm.com/blogs/research/2020/12/iedm2020-memory-analog-ai/.
[134]
Karl Rupp. 2018. 42 Years of Microprocessor Trend Data. Retrieved from https://www.karlrupp.net/2018/02/42-years-of-microprocessor-trend-data/.
[135]
Michael S. Schlansker and B. Ramakrishna Rau. 2000. EPIC: An architecture for instruction-level parallel processors. (2000). Retrieved from https://www.hpl.hp.com/techreports/1999/HPL-1999-111.pdf.
[136]
Robert Schöne, Thomas Ilsche, Mario Bielert, Andreas Gocht, and Daniel Hackenberg. 2019. Energy efficiency features of the Intel Skylake-SP processor and their impact on performance. arxiv:1905.12468. Retrieved from http://arxiv.org/abs/1905.12468.
[137]
Roy Schwartz, Jesse Dodge, Noah A. Smith, and Oren Etzioni. 2019. Green AI. arxiv:1907.10597. Retrieved from http://arxiv.org/abs/1907.10597.
[138]
Amazon Web Services. 2020. Amazon braket—Get started with quantum computing. (2020). Retrieved from https://aws.amazon.com/blogs/aws/amazon-braket-get-started-with-quantum-computing/.
[139]
Yakun Shao, Brandon Reagen, Gu-Yeon Wei, and David Brooks. 2014. Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures. In Proceedings of International Symposium on Computer Architecture, 97–108.
[140]
Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, and Steve Licking. 2016. Packet transactions: High-level programming for line-rate switches. In Proceedings of the 2016 ACM SIGCOMM Conference.Association for Computing Machinery, New York, NY, 15–28. DOI:
[141]
Gurindar S. Sohi, Scott E. Breach, and T. N. Vijaykumar. 1995. Multiscalar processors. In Proceedings of the 22nd Annual International Symposium on Computer Architecture. Association for Computing Machinery, New York, NY.
[142]
D. Suggs, M. Subramony, and D. Bouvier. 2020. The AMD “Zen 2” processor. IEEE Micro 40, 2 (2020), 45–52.
[143]
Suresh Siddha, Venkatesh Pallipadi, and Arjan van de Ven. 2007. Getting maximum mileage out of tickless. In Proceedings of the Linux Symposium.
[144]
Emil Talpes, Atchyuth Gorti, Gagandeep Sachdev, Debjit Sarma, Ganesh Venkataramanan, Peter Bannon, Bill McGee, Benjamin Floering, Ankit Jalote, Chris Hsiong, and Sahil Arora. 2020. Compute solution for Tesla’s full self driving computer. IEEE Micro 21 (2020), 1–1.
[145]
AnandTech. 2018. The iPhone XS and XS Max review: Unveiling the silicon secrets. (2018). Retrieved from https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-review-unveiling-the-silicon-secrets.
[146]
IMG Tech. 2018. PowerVR Series 3NX Neural Network Accelerator. Retrieved from https://www.imgtec.com/vision-ai/powervr-series3nx/.
[147]
University of California, Berkeley. 2011. Magnetic memory and logic could achieve ultimate energy efficiency. Retrieved from https://news.berkeley.edu/2011/07/01/magnetic-memory-and-logic-could-achieve-ultimate-energy-efficiency/.
[148]
EE Times. 2020. AI at the very very Edge. Retrieved from https://www.eetimes.com/ai-at-the-very-very-edge.
[149]
R. M. Tomasulo. 1967. An efficient algorithm for exploiting multiple arithmetic units. IBM Journal of Research and Development 11, 1 (1967), 25–33.
[150]
E. Vasilakis, I. Sourdis, V. Papaefstathiou, A. Psathakis, and M. G. H. Katevenis. 2017. Modeling energy-performance tradeoffs in ARM big.LITTLE architectures. In Proceedings of the 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation. 1–8.
[151]
Arthur H. Veen. 1986. Dataflow machine architecture. ACM Computing Surveys 18, 4 (1986), 365–396.
[152]
Vasanth Venkatachalam and Michael Franz. 2005. Power reduction techniques for microprocessor systems. ACM Computing Surveys 37, 3 (2005), 195–237.
[153]
David W. Wall. 1991. Limits of instruction-level parallelism. SIGPLAN Not. 26, 4 (1991), 176–188.
[154]
HPCwire. 2020. Arm technology powers the world’s fastest supercomputer. (2020). Retrieved from https://www.hpcwire.com/off-the-wire/arm-technology-powers-the-worlds-fastest-supercomputer/.
[155]
A. Yakovlev. 2011. Energy-modulated computing. In Proceedings of the 2011 Design, Automation Test in Europe. 1–6.
[156]
Tien-Ju Yang, Yu-Hsin Chen, Joel Emer, and Vivienne Sze. 2017. A method to estimate the energy consumption of deep neural networks. In Proceedings of the 2017 51st Asilomar Conference on Signals, Systems, and Computers. 1916–1920.

    Published In

    ACM Computing Surveys  Volume 54, Issue 11s
    January 2022
    785 pages
    ISSN:0360-0300
    EISSN:1557-7341
    DOI:10.1145/3551650

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 September 2022
    Online AM: 10 February 2022
    Accepted: 01 January 2022
    Revised: 01 December 2021
    Received: 01 June 2020
    Published in CSUR Volume 54, Issue 11s

    Author Tags

    1. Energy efficiency
    2. low power
    3. specification
    4. modeling
    5. low power optimizations
    6. platform-level power management
    7. dynamic power management

    Qualifiers

    • Survey
    • Refereed
