DOI: 10.1145/3508352.3561105

Approximate Computing and the Efficient Machine Learning Expedition

Published: 22 December 2022

Abstract

Approximate computing (AxC) has long been accepted as a design alternative for efficient system implementation at the cost of relaxed accuracy requirements. While AxC research spans many application domains, AxC thrived over the past decade when applied to Machine Learning (ML). The inherently approximate nature of ML models, together with the growing computational overheads of ML applications (overheads that corresponding approximations effectively mitigate), led to a perfect match and a fruitful synergy. AxC for AI/ML has transcended academic prototypes. In this work, we highlight the synergistic nature of AxC and ML and elucidate the impact of AxC in designing efficient ML systems. To that end, we present an overview and taxonomy of AxC for ML and use two descriptive application scenarios to demonstrate how AxC boosts the efficiency of ML systems.
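To give the accuracy-for-efficiency trade a concrete shape, the sketch below illustrates uniform 8-bit weight quantization, one representative AxC technique among those covered here (see the quantization and precision scaling author tags below). This is a minimal illustrative sketch under our own assumptions, not code from the paper; the helper names quantize_int8 and dequantize are hypothetical.

```python
# Illustrative sketch (not from the paper): symmetric per-tensor
# int8 quantization, a simple precision-scaling approximation.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 so the largest magnitude lands on 127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 representation."""
    return q.astype(np.float32) * scale

# The reconstruction error below is the relaxed accuracy the abstract
# refers to, traded for a ~4x smaller weight footprint (int8 vs. float32).
w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).mean()
print(f"mean absolute quantization error: {err:.5f}")
```

Beyond the smaller memory footprint, such integer representations let accelerators replace floating-point units with cheaper integer arithmetic, which is one way AxC boosts the efficiency of ML systems.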



Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
October 2022
1467 pages
ISBN: 9781450392174
DOI: 10.1145/3508352
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

In-Cooperation

  • IEEE-EDS: Electronic Devices Society
  • IEEE CAS
  • IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 December 2022

Author Tags

  1. approximate computing
  2. in-memory
  3. machine learning
  4. precision scaling
  5. printed electronics
  6. pruning
  7. quantization
  8. transformers

Qualifiers

  • Invited-talk

Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Article Metrics

  • Downloads (Last 12 months)311
  • Downloads (Last 6 weeks)57
Reflects downloads up to 15 Jan 2025

Cited By

  • (2024) Reservoir Computing Using Measurement-Controlled Quantum Dynamics. Electronics, 13(6), 1164. https://doi.org/10.3390/electronics13061164. Online publication date: 21-Mar-2024.
  • (2023) Hardware-Aware Automated Neural Minimization for Printed Multilayer Perceptrons. In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-2. https://doi.org/10.23919/DATE56975.2023.10137161. Online publication date: Apr-2023.
  • (2023) Improving the Robustness and Efficiency of PIM-Based Architecture by SW/HW Co-Design. In Proceedings of the 28th Asia and South Pacific Design Automation Conference, 618-623. https://doi.org/10.1145/3566097.3568358. Online publication date: 16-Jan-2023.
  • (2023) Investigating hardware and software aspects in the energy consumption of machine learning: A green AI-centric analysis. Concurrency and Computation: Practice and Experience, 35(24). https://doi.org/10.1002/cpe.7825. Online publication date: Jun-2023.
