research-article

General-purpose code acceleration with limited-precision analog computation

Authors:

Renée St. Amant,

Amir Yazdanbakhsh,

Bradley Thwaites,

Hadi Esmaeilzadeh,

Arjang Hassibi,

Doug Burger

ISCA '14: Proceedings of the 41st Annual International Symposium on Computer Architecture

Pages 505 - 516

Published: 14 June 2014

Abstract

As improvements in per-transistor speed and energy efficiency diminish, radical departures from conventional approaches are becoming critical to improving the performance and energy efficiency of general-purpose processors. We propose a solution, from circuit to compiler, that enables general-purpose use of limited-precision analog hardware to accelerate "approximable" code, that is, code that can tolerate imprecise execution. We use an algorithmic transformation that automatically converts approximable regions of code from a von Neumann model to an "analog" neural model. We outline the challenges of taking an analog approach, including restricted-range value encoding, limited precision in computation, circuit inaccuracies, noise, and constraints on supported topologies. We address these limitations with a combination of circuit techniques, a hardware/software interface, neural-network training techniques, and compiler support. Analog neural acceleration provides a whole-application speedup of 3.7x and energy savings of 6.3x, with quality loss below 10% for all but one benchmark. These results show that using limited-precision analog circuits for code acceleration, through a neural approach, is both feasible and beneficial over a range of approximation-tolerant, emerging applications including financial analysis, signal processing, robotics, 3D gaming, compression, and image processing.
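The abstract does not give implementation details, so the following is only a rough illustration of two of the constraints it names: restricted-range value encoding and limited precision. The Python sketch below evaluates a tiny neural model whose inputs and weights are clamped to a fixed range and rounded to a small number of levels. Every name here (`quantize`, `neuron`, `mlp`, `EXAMPLE_LAYERS`), the [-1, 1] range, the 8-bit default, and the sigmoid activation are assumptions chosen for illustration, not the authors' design.

```python
import math

def quantize(x, bits=8, lo=-1.0, hi=1.0):
    """Clamp x to [lo, hi] and round it to one of 2**bits evenly spaced levels.

    The range and bit width are illustrative assumptions.
    """
    x = max(lo, min(hi, x))
    levels = (1 << bits) - 1          # number of steps between lo and hi
    step = (hi - lo) / levels
    return lo + round((x - lo) / step) * step

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, bits=8):
    """One limited-precision neuron: quantize inputs and weights, sum, squash.

    The bias is left unquantized to keep the sketch short.
    """
    acc = bias
    for x, w in zip(inputs, weights):
        acc += quantize(x, bits) * quantize(w, bits)
    return sigmoid(acc)

def mlp(inputs, layers, bits=8):
    """Evaluate a small fully connected network.

    `layers` is a list of (weight_rows, biases) pairs, one pair per layer.
    """
    values = list(inputs)
    for weight_rows, biases in layers:
        values = [neuron(values, row, b, bits) for row, b in zip(weight_rows, biases)]
    return values

# Hypothetical 2-2-1 network standing in for a trained "approximable" region.
EXAMPLE_LAYERS = [
    ([[0.8, -0.4], [0.3, 0.9]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]

if __name__ == "__main__":
    for bits in (4, 8, 16):
        print(bits, mlp([0.5, -0.25], EXAMPLE_LAYERS, bits=bits))
```

Running the script prints the network output at several bit widths, which gives a feel for how much the result moves as precision drops. A real analog design would also have to model noise and circuit inaccuracies, which this sketch ignores.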

Information & Contributors

Information

Published In

ISCA '14: Proceedings of the 41st Annual International Symposium on Computer Architecture

June 2014

566 pages

ISBN:9781479943944

General Chairs: Pen-Chung Yew (University of Minnesota), Antonia Zhai (University of Minnesota)
Program Chair: Steve Keckler (NVIDIA/University of Texas at Austin)

ACM SIGARCH Computer Architecture News Volume 42, Issue 3
ISCA '14
June 2014
552 pages
ISSN:0163-5964
DOI:10.1145/2678373
Editor: Doug DeGroot

Sponsors

IEEE TCCA: IEEE Computer Society Technical Committee on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

IEEE Press

Qualifiers

Research-article

Conference

ISCA'14

Sponsor:

IEEE TCCA
SIGARCH

ISCA'14: The 41st Annual International Symposium on Computer Architecture

June 14 - 18, 2014

Minneapolis, Minnesota, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics

Total Citations: 108
Total Downloads: 982
Downloads (last 12 months): 58
Downloads (last 6 weeks): 6

Reflects downloads up to 11 Dec 2024

Cited By

Karakoy M, Kislal O, Tang X, Kandemir M, Arunachalam M (2019). Architecture-Aware Approximate Computing. Proceedings of the ACM on Measurement and Analysis of Computing Systems 3:2 (1-24). Online publication date: 19-Jun-2019.
https://dl.acm.org/doi/10.1145/3341617.3326153
Zhao Z, Srivastava A, Peng L, Chen Q (2019). Long Short-Term Memory Network Design for Analog Computing. ACM Journal on Emerging Technologies in Computing Systems 15:1 (1-27). Online publication date: 9-Jan-2019.
https://dl.acm.org/doi/10.1145/3289393
Yazdanbakhsh A, Song C, Sacks J, Lotfi-Kamran P, Esmaeilzadeh H, Kim N, Evripidou S, Stenström P, O'Boyle M (2018). In-DRAM near-data approximate acceleration for GPUs. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (1-14). Online publication date: 1-Nov-2018.
https://dl.acm.org/doi/10.1145/3243176.3243188
Grochow J, Wolpert D (2018). Beyond Number of Bit Erasures. ACM SIGACT News 49:2 (33-56). Online publication date: 13-Jun-2018.
https://dl.acm.org/doi/10.1145/3232679.3232689
Horváth A, Hillmer M, Lou Q, Hu X, Niemier M (2017). Cellular neural network friendly convolutional neural networks. Proceedings of the Conference on Design, Automation & Test in Europe (145-150). Online publication date: 27-Mar-2017.
https://dl.acm.org/doi/10.5555/3130379.3130412
Boyapati R, Huang J, Majumder P, Yum K, Kim E (2017). APPROX-NoC. ACM SIGARCH Computer Architecture News 45:2 (666-677). Online publication date: 24-Jun-2017.
https://dl.acm.org/doi/10.1145/3140659.3080241
Boyapati R, Huang J, Majumder P, Yum K, Kim E (2017). APPROX-NoC. Proceedings of the 44th Annual International Symposium on Computer Architecture (666-677). Online publication date: 24-Jun-2017.
https://dl.acm.org/doi/10.1145/3079856.3080241
Ceze L, Sampson A (2017). APPROXIMATE COMPUTING. GetMobile: Mobile Computing and Communications 20:3 (12-16). Online publication date: 5-Jan-2017.
https://dl.acm.org/doi/10.1145/3036699.3036703
Ji Y, Zhang Y, Li S, Chi P, Jiang C, Qu P, Xie Y, Chen W, Hsu W, Yang C, Lipasti M, Lee H (2016). NEUTRAMS. The 49th Annual IEEE/ACM International Symposium on Microarchitecture (1-13). Online publication date: 15-Oct-2016.
https://dl.acm.org/doi/10.5555/3195638.3195663
Wang S, Zhang X, Li Y, Bashizade R, Yang S, Dwyer C, Lebeck A (2016). Accelerating markov random field inference using molecular optical gibbs sampling units. ACM SIGARCH Computer Architecture News 44:3 (558-569). Online publication date: 18-Jun-2016.
https://dl.acm.org/doi/10.1145/3007787.3001196
