DOI: 10.5555/2665671.2665746
research-article

General-purpose code acceleration with limited-precision analog computation

Published: 14 June 2014

Abstract

As improvements in per-transistor speed and energy efficiency diminish, radical departures from conventional approaches are becoming critical to improving the performance and energy efficiency of general-purpose processors. We propose a solution, from circuit to compiler, that enables general-purpose use of limited-precision analog hardware to accelerate "approximable" code, that is, code that can tolerate imprecise execution. We utilize an algorithmic transformation that automatically converts approximable regions of code from a von Neumann model to an "analog" neural model. We outline the challenges of taking an analog approach, including restricted-range value encoding, limited precision in computation, circuit inaccuracies, noise, and constraints on supported topologies. We address these limitations with a combination of circuit techniques, a hardware/software interface, neural network training techniques, and compiler support. Analog neural acceleration provides whole-application speedup of 3.7x and energy savings of 6.3x with quality loss of less than 10% for all except one benchmark. These results show that using limited-precision analog circuits for code acceleration, through a neural approach, is both feasible and beneficial over a range of approximation-tolerant, emerging applications including financial analysis, signal processing, robotics, 3D gaming, compression, and image processing.
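To make "restricted-range value encoding" and "limited precision in computation" concrete, the following is a minimal sketch of a small sigmoid network evaluated once at full precision and once with values clamped to a fixed range and rounded to a fixed number of levels. It is an illustration only, not the paper's circuit, training procedure, or compiler flow: the function names (quantize, neuron, mlp), the 2-2-1 topology, the weights, the [-1, 1] range, and the 8-bit width are all assumptions chosen for the example.

```python
import math


def quantize(x, bits, lo=-1.0, hi=1.0):
    """Clamp x to [lo, hi], then round to one of 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)          # restricted-range encoding (assumed range)
    step = (hi - lo) / levels
    return lo + round((x - lo) / step) * step


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, bits=None):
    """One sigmoid neuron. If bits is given, inputs, weights, and the output
    are quantized; the bias is left exact to keep the sketch short."""
    if bits is not None:
        inputs = [quantize(v, bits) for v in inputs]
        weights = [quantize(w, bits) for w in weights]
    z = sum(v * w for v, w in zip(inputs, weights)) + bias
    y = sigmoid(z)
    return quantize(y, bits, 0.0, 1.0) if bits is not None else y


def mlp(x, hidden, output, bits=None):
    """Two-layer network: a list of (weights, bias) hidden neurons, one output neuron."""
    h = [neuron(x, w, b, bits) for w, b in hidden]
    return neuron(h, output[0], output[1], bits)


# Hypothetical weights standing in for a network trained to mimic a code region.
HIDDEN = [([0.5, -0.25], 0.1), ([-0.75, 0.5], -0.2)]
OUTPUT = ([0.6, -0.4], 0.05)

if __name__ == "__main__":
    x = [0.3, -0.7]
    exact = mlp(x, HIDDEN, OUTPUT)
    approx = mlp(x, HIDDEN, OUTPUT, bits=8)
    print("exact:", exact, "8-bit:", approx, "abs error:", abs(exact - approx))
```

Lowering the bits argument in this sketch widens the gap between the two results, which is the kind of quality loss the abstract's evaluation bounds per benchmark.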



Published In

ISCA '14: Proceeding of the 41st annual international symposium on Computer architecture
June 2014
566 pages
ISBN: 9781479943944

Publisher

IEEE Press

Conference

ISCA'14
Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Cited By

  • (2019) Architecture-Aware Approximate Computing. Proceedings of the ACM on Measurement and Analysis of Computing Systems 3:2, pp. 1-24. DOI: 10.1145/3341617.3326153. Online publication date: 19-Jun-2019.
  • (2019) Long Short-Term Memory Network Design for Analog Computing. ACM Journal on Emerging Technologies in Computing Systems 15:1, pp. 1-27. DOI: 10.1145/3289393. Online publication date: 9-Jan-2019.
  • (2018) In-DRAM near-data approximate acceleration for GPUs. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1-14. DOI: 10.1145/3243176.3243188. Online publication date: 1-Nov-2018.
  • (2018) Beyond Number of Bit Erasures. ACM SIGACT News 49:2, pp. 33-56. DOI: 10.1145/3232679.3232689. Online publication date: 13-Jun-2018.
  • (2017) Cellular neural network friendly convolutional neural networks. Proceedings of the Conference on Design, Automation & Test in Europe, pp. 145-150. DOI: 10.5555/3130379.3130412. Online publication date: 27-Mar-2017.
  • (2017) APPROX-NoC. ACM SIGARCH Computer Architecture News 45:2, pp. 666-677. DOI: 10.1145/3140659.3080241. Online publication date: 24-Jun-2017.
  • (2017) APPROX-NoC. Proceedings of the 44th Annual International Symposium on Computer Architecture, pp. 666-677. DOI: 10.1145/3079856.3080241. Online publication date: 24-Jun-2017.
  • (2017) Approximate Computing. GetMobile: Mobile Computing and Communications 20:3, pp. 12-16. DOI: 10.1145/3036699.3036703. Online publication date: 5-Jan-2017.
  • (2016) NEUTRAMS. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-13. DOI: 10.5555/3195638.3195663. Online publication date: 15-Oct-2016.
  • (2016) Accelerating Markov random field inference using molecular optical Gibbs sampling units. ACM SIGARCH Computer Architecture News 44:3, pp. 558-569. DOI: 10.1145/3007787.3001196. Online publication date: 18-Jun-2016.
