
Memristor-CMOS Analog Coprocessor for Acceleration of High-Performance Computing Applications

Published: 01 November 2018

Abstract

Vector-matrix multiplication underlies major applications in machine vision, deep learning, and scientific simulation. These applications require high computational speed and run on platforms that are constrained in size, weight, and power. With transistor scaling coming to an end, existing digital hardware architectures will not be able to meet this increasing demand. Analog computation, with its rich set of primitives and inherently parallel architecture, can be faster, more efficient, and more compact for some of these applications. One such primitive is vector-matrix multiplication on a memristor-CMOS crossbar array. In this article, we develop a memristor-CMOS analog coprocessor architecture that can handle floating-point computation. To demonstrate the operation of the analog coprocessor at the system level, we use a new electronic design automation tool called PSpice Systems Option, which performs integrated cosimulation of MATLAB/Simulink and PSpice. The analog coprocessor is shown to outperform other processors, with a speedup of up to 12× over projected GPU performance. Using the PSpice Systems Option tool, application simulations for image processing and for solving partial differential equations are performed on the analog coprocessor model.
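The crossbar primitive named in the abstract can be illustrated with a minimal, idealized sketch. It is not the article's coprocessor model: it assumes perfectly linear devices, no wire resistance or noise, and a differential pair of conductances per signed weight. The conductance limits `G_MIN` and `G_MAX` and all function names are hypothetical, chosen only for illustration. Each cell contributes a current V·G (Ohm's law), and each column wire sums the cell currents (Kirchhoff's current law), so one analog read yields a full vector-matrix product.

```python
# Idealized memristor crossbar vector-matrix multiply (illustrative only).
# Cell (i, j) is a conductance joining input row i to output column j.
# Ohm's law gives the cell current V[i] * G[i][j]; Kirchhoff's current law
# sums those currents along each column wire.

G_MIN = 1e-6  # assumed lowest programmable conductance, in siemens
G_MAX = 1e-4  # assumed highest programmable conductance, in siemens


def weights_to_conductances(weights):
    """Map signed weights onto two non-negative conductance arrays."""
    # Largest magnitude sets the scale; fall back to 1.0 for an all-zero matrix.
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    scale = (G_MAX - G_MIN) / w_max
    g_pos = [[G_MIN + scale * max(w, 0.0) for w in row] for row in weights]
    g_neg = [[G_MIN + scale * max(-w, 0.0) for w in row] for row in weights]
    return g_pos, g_neg, scale


def column_currents(voltages, conductances):
    """Sum V[i] * G[i][j] over the rows i for every column j."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


def crossbar_vmm(voltages, weights):
    """Return voltages @ weights using the differential crossbar model."""
    g_pos, g_neg, scale = weights_to_conductances(weights)
    i_pos = column_currents(voltages, g_pos)
    i_neg = column_currents(voltages, g_neg)
    # The G_MIN offset cancels in the difference; dividing by the scale
    # converts the net current back into weight units.
    return [(p - n) / scale for p, n in zip(i_pos, i_neg)]


if __name__ == "__main__":
    print(crossbar_vmm([0.1, 0.2], [[1.0, -2.0], [0.5, 4.0]]))
```

For the inputs in the `__main__` block the result is approximately `[0.2, 0.6]`, matching the ordinary product of the voltage vector with the weight matrix. The differential pair is one common way to represent negative weights with strictly positive conductances; other encodings are possible.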


    Published In

    ACM Journal on Emerging Technologies in Computing Systems, Volume 14, Issue 3
    July 2018
    150 pages
    ISSN: 1550-4832
    EISSN: 1550-4840
    DOI: 10.1145/3287773
    Editor: Yuan Xie
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 November 2018
    Accepted: 01 August 2018
    Revised: 01 July 2018
    Received: 01 December 2017
    Published in JETC Volume 14, Issue 3

    Author Tags

    1. Analog coprocessor
    2. PSpice systems option
    3. crossbar
    4. electronic design automation
    5. hardware accelerator
    6. machine vision
    7. memristor
    8. modeling and simulation
    9. partial differential equations
    10. vector matrix multiplication

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • DARPA contract

    Cited By

    • (2024) Energy-Efficient Brain Floating Point Convolutional Neural Network Using Memristors. IEEE Transactions on Electron Devices 71:5 (3293-3300). DOI: 10.1109/TED.2024.3379953. Online publication date: May 2024.
    • (2022) Impact of Switching Variability, Memory Window, and Temperature on Vector Matrix Operations Using 65nm CMOS Integrated Hafnium Dioxide-based ReRAM Devices. 2022 IEEE 31st Microelectronics Design & Test Symposium (MDTS) (1-6). DOI: 10.1109/MDTS54894.2022.9826924. Online publication date: 23 May 2022.
    • (2021) Memristive System Based Image Processing Technology: A Review and Perspective. Electronics 10:24 (3176). DOI: 10.3390/electronics10243176. Online publication date: 20 Dec 2021.
    • (2021) All Hardware-based Two-layer Perceptron Implemented in Memristor Crossbar Arrays. 2021 IEEE International Symposium on Circuits and Systems (ISCAS) (1-5). DOI: 10.1109/ISCAS51556.2021.9401793. Online publication date: May 2021.
    • (2020) Impact of Switching Variability of 65nm CMOS Integrated Hafnium Dioxide-based ReRAM Devices on Distinct Level Operations. 2020 IEEE International Integrated Reliability Workshop (IIRW) (1-4). DOI: 10.1109/IIRW49815.2020.9312855. Online publication date: Oct 2020.
    • (2019) Cross-point Resistive Memory. ACM Transactions on Design Automation of Electronic Systems 24:4 (1-37). DOI: 10.1145/3325067. Online publication date: 20 Jun 2019.
