MEMSYS '20 research article · DOI: 10.1145/3422575.3422790

A Low Power In-DRAM Architecture for Quantized CNNs using Fast Winograd Convolutions

Published: 21 March 2021

Abstract

In recent years, the performance and memory-bandwidth bottlenecks of memory-intensive applications have encouraged researchers to explore Processing-in-Memory (PIM) architectures. In this paper, we focus on a DRAM-based PIM architecture for Convolutional Neural Network (CNN) inference. The close proximity of the computation units and the memory cells in a PIM architecture reduces data movement costs and improves overall energy efficiency. In this context, CNN inference requires efficient implementations of area-intensive arithmetic multipliers near the highly dense DRAM regions. Additionally, the multiplication units increase overall latency and power consumption. For this reason, most previous works in this domain use binary or ternary weights, replacing the complex multipliers with bitwise logical operations and thus enabling efficient implementations. However, it is well known that binary and ternary weight networks considerably reduce accuracy and are therefore suitable only for a limited set of applications.
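As background (not taken from the paper itself), the bitwise trick used by binary-weight PIM designs is typically an XNOR-popcount dot product: with {-1, +1} values packed as bits (+1 → 1, -1 → 0), an XNOR marks the positions where the signs agree, so no multiplier is needed. A minimal sketch under that encoding assumption:

```python
def popcount(x: int) -> int:
    """Count set bits in a non-negative integer."""
    return bin(x).count("1")

def xnor_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n {-1, +1} vectors packed as bits
    (+1 -> 1, -1 -> 0). Each agreeing position contributes +1 and each
    disagreeing position -1, hence the 2 * popcount - n identity."""
    mask = (1 << n) - 1          # keep only the n valid bit positions
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: 1 where signs agree
    return 2 * popcount(agree) - n

# [+1, -1, +1] . [+1, +1, -1] = 1 - 1 - 1 = -1
assert xnor_dot(0b101, 0b110, 3) == -1
```

This is why binary/ternary networks map so cheaply onto DRAM logic: the whole multiply-accumulate reduces to XNOR gates plus a bit counter.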
In this work, we present a novel DRAM-based PIM architecture for quantized (8-bit weight and input) CNN inference that exploits the complexity reduction offered by fast convolution algorithms. The Winograd convolution accelerates the widely used small convolution sizes by reducing the number of multiplications compared to direct convolution. To exploit data parallelism and minimize energy, the proposed architecture integrates the basic computation units at the outputs of the Primary Sense Amplifiers (PSAs), places the remaining substantial logic near the Secondary Sense Amplifiers (SSAs), and fully complies with commodity DRAM technology and process. Commodity DRAMs are temperature-sensitive devices, so integrating additional logic is challenging because it increases overall power consumption. In contrast to previous works, our architecture consumes 0.525 W, which is within the commodity DRAM thermal design power budget (i.e., ≤ 1 W). For VGG16, the proposed architecture achieves 21.69 GOPS per device with an area overhead of 2.04% compared to a commodity 8 Gb DRAM. It delivers a peak performance of 7.552 TOPS per memory channel while maintaining a high energy efficiency of 95.52 GOPS/W. We also demonstrate that our architecture consumes 10.1× less power and is 2.23× more energy efficient than prior DRAM-based PIM architectures.
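To illustrate the multiplication savings the abstract refers to, here is a minimal Python sketch of the standard Winograd F(2,3) algorithm (two outputs of a 3-tap filter with 4 multiplications instead of 6). This shows the textbook transform only, not the paper's in-DRAM mapping or its 8-bit quantized variant:

```python
def winograd_f23(d, g):
    """Winograd F(2,3): compute two outputs of a 3-tap 1-D convolution
    using 4 multiplications (direct computation needs 6).
    d: 4 input samples, g: 3 filter taps."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # The four Winograd products (filter-side factors can be precomputed)
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    # Output transform combines the products with additions only
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_conv(d, g):
    """Reference: direct sliding-window convolution."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]

d, g = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0]
assert winograd_f23(d, g) == direct_conv(d, g)  # both give [14.0, 20.0]
```

The filter-side factors depend only on the weights, so in an inference accelerator they can be transformed once offline, leaving fewer runtime multiplications, which is precisely the property that makes the multipliers near the sense amplifiers affordable.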




Published In

MEMSYS '20: Proceedings of the International Symposium on Memory Systems
September 2020
362 pages
ISBN:9781450388993
DOI:10.1145/3422575

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 March 2021


Author Tags

  1. DRAM
  2. Neural Networks
  3. PIM architecture
  4. Quantized CNN

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MEMSYS 2020
MEMSYS 2020: The International Symposium on Memory Systems
September 28 – October 1, 2020
Washington, DC, USA


Cited By

  • (2024)PipePIM: Maximizing Computing Unit Utilization in ML-Oriented Digital PIM by Pipelining and Dual BufferingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2024.341084243:12(4585-4598)Online publication date: Dec-2024
  • (2024)HiFi-DRAM: Enabling High-fidelity DRAM Research by Uncovering Sense Amplifiers with IC Imaging2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00020(133-149)Online publication date: 29-Jun-2024
  • (2023)Fusing In-storage and Near-storage Acceleration of Convolutional Neural NetworksACM Journal on Emerging Technologies in Computing Systems10.1145/359749620:1(1-22)Online publication date: 14-Nov-2023
  • (2023)HAW: Hardware-Aware Point Selection for Efficient Winograd ConvolutionIEEE Signal Processing Letters10.1109/LSP.2023.325886330(269-273)Online publication date: 2023
  • (2023)An Efficient Accelerator on FPGA for Large Convolution and Correlation using Winograd2023 8th International Conference on Integrated Circuits and Microsystems (ICICM)10.1109/ICICM59499.2023.10365973(629-636)Online publication date: 20-Oct-2023
  • (2022)FeFET versus DRAM based PIM Architectures: A Comparative Study2022 IFIP/IEEE 30th International Conference on Very Large Scale Integration (VLSI-SoC)10.1109/VLSI-SoC54400.2022.9939629(1-6)Online publication date: 3-Oct-2022
  • (2022)A Weighted Current Summation Based Mixed Signal DRAM-PIM Architecture for Deep Neural Network InferenceIEEE Journal on Emerging and Selected Topics in Circuits and Systems10.1109/JETCAS.2022.317023512:2(367-380)Online publication date: Jun-2022
  • (2022)Optimization of DRAM based PIM Architecture for Energy-Efficient Deep Neural Network Training2022 IEEE International Symposium on Circuits and Systems (ISCAS)10.1109/ISCAS48785.2022.9937832(1472-1476)Online publication date: 28-May-2022
  • (2022)Achieving the Performance of All-Bank In-DRAM PIM With Standard Memory Interface: Memory-Computation DecouplingIEEE Access10.1109/ACCESS.2022.320305110(93256-93272)Online publication date: 2022
