Research Article
Public Access

CirCNN: accelerating and compressing deep neural networks using block-circulant weight matrices

Published: 14 October 2017

Abstract

Large-scale deep neural networks (DNNs) are both compute- and memory-intensive. As the size of DNNs continues to grow, it is critical to improve energy efficiency and performance while maintaining accuracy. For DNNs, model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which hurts performance and throughput; 2) increased training complexity; and 3) the lack of a rigorous guarantee on compression ratio and inference accuracy.
To overcome these limitations, this paper proposes CirCNN, a principled approach to representing weights and processing neural networks using block-circulant matrices. CirCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (in both inference and training) from O(n²) to O(n log n) and the storage complexity from O(n²) to O(n), with negligible accuracy loss. Compared to other approaches, CirCNN is distinct due to its mathematical rigor: DNNs based on CirCNN can converge to the same "effectiveness" as DNNs without compression. We propose the CirCNN architecture, a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture (e.g., layer type, size, and scale). In the CirCNN architecture: 1) due to its recursive property, the FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations; 2) the compressed but regular network structure avoids the pitfalls of network pruning and facilitates high performance and throughput with a highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CirCNN on FPGA, ASIC, and embedded processors. Our results show that the CirCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CirCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results.
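The FFT-based scheme is straightforward to demonstrate. Below is a minimal NumPy sketch (an illustration, not the paper's implementation; the function name block_circulant_matvec and the (p, q, b) weight layout are assumptions) showing why the representation gives O(n) storage and O(n log n) multiplication: each b x b circulant block is stored as a single length-b defining vector, and each block-vector product reduces to a circular convolution, C @ v = IFFT(FFT(c) * FFT(v)).

import numpy as np

def block_circulant_matvec(w, x):
    # w: (p, q, b) array; w[i, j] is the first column of a b x b
    #    circulant block. Storage is p*q*b = O(n), versus
    #    (p*b)*(q*b) = O(n^2) for the dense weight matrix.
    # x: input vector of length q*b. Returns y = Wx of length p*b.
    p, q, b = w.shape
    X = np.fft.fft(x.reshape(q, b), axis=1)   # FFT of each input block
    W = np.fft.fft(w, axis=2)                 # FFT of each defining vector
    Y = np.einsum('pqb,qb->pb', W, X)         # block-row sums in the frequency domain
    return np.fft.ifft(Y, axis=1).real.reshape(p * b)

# Sanity check against the densely materialized block-circulant matrix.
p, q, b = 2, 3, 4
w = np.random.randn(p, q, b)
dense = np.block([[np.column_stack([np.roll(w[i, j], s) for s in range(b)])
                   for j in range(q)] for i in range(p)])
x = np.random.randn(q * b)
assert np.allclose(block_circulant_matvec(w, x), dense @ x)

The same structure carries over to training: gradients with respect to the defining vectors can likewise be computed through FFTs, which is how both inference and training stay at O(n log n).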




    Published In

    MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture
    October 2017
    850 pages
    ISBN: 9781450349529
    DOI: 10.1145/3123939


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. FPGA
    2. acceleration
    3. block-circulant matrix
    4. compression
    5. deep learning


    Acceptance Rates

    Overall Acceptance Rate 484 of 2,242 submissions, 22%


