Hu et al., 2022 - Google Patents
A survey on convolutional neural network accelerators: GPU, FPGA and ASICHu et al., 2022
- Document ID
- 11314307987561233300
- Author
- Hu Y
- Liu Y
- Liu Z
- Publication year
- Publication venue
- 2022 14th International Conference on Computer Research and Development (ICCRD)
External Links
Snippet
In recent years, artificial intelligence (AI) has been under rapid development, applied in various areas. Among a vast number of neural network (NN) models, the convolutional neural network (CNN) has a mainstream status in application such as image and sound …
- 230000001537 neural 0 title abstract description 22
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
- G06F7/505—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17356—Indirect interconnection networks
- G06F15/17368—Indirect interconnection networks non hierarchical topologies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
- G06F2207/3804—Details
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hu et al. | A survey on convolutional neural network accelerators: GPU, FPGA and ASIC | |
Jouppi et al. | A domain-specific supercomputer for training deep neural networks | |
Boutros et al. | You cannot improve what you do not measure: FPGA vs. ASIC efficiency gaps for convolutional neural network inference | |
Zhou et al. | Transpim: A memory-based acceleration via software-hardware co-design for transformer | |
Hashemi et al. | Understanding the impact of precision quantization on the accuracy and energy of neural networks | |
Zaman et al. | Custom hardware architectures for deep learning on portable devices: a review | |
Venkataramanaiah et al. | Automatic compiler based FPGA accelerator for CNN training | |
Welser et al. | Future computing hardware for AI | |
Bavikadi et al. | A survey on machine learning accelerators and evolutionary hardware platforms | |
Deng et al. | PermCNN: Energy-efficient convolutional neural network hardware architecture with permuted diagonal structure | |
Noh et al. | FlexBlock: A flexible DNN training accelerator with multi-mode block floating point support | |
Sun et al. | A high-performance accelerator for large-scale convolutional neural networks | |
He et al. | A survey to predict the trend of AI-able server evolution in the cloud | |
Heo et al. | T-PIM: An energy-efficient processing-in-memory accelerator for end-to-end on-device training | |
Morad et al. | Efficient dense and sparse matrix multiplication on GP-SIMD | |
Lou et al. | RV-CNN: Flexible and efficient instruction set for CNNs based on RISC-V processors | |
Que et al. | A reconfigurable multithreaded accelerator for recurrent neural networks | |
Jiang et al. | A network-on-chip-based annealing processing architecture for large-scale fully connected ising model | |
Kim et al. | A low-power graph convolutional network processor with sparse grouping for 3d point cloud semantic segmentation in mobile devices | |
Tiwari et al. | Design and implementation of rough set algorithms on FPGA: A survey | |
Han et al. | EGCN: An efficient GCN accelerator for minimizing off-chip memory access | |
Bavikadi et al. | Heterogeneous multi-functional look-up-table-based processing-in-memory architecture for deep learning acceleration | |
Sharma et al. | A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models | |
Lu et al. | A reconfigurable DNN training accelerator on FPGA | |
Bavikadi et al. | Reconfigurable Processing-in-Memory Architecture for Data Intensive Applications |