Umuroglu et al., 2018 - Google Patents

Bismo: A scalable bit-serial matrix multiplication overlay for reconfigurable computing

Umuroglu et al., 2018

Document ID: 17444248641600969961
Author: Umuroglu Y; Rasnayake L; Själander M
Publication year: 2018
Publication venue: 2018 28th International Conference on Field Programmable Logic and Applications (FPL)

External Links

Cited by

Snippet

Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lends itself well to high-performance implementations. Many matrix multiplication-dependent applications can …

Continue reading at arxiv.org (PDF) (other versions)

239000011159 matrix material 0 title abstract description 108

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
- G06F7/5443—Sum of products
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
- G06F9/381—Loop buffering
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
- G06F2207/3804—Details
- G06F2207/386—Special constructional features
- G06F2207/388—Skewing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F17/12—Simultaneous equations, e.g. systems of linear equations
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2217/00—Indexing scheme relating to computer aided design [CAD]
- G06F2217/68—Processors
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements

Similar Documents

Publication	Publication Date	Title
Umuroglu et al.	2018	Bismo: A scalable bit-serial matrix multiplication overlay for reconfigurable computing
JP7374236B2 (en)	2023-11-06	accelerated math engine
CN107704916B (en)	2020-07-03	Hardware accelerator and method for realizing RNN neural network based on FPGA
Khan et al.	2021	NPE: An FPGA-based overlay processor for natural language processing
US10698657B2 (en)	2020-06-30	Hardware accelerator for compressed RNN on FPGA
CN107229967B (en)	2021-06-15	Hardware accelerator and method for realizing sparse GRU neural network based on FPGA
Jaiswal et al.	2011	FPGA-based high-performance and scalable block LU decomposition architecture
US20180046901A1 (en)	2018-02-15	Hardware accelerator for compressed gru on fpga
Umuroglu et al.	2019	Optimizing bit-serial matrix multiplication for reconfigurable computing
Kumar et al.	2010	FPGA based high performance double-precision matrix multiplication
CN109144469B (en)	2023-12-05	Pipeline structure neural network matrix operation architecture and method
CN111506520B (en)	2020-09-22	Address generation method, related device and storage medium
CN116301920B (en)	2023-11-07	Compiling system for deploying CNN model to high-performance accelerator based on FPGA
Que et al.	2021	Recurrent neural networks with column-wise matrix–vector multiplication on FPGAs
Cho et al.	2021	FARNN: FPGA-GPU hybrid acceleration platform for recurrent neural networks
Jain et al.	2022	Enabling real-time object detection on low cost FPGAs
US8825730B1 (en)	2014-09-02	Matrix decomposition using dataflow techniques
Shivapakash et al.	2021	A power efficiency enhancements of a multi-bit accelerator for memory prohibitive deep neural networks
Bytyn et al.	2020	Convaix: An application-specific instruction-set processor for the efficient acceleration of cnns
Devic et al.	2021	Highly-adaptive mixed-precision MAC unit for smart and low-power edge computing
Sayahi et al.	2018	FPGA implementation of matrix-vector multiplication using xilinx system generator
Boland et al.	2011	Optimizing memory bandwidth use and performance for matrix-vector multiplication in iterative methods
Jin et al.	2013	Extending the BEAGLE library to a multi-FPGA platform
Zhu et al.	2012	An FPGA-based acceleration platform for auction algorithm
Rousseaux et al.	2007	A high performance FPGA-based accelerator for BLAS library implementation