Yazdani et al., 2019 - Google Patents
LSTM-sharp: An adaptable, energy-efficient hardware accelerator for long short-term memoryYazdani et al., 2019
View PDF- Document ID
- 13607874219919733697
- Author
- Yazdani R
- Ruwase O
- Zhang M
- He Y
- Arnau J
- González A
- Publication year
- Publication venue
- arXiv preprint arXiv:1911.01258
External Links
Snippet
The effectiveness of LSTM neural networks for popular tasks such as Automatic Speech Recognition has fostered an increasing interest in LSTM inference acceleration. Due to the recurrent nature and data dependencies of LSTM computations, designing a customized …
- 230000006403 short-term memory 0 title description 2
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3457—Performance evaluation by simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3442—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for planning or managing the needed capacity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power Management, i.e. event-based initiation of power-saving mode
- G06F1/3234—Action, measure or step performed to reduce power consumption
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2217/00—Indexing scheme relating to computer aided design [CAD]
- G06F2217/68—Processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2217/00—Indexing scheme relating to computer aided design [CAD]
- G06F2217/78—Power analysis and optimization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Fowers et al. | A configurable cloud-scale DNN processor for real-time AI | |
Park et al. | Scale-out acceleration for machine learning | |
Yazdani et al. | LSTM-sharp: An adaptable, energy-efficient hardware accelerator for long short-term memory | |
You et al. | Fast deep neural network training on distributed systems and cloud TPUs | |
Jaiswal et al. | FPGA-based high-performance and scalable block LU decomposition architecture | |
Ambrosi et al. | Hardware-software co-design for an analog-digital accelerator for machine learning | |
Corral et al. | Execution of a parallel edge-based Navier–Stokes solver on commodity graphics processor units | |
CN116384312B (en) | Circuit yield analysis method based on parallel heterogeneous computation | |
Sun et al. | An I/O bandwidth-sensitive sparse matrix-vector multiplication engine on FPGAs | |
Torabzadehkashi et al. | Accelerating hpc applications using computational storage devices | |
Matthews et al. | MosaicSim: A lightweight, modular simulator for heterogeneous systems | |
Cho et al. | FARNN: FPGA-GPU hybrid acceleration platform for recurrent neural networks | |
Stathis et al. | eBrainII: a 3 kW realtime custom 3D DRAM integrated ASIC implementation of a biologically plausible model of a human scale cortex | |
Drumond et al. | Equinox: Training (for free) on a custom inference accelerator | |
Scheffler et al. | Sparse stream semantic registers: A lightweight ISA extension accelerating general sparse linear algebra | |
Sanaullah et al. | Application aware tuning of reconfigurable multi-layer perceptron architectures | |
Merchant et al. | Andromeda: An fpga based risc-v mpsoc exploration framework | |
Hotfilter et al. | FLECSim-SoC: A flexible end-to-end co-design simulation framework for system on chips | |
Aminabadi et al. | SHARP: An adaptable, energy-efficient accelerator for recurrent neural networks | |
Lopes et al. | Generating optimized multicore accelerator architectures | |
Yazdani et al. | SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network | |
Yi et al. | Hardware-software codesign of a CNN accelerator | |
Ou et al. | A case for MVPs: mixed-precision vector processors | |
Dey et al. | An application specific processor architecture with 3D integration for recurrent neural networks | |
Faber et al. | Efficient parallel execution of genetic algorithms on Epiphany manycore processor |