CN107329936A - Apparatus and method for performing neural network operations and matrix/vector operations - Google Patents
Apparatus and method for performing neural network operations and matrix/vector operations
- Publication number
- CN107329936A CN107329936A CN201610281291.5A CN201610281291A CN107329936A CN 107329936 A CN107329936 A CN 107329936A CN 201610281291 A CN201610281291 A CN 201610281291A CN 107329936 A CN107329936 A CN 107329936A
- Authority
- CN
- China
- Prior art keywords
- instruction
- matrix
- vector
- computing
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
- G06F9/30134—Register stacks; shift registers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
An apparatus and method for performing neural network operations and matrix/vector operations. The apparatus includes a storage unit, a register unit, a control unit, an arithmetic unit, and a scratchpad memory. The neuron/matrix/vector data involved in a computation are staged in the scratchpad memory, so that data of different widths can be supported more flexibly and efficiently during computation, improving the execution performance of computing tasks. The customized neural network operation and matrix/vector operation modules of the present invention implement a variety of neural network and matrix/vector operations more efficiently, further improving the execution performance of computing tasks; in addition, the instructions used by the present invention take the form of very long instruction words.
Description
Technical field
The present invention relates to the technical field of neural network operations, and more particularly to an apparatus and method for performing neural network operations and matrix/vector operations.
Background art
Artificial neural networks (ANNs), or neural networks (NNs) for short, are algorithmic mathematical models that imitate the behavioral characteristics of animal neural networks and perform distributed, parallel information processing. Depending on the complexity of the system, such a network achieves the purpose of processing information by adjusting the interconnections among a large number of internal nodes. At present, neural networks have seen tremendous development in many fields such as intelligent control and machine learning. Because a neural network is an algorithmic mathematical model involving a large number of mathematical operations, how to perform neural network operations quickly and accurately is a problem that urgently needs to be solved.
Summary of the invention
In view of this, an object of the present invention is to provide an apparatus and method for performing neural network operations and matrix/vector operations, so as to realize efficient neural network and matrix/vector computation.
To achieve this object, as one aspect of the present invention, the invention provides an apparatus for performing neural network operations and matrix/vector operations, comprising a storage unit, a register unit, a control unit, an arithmetic unit, and a scratchpad memory, wherein:
the storage unit stores neurons/matrices/vectors;
the register unit stores neuron addresses/matrix addresses/vector addresses, wherein a neuron address is the address at which a neuron is stored in the storage unit, a matrix address is the address at which a matrix is stored in the storage unit, and a vector address is the address at which a vector is stored in the storage unit;
the control unit performs decoding operations and controls each unit module according to the instructions it reads;
the arithmetic unit obtains neuron addresses/matrix addresses/vector addresses from the register unit according to an instruction, obtains the corresponding neurons/matrices/vectors from the storage unit according to those addresses, and performs computation on the neurons/matrices/vectors thus obtained and/or on data carried in the instruction, to produce an operation result;
characterized in that the neuron/matrix/vector data involved in the arithmetic unit's computation are staged in the scratchpad memory, from which the arithmetic unit reads them when needed.
The scratchpad memory can support neuron/matrix/vector data of different sizes.
The register unit is a scalar register file, which provides the scalar registers needed during computation.
The arithmetic unit includes a vector multiplication component, an accumulation component, and a scalar multiplication component; and
the arithmetic unit is responsible for the device's neural network/matrix/vector operations, including: convolutional neural network forward operations, convolutional neural network training operations, neural network pooling operations, fully connected neural network forward operations, fully connected neural network training operations, batch normalization operations, RBM neural network operations, matrix-vector multiplication, matrix-matrix addition/subtraction, vector outer product, vector inner product, elementwise vector arithmetic, vector logic operations, vector transcendental functions, vector comparison, vector maximum/minimum, vector circular shift, and generation of random vectors obeying a given distribution.
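As a rough picture of how the three components named above (vector multiplication, accumulation, scalar multiplication) could compose one of the listed operations, the following Python sketch builds a matrix-vector multiplication from them. The function names and the software framing are illustrative assumptions, not the patented hardware:

```python
# Hypothetical software model of the three arithmetic components named in
# the claim: vector multiplication, accumulation, and scalar multiplication.

def vector_multiply(a, b):
    """Elementwise product of two equal-length vectors."""
    assert len(a) == len(b)
    return [x * y for x, y in zip(a, b)]

def accumulate(v):
    """Sum-reduce a vector to a scalar."""
    total = 0
    for x in v:
        total += x
    return total

def scalar_multiply(s, v):
    """Scale a vector by a scalar."""
    return [s * x for x in v]

def matrix_vector_multiply(m, v, scale=1.0):
    """Matrix-vector product built from the components above: each output
    element is accumulate(vector_multiply(row, v)), optionally scaled."""
    return scalar_multiply(scale, [accumulate(vector_multiply(row, v)) for row in m])

# Example: a 2x3 matrix times a length-3 vector.
result = matrix_vector_multiply([[1, 2, 3], [4, 5, 6]], [1, 0, 2])  # -> [7.0, 16.0]
```

The same three primitives cover several other listed operations, e.g. a vector inner product is `accumulate(vector_multiply(a, b))`.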
The device further comprises an instruction cache unit for storing pending operation instructions; the instruction cache unit is preferably a reorder buffer. The device further comprises an instruction queue, which buffers decoded instructions in order and sends them to the dependency processing unit.
The device further comprises a dependency processing unit and a store queue. Before the arithmetic unit obtains an operation instruction, the dependency processing unit judges whether the operation instruction accesses the same neuron/matrix/vector storage address as the previous operation instruction; if so, the operation instruction is stored in the store queue, and after the previous operation instruction finishes, the operation instruction in the store queue is provided to the arithmetic unit; otherwise, the operation instruction is provided directly to the arithmetic unit. The store queue stores instructions that have data dependencies on earlier instructions, and submits an instruction after its dependency is eliminated.
The instruction set of the device uses a Load/Store architecture, so the arithmetic unit does not operate on data in memory; and
the instruction set of the device preferably uses a very long instruction word (VLIW) architecture, and preferably uses fixed-length instructions.
An operation instruction executed by the arithmetic unit includes at least one opcode and at least three operands; the opcode indicates the function of the operation instruction, and the arithmetic unit performs different operations by recognizing one or more opcodes; the operands indicate the data information of the operation instruction, where the data information is an immediate value or a register number.
Preferably, when the operation instruction is a neural network operation instruction, it includes at least one opcode and 16 operands;
preferably, when the operation instruction is a matrix-matrix operation instruction, it includes at least one opcode and at least four operands;
preferably, when the operation instruction is a vector operation instruction, it includes at least one opcode and at least three operands;
preferably, when the operation instruction is a matrix-vector operation instruction, it includes at least one opcode and at least six operands.
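The operand-count rules above can be summarized in a small checker. The sketch below is illustrative Python under assumed names (`check_instruction`, the kind labels); it is not the patent's actual instruction encoding:

```python
# Illustrative sketch of the operand-count rules stated above.
# The kind labels and the function name are assumptions for illustration.

MIN_OPERANDS = {
    "neural_network": 16,  # exactly 16 operands
    "matrix_matrix": 4,    # at least four
    "vector": 3,           # at least three
    "matrix_vector": 6,    # at least six
}

def check_instruction(kind, opcodes, operands):
    """Return True if the instruction has at least one opcode and the
    operand count required for its kind; each operand is either an
    immediate value or a register number."""
    if len(opcodes) < 1:
        return False
    needed = MIN_OPERANDS[kind]
    if kind == "neural_network":
        return len(operands) == needed
    return len(operands) >= needed
```

For example, a vector instruction with one opcode and three register-number operands passes the check, while a matrix-vector instruction with only two operands does not.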
As another aspect of the present invention, the invention also provides an apparatus for performing neural network operations and matrix/vector operations, characterized by comprising:
a fetch module, which takes the next instruction to be executed out of the instruction sequence and passes it to the decode module;
a decode module, which decodes the instruction and passes the decoded instruction to the instruction queue;
an instruction queue, which buffers the instructions decoded by the decode module in order and sends them to the dependency processing unit;
a scalar register file, which provides scalar registers for computation;
a dependency processing unit, which judges whether there is a data dependency between the current instruction and the previous instruction, and if so, places the current instruction in the store queue;
a store queue, which caches a current instruction that has a data dependency on the previous instruction, and issues the current instruction once its dependency on the previous instruction has been eliminated;
a reorder buffer, in which an instruction is cached while it executes; after the instruction finishes, if it is the earliest uncommitted instruction in the reorder buffer, it is committed;
an arithmetic unit, which executes all neural network operations and matrix/vector operations;
a scratchpad memory, which stages the neuron/matrix/vector data involved in the arithmetic unit's computation, from which the arithmetic unit reads when needed; the scratchpad memory is preferably able to support data of different sizes;
an IO memory-access module, which directly accesses the scratchpad memory and is responsible for reading data from and writing data to it.
As yet another aspect of the invention, the invention also provides a method for executing neural network operation and matrix/vector instructions, characterized by comprising the following steps:
Step S1: the fetch module takes out a neural network operation and matrix/vector instruction and sends it to the decode module;
Step S2: the decode module decodes the instruction and sends it to the instruction queue;
Step S3: within the decode module, the instruction is sent to the instruction receiving module;
Step S4: the instruction receiving module sends the instruction to the micro-instruction decode module for micro-instruction decoding;
Step S5: the micro-instruction decode module obtains the instruction's neural network operation opcode and neural network operation operands from the scalar register file, decodes the instruction into micro-instructions that control each functional component, and sends them to the micro-instruction issue queue;
Step S6: after the needed data are obtained, the instruction is sent to the dependency processing unit, which analyzes whether the instruction has a data dependency on previously issued, not-yet-completed instructions; if so, the instruction must wait in the store queue until it no longer has any data dependency on previously issued, not-yet-completed instructions;
Step S7: the micro-instructions corresponding to the instruction are sent to the arithmetic unit;
Step S8: the arithmetic unit fetches the needed data from the scratchpad memory according to the addresses and sizes of the required data, and then completes the neural network operation and/or matrix/vector operation corresponding to the instruction.
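Steps S1 through S8 can be pictured as a minimal software pipeline. The following Python sketch is a hypothetical model: the instruction tuple, the dict-based scratchpad, and the `VADD` operation are invented for illustration and are not the patented hardware:

```python
# Hypothetical walk-through of steps S1-S8 for one instruction.
# The instruction format and the add operation are invented for
# illustration; the real device issues micro-instructions to hardware.

scratchpad = {0x00: [1.0, 2.0, 3.0], 0x10: [4.0, 5.0, 6.0]}  # addr -> data
in_flight_writes = set()   # addresses written by earlier, unfinished instructions

def execute(instr):
    # S1-S5: fetch, decode, and break the instruction into "micro-ops"
    # (here just the opcode and resolved operand addresses).
    opcode, src1, src2, dst = instr
    # S6: dependency check - an instruction whose source is still being
    # written by an earlier instruction would wait in the store queue
    # (here we simply return None instead of waiting).
    if src1 in in_flight_writes or src2 in in_flight_writes:
        return None
    # S7-S8: the arithmetic unit reads operands from the scratchpad by
    # address, computes, and writes the result back.
    a, b = scratchpad[src1], scratchpad[src2]
    if opcode == "VADD":
        scratchpad[dst] = [x + y for x, y in zip(a, b)]
    return scratchpad[dst]

result = execute(("VADD", 0x00, 0x10, 0x20))  # -> [5.0, 7.0, 9.0]
```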
Based on the above technical solutions, the neural network operation and matrix/vector operation apparatus and method of the present invention have the following beneficial effects: the data involved in computation are staged in a scratchpad memory, so that data of different widths can be supported more flexibly and efficiently during neural network and matrix/vector operations; meanwhile, the customized neural network operation and matrix/vector operation modules implement a variety of neural network and matrix/vector operations more efficiently, improving the execution performance of computing tasks. The instructions used by the present invention take the form of very long instruction words.
Brief description of the drawings
Fig. 1 is a schematic diagram of the structure of the apparatus for performing neural network operations and matrix/vector operations of the present invention;
Fig. 2 is a schematic diagram of the format of the instruction set of the present invention;
Fig. 3 is a schematic diagram of the format of the neural network operation instruction of the present invention;
Fig. 4 is a schematic diagram of the format of the matrix operation instruction of the present invention;
Fig. 5 is a schematic diagram of the format of the vector operation instruction of the present invention;
Fig. 6 is a schematic diagram of the format of the matrix-vector operation instruction of the present invention;
Fig. 7 is a schematic diagram of the structure of a neural network operation and matrix/vector operation apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the structure of the decode module in a neural network operation and matrix/vector operation apparatus according to an embodiment of the present invention;
Fig. 9 is a flowchart of a neural network operation and matrix/vector operation apparatus according to an embodiment of the present invention executing a neural network operation and matrix/vector instruction.
Detailed description of the embodiments
The invention discloses an apparatus for neural network operations and matrix/vector operations, comprising a storage unit, a register unit, a control unit, and an arithmetic unit. The storage unit stores neurons/matrices/vectors; the register unit stores the addresses and other parameters of the stored neurons/matrices/vectors; the control unit performs decoding operations and controls each module according to the instructions it reads; the arithmetic unit, according to a neural network operation or matrix/vector operation instruction, obtains the neuron/matrix/vector addresses and other parameters from the instruction or from the register unit, then obtains the corresponding neurons/matrices/vectors from the storage unit according to those addresses, and then performs computation on the obtained neurons/matrices/vectors to produce an operation result. By staging the neuron/matrix/vector data involved in computation in a scratchpad memory, the present invention can support data of different widths more flexibly and efficiently during computation, improving the execution performance of computing tasks.
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in more detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the structure of the neural network operation and matrix/vector operation apparatus of the present invention. As shown in Fig. 1, the apparatus includes:
a storage unit, for storing neurons/matrices/vectors. In one embodiment, the storage unit may be a scratchpad memory capable of supporting neuron/matrix/vector data of different sizes. The present invention stages the necessary computation data in the scratchpad memory, so that the apparatus can support data of different widths more flexibly and efficiently during neural network and matrix/vector computation.
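A scratchpad that supports data of different sizes can be pictured as one flat buffer addressed by (start address, length). The Python model below is an illustrative assumption (a simple bump allocator), not the hardware design:

```python
# Toy model of a scratchpad that holds variably sized neuron/matrix/vector
# data in one flat buffer. The allocator and its API are illustrative.

class Scratchpad:
    def __init__(self, capacity):
        self.buf = [0.0] * capacity
        self.next_free = 0

    def alloc(self, data):
        """Copy `data` into the scratchpad; return its start address."""
        addr = self.next_free
        if addr + len(data) > len(self.buf):
            raise MemoryError("scratchpad full")
        self.buf[addr:addr + len(data)] = data
        self.next_free += len(data)
        return addr

    def read(self, addr, size):
        """Read `size` elements starting at `addr` - an address plus a
        size is all the arithmetic unit needs to fetch its operands."""
        return self.buf[addr:addr + size]

sp = Scratchpad(64)
vec_addr = sp.alloc([1.0, 2.0, 3.0])        # a length-3 vector
mat_addr = sp.alloc([1.0, 0.0, 0.0, 1.0])   # a 2x2 matrix, row-major
```

Because data items are addressed by (start, length) rather than fixed-width slots, a length-3 vector and a 2x2 matrix coexist in the same buffer.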
a register unit, for storing neuron/matrix/vector addresses, wherein a neuron address is the address at which a neuron is stored in the storage unit, a matrix address is the address at which a matrix is stored in the storage unit, and a vector address is the address at which a vector is stored in the storage unit. In one embodiment, the register unit may be a scalar register file providing the scalar registers needed during computation; a scalar register holds not only neuron/matrix/vector addresses but also scalar data. When an operation involves both a matrix/vector and a scalar, the arithmetic unit obtains not only the matrix/vector address from the register unit but also the corresponding scalar.
a control unit, for controlling the behavior of each module in the apparatus. In one embodiment, the control unit reads a ready instruction, decodes it into several micro-instructions, and sends them to the other modules in the apparatus, which perform the corresponding operations according to the micro-instructions they receive.
an arithmetic unit, for obtaining the various neural network operation and matrix/vector operation instructions, obtaining neuron/matrix/vector addresses from the register unit according to an instruction, then obtaining the corresponding neurons/matrices/vectors from the storage unit according to those addresses, then performing computation on the obtained neurons/matrices/vectors to produce an operation result, and storing the result in the storage unit. The neural network operation and matrix/vector arithmetic unit includes a vector multiplication component, an accumulation component, and a scalar multiplication component. The arithmetic unit is responsible for the device's neural network/matrix/vector operations, including but not limited to: convolutional neural network forward operations, convolutional neural network training operations, neural network pooling operations, fully connected neural network forward operations, fully connected neural network training operations, batch normalization operations, RBM neural network operations, matrix-vector multiplication, matrix-matrix addition/subtraction, vector outer product (tensor) operations, vector inner product operations, elementwise vector arithmetic, vector logic operations, vector transcendental functions, vector comparison, vector maximum/minimum, vector circular shift, and generation of random vectors obeying a given distribution. Operation instructions are sent to the arithmetic unit for execution.
According to an embodiment of the present invention, the apparatus further includes an instruction cache unit for storing pending operation instructions. While an instruction executes, it is also buffered in the instruction cache unit; after the instruction finishes, if it is the earliest uncommitted instruction in the instruction cache unit, it is committed, and once committed, the changes the instruction makes to the device state cannot be undone. In one embodiment, the instruction cache unit may be a reorder buffer.
According to an embodiment of the present invention, the apparatus further includes an instruction queue. Considering that different instructions may have dependencies on the registers they use, the instruction queue stores the decoded neural network operation and matrix/vector operation instructions in order, and issues an instruction once its dependencies are satisfied.
According to an embodiment of the present invention, the apparatus further includes a dependency processing unit. Before the arithmetic unit obtains an instruction, the dependency processing unit judges whether the operation instruction accesses the same neuron/matrix/vector storage address as the previous operation instruction; if so, the operation instruction is stored in the store queue, and after the previous operation instruction finishes, the operation instruction in the store queue is provided to the arithmetic unit; otherwise, the operation instruction is provided directly to the arithmetic unit. Specifically, when operation instructions access the scratchpad memory, earlier and later instructions may access the same memory space; to ensure the correctness of execution results, if the current instruction is detected to have a dependency on the data of an earlier instruction, it must wait in the store queue until the dependency is eliminated.
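The address-overlap test performed by the dependency processing unit might be sketched as follows. This is hypothetical Python; the real unit compares address ranges in hardware, and the function names are invented:

```python
# Illustrative model of the dependency processing unit: an instruction
# whose scratchpad address range overlaps that of the previous instruction
# is diverted to a store queue until the previous instruction completes.

from collections import deque

def ranges_overlap(a_start, a_len, b_start, b_len):
    """True if the half-open ranges [a_start, a_start+a_len) and
    [b_start, b_start+b_len) share any address."""
    return a_start < b_start + b_len and b_start < a_start + a_len

store_queue = deque()

def dispatch(instr, prev):
    """instr/prev are (start_address, length) scratchpad accesses; return
    where the instruction goes: the arithmetic unit, or the store queue."""
    if prev is not None and ranges_overlap(instr[0], instr[1], prev[0], prev[1]):
        store_queue.append(instr)
        return "store_queue"
    return "arithmetic_unit"
```

For example, an access at addresses 0-15 conflicts with a previous access at 8-23 and is queued, but not with one at 16-31.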
According to an embodiment of the present invention, the apparatus further includes an input/output unit for storing neurons/matrices/vectors into the storage unit and for obtaining operation results from the storage unit. The input/output unit can directly access the storage unit and is responsible for reading data from and writing data to memory.
According to an embodiment of the present invention, the instruction set of the apparatus uses a Load/Store architecture: the arithmetic unit does not operate on data in memory. The instruction set uses a very long instruction word architecture; by configuring the instructions differently, it can carry out complex neural network operations as well as simple matrix/vector operations. In addition, the instruction set uses fixed-length instructions, so that the neural network operation and matrix/vector operation apparatus of the present invention can fetch the next instruction during the decode stage of the current instruction.
Fig. 2 is a schematic diagram of the format of an operation instruction of the present invention. As shown in Fig. 2, an operation instruction includes at least one opcode and at least three operands, wherein the opcode indicates the function of the operation instruction (the arithmetic unit performs different operations by recognizing one or more opcodes), and the operands indicate the data information of the operation instruction, where the data information may be an immediate value or a register number. For example, to obtain a matrix, the matrix's start address and length can be obtained from the corresponding register according to the register number, and the matrix stored at the corresponding address is then obtained from the storage unit according to the start address and length.
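The matrix-fetch example just described (register number, then start address and length, then memory read) can be sketched as follows; the register-file layout, register number, and addresses below are invented for illustration:

```python
# Sketch of resolving a register-number operand into a matrix:
# the register holds (start_address, length), and the storage unit
# is indexed by address.

register_file = {
    3: (0x100, 4),   # register r3: matrix starts at 0x100, 4 elements
}
storage_unit = {0x100 + i: float(v) for i, v in enumerate([1, 2, 3, 4])}

def fetch_matrix(register_number):
    start, length = register_file[register_number]           # read the register
    return [storage_unit[start + i] for i in range(length)]  # read the matrix

m = fetch_matrix(3)  # -> [1.0, 2.0, 3.0, 4.0]
```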
Fig. 3 is a schematic diagram of the format of the neural network operation instruction of the present invention. As shown in Fig. 3, a neural network operation instruction includes at least one opcode and 16 operands, wherein the opcode indicates the function of the neural network operation instruction, the arithmetic unit performs different neural network operations by recognizing one or more opcodes, and the operands indicate the data information of the neural network operation instruction, where the data information may be an immediate value or a register number.
Fig. 4 is a schematic diagram of the format of the matrix operation instruction of the present invention. As shown in Fig. 4, a matrix operation instruction includes at least one opcode and at least four operands, wherein the opcode indicates the function of the matrix operation instruction, the arithmetic unit performs different matrix operations by recognizing one or more opcodes, and the operands indicate the data information of the matrix operation instruction, where the data information may be an immediate value or a register number.
Fig. 5 is a schematic diagram of the format of the vector operation instruction of the present invention. As shown in Fig. 5, a vector operation instruction includes at least one opcode and at least three operands, wherein the opcode indicates the function of the vector operation instruction, the arithmetic unit performs different vector operations by recognizing one or more opcodes, and the operands indicate the data information of the vector operation instruction, where the data information may be an immediate value or a register number.
Fig. 6 is a schematic diagram of the format of the matrix-vector operation instruction of the present invention. As shown in Fig. 6, a matrix-vector operation instruction includes at least one opcode and at least six operands, wherein the opcode indicates the function of the matrix-vector operation instruction, the arithmetic unit performs different matrix-vector operations by recognizing one or more opcodes, and the operands indicate the data information of the matrix-vector operation instruction, where the data information may be an immediate value or a register number.
Fig. 7 is a schematic diagram of the structure of a neural network operation and matrix/vector operation apparatus according to an embodiment of the present invention. As shown in Fig. 7, the apparatus includes a fetch module, a decode module, an instruction queue, a scalar register file, a dependency processing unit, a store queue, a reorder buffer, an arithmetic unit, a scratchpad memory, and an IO memory-access module.
The fetch module is responsible for taking the next instruction to be executed out of the instruction sequence and passing it to the decode module.
The decode module is responsible for decoding instructions and passing the decoded instructions to the instruction queue. As shown in Fig. 8, the decode module includes an instruction receiving module, a micro-instruction decode module, a micro-instruction queue, and a micro-instruction issue module. The instruction receiving module receives the instructions obtained by the fetch module; the micro-instruction decode module decodes the instructions received by the instruction receiving module into micro-instructions that control each functional component; the micro-instruction queue holds the micro-instructions sent from the micro-instruction decode module; and the micro-instruction issue module issues the micro-instructions to each functional component.
The instruction queue buffers the decoded instructions in order and sends them to the dependency processing unit.
The scalar register file provides the scalar registers the apparatus needs during computation.
The dependency processing unit handles possible storage dependencies between an instruction and the previous instruction. A matrix operation instruction accesses the scratchpad memory, and earlier and later instructions may access the same memory space. To ensure the correctness of instruction execution results, if the current instruction is detected to have a dependency on the data of earlier instructions, it must wait in the store queue until the dependency is eliminated.
The store queue is an ordered queue: an instruction that has a data dependency on earlier instructions is held in this queue until the dependency is eliminated, after which the instruction is submitted.
The reorder buffer also caches each instruction while it executes. When an instruction has finished executing, if it is also the earliest uncommitted instruction in the reorder buffer, it is committed; once committed, the changes that this instruction makes to the device state can no longer be undone. The instructions in the reorder buffer act as placeholders: when the first instruction it holds has a data dependency, that instruction is not committed (released). Although further instructions keep arriving, only a limited number can be accepted (bounded by the size of the reorder buffer); the whole computation proceeds smoothly only once the first instruction is committed.
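The in-order commit behavior described above can be sketched as a toy model; the class and method names are invented for illustration and do not come from the patent:

```python
from collections import deque

class ReorderBuffer:
    """Toy reorder buffer: instructions enter in program order, may
    finish out of order, but commit strictly from the head."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()   # each entry: [name, finished?]

    def issue(self, name):
        if len(self.entries) >= self.capacity:
            return False         # buffer full: the front end must stall
        self.entries.append([name, False])
        return True

    def mark_done(self, name):
        for entry in self.entries:
            if entry[0] == name:
                entry[1] = True

    def commit(self):
        """Commit (and return) the run of finished instructions at the
        head; anything behind an unfinished head keeps waiting."""
        committed = []
        while self.entries and self.entries[0][1]:
            committed.append(self.entries.popleft()[0])
        return committed
```

Note how instructions that finish early still occupy their slot until everything ahead of them has committed, which is exactly the placeholder role the text describes.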
The arithmetic unit is responsible for all neural network operations and matrix/vector operations of the device, including but not limited to: convolutional neural network forward operations, convolutional neural network training operations, neural network pooling operations, fully connected neural network forward operations, fully connected neural network training operations, batch normalization operations, RBM neural network operations, matrix-vector multiplication, matrix-matrix addition/subtraction, vector outer product (tensor) operations, vector inner product operations, vector arithmetic operations, vector logic operations, vector transcendental function operations, vector comparison operations, vector maximum/minimum operations, vector circular shift operations, and generation of random vectors obeying a given distribution. Operation instructions are sent to the arithmetic unit for execution;
The scratchpad memory is the device's dedicated temporary data store and can support data of different sizes;
The IO memory-access module directly accesses the scratchpad memory and is responsible for reading data from it and writing data to it.
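The scratchpad's role — a flat, explicitly addressed on-chip buffer rather than a hardware-managed cache, so data of any size can be staged by (address, size) — can be illustrated with a small model (the class and method names are illustrative assumptions):

```python
class Scratchpad:
    """Toy software-managed on-chip buffer: every access names an
    explicit address and length, so variable-size neuron/matrix/vector
    data can be staged for the arithmetic unit."""
    def __init__(self, size_bytes):
        self.mem = bytearray(size_bytes)

    def write(self, addr, data):
        assert addr + len(data) <= len(self.mem), "out of scratchpad range"
        self.mem[addr:addr + len(data)] = data

    def read(self, addr, size):
        assert addr + size <= len(self.mem), "out of scratchpad range"
        return bytes(self.mem[addr:addr + size])
```

The IO memory-access module would play the role of `write` (loading operands in) and `read` (moving results out), while the arithmetic unit reads operands at the addresses carried by the instruction.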
Fig. 9 is a flowchart of the arithmetic device of one embodiment of the present invention executing a neural network operation and matrix/vector operation instruction. As shown in Fig. 9, the process of executing a neural network operation and matrix/vector instruction includes:
S1, the instruction-fetch module fetches the neural network operation and matrix/vector instruction and sends it to the decoding module.
S2, the decoding module decodes the instruction and sends it to the instruction queue.
S3, within the decoding module, the instruction is sent to the instruction receiving module.
S4, the instruction receiving module sends the instruction to the micro-instruction decoding module for micro-instruction decoding.
S5, the micro-instruction decoding module obtains the instruction's neural network operation opcode and neural network operation operands from the scalar register file, and at the same time decodes the instruction into the micro-instructions that control the individual functional units and sends them to the micro-instruction transmitting queue.
S6, after the required data are obtained, the instruction is sent to the dependency processing unit. The dependency processing unit analyzes whether the instruction has a data dependency on any preceding instruction that has not yet finished executing. If so, the instruction must wait in the store queue until it no longer has a data dependency on any unfinished preceding instruction.
S7, once no dependency remains, the micro-instructions corresponding to this neural network operation and matrix/vector instruction are sent to the functional units such as the arithmetic unit.
S8, the arithmetic unit fetches the required data from the scratchpad memory according to the address and size of the needed data, and then completes the neural network operation and matrix/vector operation in the arithmetic unit.
S9, after the operation completes, the output data are written back to the specified address in the scratchpad memory, and at the same time the instruction in the reorder buffer is committed.
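Steps S1–S9 above form a linear pipeline for each instruction. The sketch below strings the stages together for a single instruction; every name (the stage callables, the instruction's dictionary keys) is an illustrative assumption standing in for the patent's hardware modules:

```python
def execute_instruction(instr, decoder, dep_unit, alu, scratchpad, rob):
    """Toy walk through S1-S9 for one already-fetched instruction."""
    micro_ops = decoder(instr)                    # S2-S5: decode to micro-ops
    dep_unit.wait_until_clear(instr)              # S6: resolve dependencies
    addr, size = instr["src_addr"], instr["src_size"]
    data = scratchpad.read(addr, size)            # S8: fetch operands
    result = alu(micro_ops, data)                 # S8: compute
    scratchpad.write(instr["dst_addr"], result)   # S9: write back
    rob.commit(instr)                             # S9: commit in order
    return result
```

A driver would call this once per instruction leaving the instruction queue; the dependency wait is what serializes instructions that touch overlapping scratchpad regions.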
In summary, the present invention discloses a device and method for neural network operations and matrix/vector operations which, together with the corresponding instructions, can effectively handle the large volume of neural network algorithms and matrix/vector computation in the current computing field. Compared with existing conventional solutions, the present invention has the advantages of being easy to use, supporting flexibly configurable neural network and matrix/vector scales, and providing ample on-chip caching.
The specific embodiments described above further illustrate the objects, technical solutions, and beneficial effects of the present invention. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (10)
1. A device for performing neural network operations and matrix/vector operations, comprising a storage unit, a register unit, a control unit, an arithmetic unit, and a scratchpad memory, wherein:
the storage unit is configured to store neurons/matrices/vectors;
the register unit is configured to store neuron addresses/matrix addresses/vector addresses, wherein a neuron address is the address at which a neuron is stored in the storage unit, a matrix address is the address at which a matrix is stored in the storage unit, and a vector address is the address at which a vector is stored in the storage unit;
the control unit is configured to perform the decoding operation and to control each module according to the read instruction;
the arithmetic unit is configured to obtain the neuron address/matrix address/vector address from the register unit according to an instruction, to obtain the corresponding neuron/matrix/vector from the storage unit according to that address, and to perform an operation on the neuron/matrix/vector thus obtained and/or the data carried in the instruction, obtaining an operation result;
characterized in that the neuron/matrix/vector data participating in the computation of the arithmetic unit are temporarily stored in the scratchpad memory, from which the arithmetic unit reads them when needed.
2. The device for performing neural network operations and matrix/vector operations of claim 1, characterized in that the scratchpad memory can support neuron/matrix/vector data of different sizes.
3. The device for performing neural network operations and matrix/vector operations of claim 1, characterized in that the register unit is a scalar register file, which provides the scalar registers required during computation.
4. The device for performing neural network operations and matrix/vector operations of claim 1, characterized in that the arithmetic unit includes a vector multiplication component, an accumulation component, and a scalar multiplication component; and
the arithmetic unit is responsible for the device's neural network/matrix/vector operations, including convolutional neural network forward operations, convolutional neural network training operations, neural network pooling operations, fully connected neural network forward operations, fully connected neural network training operations, batch normalization operations, RBM neural network operations, matrix-vector multiplication, matrix-matrix addition/subtraction, vector outer product operations, vector inner product operations, vector arithmetic operations, vector logic operations, vector transcendental function operations, vector comparison operations, vector maximum/minimum operations, vector circular shift operations, and generation of random vectors obeying a given distribution.
5. The device for performing neural network operations and matrix/vector operations of claim 1, characterized in that the device further comprises an instruction cache unit for storing operation instructions awaiting execution, the instruction cache unit preferably being a reorder buffer; and
the device further comprises an instruction queue for sequentially buffering decoded instructions and sending them to the dependency processing unit.
6. The device for performing neural network operations and matrix/vector operations of claim 5, characterized in that the device further comprises a dependency processing unit and a store queue; the dependency processing unit is configured to judge, before the arithmetic unit obtains an operation instruction, whether the operation instruction accesses the same neuron/matrix/vector storage address as the preceding operation instruction; if so, the operation instruction is stored in the store queue; otherwise, the operation instruction is supplied directly to the arithmetic unit, and after the preceding operation instruction finishes executing, the operation instruction in the store queue is supplied to the arithmetic unit; the store queue is configured to store instructions that have a data dependency on preceding instructions and to commit each such instruction after its dependency is resolved.
7. The device for performing neural network operations and matrix/vector operations of claim 1, characterized in that the instruction set of the device uses a load/store structure, and the arithmetic unit does not operate on data in memory; and
the instruction set of the device preferably uses a VLIW structure and preferably uses fixed-length instructions.
8. The device for performing neural network operations and matrix/vector operations of claim 1, characterized in that the operation instruction executed by the arithmetic unit includes at least one opcode and at least 3 operands; the opcode indicates the function of the operation instruction, and the arithmetic unit performs different operations by recognizing one or more opcodes; the operands indicate the data information of the operation instruction, the data information being an immediate value or a register number.
Preferably, when the operation instruction is a neural network operation instruction, the neural network operation instruction includes at least one opcode and 16 operands;
preferably, when the operation instruction is a matrix-matrix operation instruction, the matrix-matrix operation instruction includes at least one opcode and at least 4 operands;
preferably, when the operation instruction is a vector operation instruction, the vector operation instruction includes at least one opcode and at least 3 operands;
preferably, when the operation instruction is a matrix-vector operation instruction, the matrix-vector operation instruction includes at least one opcode and at least 6 operands.
9. A device for performing neural network operations and matrix/vector operations, characterized by comprising:
an instruction-fetch module for fetching the next instruction to be executed from the instruction sequence and transmitting it to the decoding module;
a decoding module for decoding the instruction and transmitting the decoded instruction to the instruction queue;
an instruction queue for sequentially buffering the instructions decoded by the decoding module and sending them to the dependency processing unit;
a scalar register file for providing scalar registers for the computation;
a dependency processing unit for judging whether the current instruction has a data dependency on the preceding instruction and, if so, storing the current instruction in the store queue;
a store queue for caching a current instruction that has a data dependency on the preceding instruction, and issuing the current instruction after its dependency on the preceding instruction is resolved;
a reorder buffer for caching each instruction while it executes, judging after an instruction has finished executing whether it is the earliest uncommitted instruction in the reorder buffer and, if so, committing the instruction;
an arithmetic unit for performing all neural network operations and matrix/vector operations;
a scratchpad memory for temporarily storing the neuron/matrix/vector data participating in the computation of the arithmetic unit, from which the arithmetic unit reads when needed, the scratchpad memory preferably being able to support data of different sizes; and
an IO memory-access module for directly accessing the scratchpad memory and being responsible for reading data from it and writing data to it.
10. A method for executing a neural network operation and matrix/vector instruction, characterized by comprising the following steps:
step S1, an instruction-fetch module fetches a neural network operation and matrix/vector instruction and sends the instruction to a decoding module;
step S2, the decoding module decodes the instruction and sends the instruction to an instruction queue;
step S3, within the decoding module, the instruction is sent to an instruction receiving module;
step S4, the instruction receiving module sends the instruction to a micro-instruction decoding module for micro-instruction decoding;
step S5, the micro-instruction decoding module obtains the instruction's neural network operation opcode and neural network operation operands from a scalar register file, and at the same time decodes the instruction into micro-instructions that control the individual functional units and sends them to a micro-instruction transmitting queue;
step S6, after the required data are obtained, the instruction is sent to a dependency processing unit; the dependency processing unit analyzes whether the instruction has a data dependency on any preceding instruction that has not finished executing; if so, the instruction waits in a store queue until it no longer has a data dependency on any unfinished preceding instruction;
step S7, the micro-instructions corresponding to the instruction are sent to an arithmetic unit;
step S8, the arithmetic unit fetches the required data from a scratchpad memory according to the address and size of the needed data, and then completes the neural network operation and/or matrix/vector operation corresponding to the instruction in the arithmetic unit.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610281291.5A CN107329936A (en) | 2016-04-29 | 2016-04-29 | A kind of apparatus and method for performing neural network computing and matrix/vector computing |
PCT/CN2016/082015 WO2017185418A1 (en) | 2016-04-29 | 2016-05-13 | Device and method for performing neural network computation and matrix/vector computation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610281291.5A CN107329936A (en) | 2016-04-29 | 2016-04-29 | A kind of apparatus and method for performing neural network computing and matrix/vector computing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107329936A true CN107329936A (en) | 2017-11-07 |
Family
ID=60161583
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610281291.5A Pending CN107329936A (en) | 2016-04-29 | 2016-04-29 | A kind of apparatus and method for performing neural network computing and matrix/vector computing |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107329936A (en) |
WO (1) | WO2017185418A1 (en) |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108108189A (en) * | 2017-12-15 | 2018-06-01 | 北京中科寒武纪科技有限公司 | A kind of computational methods and Related product |
CN108197705A (en) * | 2017-12-29 | 2018-06-22 | 国民技术股份有限公司 | Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium |
CN108416431A (en) * | 2018-01-19 | 2018-08-17 | 上海兆芯集成电路有限公司 | Neural network microprocessor and macro instruction processing method |
CN108520296A (en) * | 2018-03-20 | 2018-09-11 | 福州瑞芯微电子股份有限公司 | A kind of method and apparatus based on the distribution of deep learning chip dynamic cache |
CN108701015A (en) * | 2017-11-30 | 2018-10-23 | 深圳市大疆创新科技有限公司 | For the arithmetic unit of neural network, chip, equipment and correlation technique |
CN108764470A (en) * | 2018-05-18 | 2018-11-06 | 中国科学院计算技术研究所 | A kind of processing method of artificial neural network operation |
CN109032669A (en) * | 2018-02-05 | 2018-12-18 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing the instruction of vector minimum value |
CN109615059A (en) * | 2018-11-06 | 2019-04-12 | 海南大学 | Edge filling and filter dilation operation method and system in a kind of convolutional neural networks |
CN109871952A (en) * | 2017-12-01 | 2019-06-11 | 阿比特电子科技有限公司 | Electronic device, accelerator, the accelerated method of neural network and acceleration system |
CN109919308A (en) * | 2017-12-13 | 2019-06-21 | 腾讯科技(深圳)有限公司 | A kind of neural network model dispositions method, prediction technique and relevant device |
CN110147222A (en) * | 2018-09-18 | 2019-08-20 | 北京中科寒武纪科技有限公司 | Arithmetic unit and method |
WO2019165940A1 (en) * | 2018-02-27 | 2019-09-06 | 上海寒武纪信息科技有限公司 | Integrated circuit chip apparatus, board card and related product |
CN110276447A (en) * | 2018-03-14 | 2019-09-24 | 上海寒武纪信息科技有限公司 | A kind of computing device and method |
CN110647973A (en) * | 2018-06-27 | 2020-01-03 | 北京中科寒武纪科技有限公司 | Operation method and related method and product |
CN110673786A (en) * | 2019-09-03 | 2020-01-10 | 浪潮电子信息产业股份有限公司 | Data caching method and device |
CN110941584A (en) * | 2019-11-19 | 2020-03-31 | 中科寒武纪科技股份有限公司 | Operation engine and data operation method |
CN111027690A (en) * | 2019-11-26 | 2020-04-17 | 陈子祺 | Combined processing device, chip and method for executing deterministic inference |
CN111079911A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111078286A (en) * | 2018-10-19 | 2020-04-28 | 上海寒武纪信息科技有限公司 | Data communication method, computing system and storage medium |
CN111126583A (en) * | 2019-12-23 | 2020-05-08 | 中国电子科技集团公司第五十八研究所 | Universal neural network accelerator |
CN111258634A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Data selection device, data processing method, chip and electronic equipment |
CN111260045A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Decoder and atomic instruction analysis method |
CN111353591A (en) * | 2018-12-20 | 2020-06-30 | 中科寒武纪科技股份有限公司 | Computing device and related product |
CN111860798A (en) * | 2019-04-27 | 2020-10-30 | 中科寒武纪科技股份有限公司 | Operation method, device and related product |
CN111898752A (en) * | 2020-08-03 | 2020-11-06 | 乐鑫信息科技(上海)股份有限公司 | Apparatus and method for performing LSTM neural network operations |
CN112348179A (en) * | 2020-11-26 | 2021-02-09 | 湃方科技(天津)有限责任公司 | Efficient convolutional neural network operation instruction set architecture, device and server |
CN115826910A (en) * | 2023-02-07 | 2023-03-21 | 成都申威科技有限责任公司 | Vector fixed point ALU processing system |
WO2023123453A1 (en) * | 2021-12-31 | 2023-07-06 | 华为技术有限公司 | Operation acceleration processing method, operation accelerator use method, and operation accelerator |
CN117992396A (en) * | 2024-03-29 | 2024-05-07 | 深存科技(无锡)有限公司 | Stream tensor processor |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109754061B (en) * | 2017-11-07 | 2023-11-24 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related product |
CN109754062B (en) * | 2017-11-07 | 2024-05-14 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related product |
CN108037908B (en) * | 2017-12-15 | 2021-02-09 | 中科寒武纪科技股份有限公司 | Calculation method and related product |
CN108416422B (en) * | 2017-12-29 | 2024-03-01 | 国民技术股份有限公司 | FPGA-based convolutional neural network implementation method and device |
JP6846534B2 (en) * | 2018-02-13 | 2021-03-24 | Shanghai Cambricon Information Technology Co., Ltd. | Arithmetic logic unit and calculation method |
US12073215B2 (en) | 2018-02-13 | 2024-08-27 | Shanghai Cambricon Information Technology Co., Ltd | Computing device with a conversion unit to convert data values between various sizes of fixed-point and floating-point data |
CN111767997B (en) * | 2018-02-27 | 2023-08-29 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related products |
CN111767998B (en) * | 2018-02-27 | 2024-05-14 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related products |
CN110472734B (en) * | 2018-05-11 | 2024-03-29 | 上海寒武纪信息科技有限公司 | Computing device and related product |
CN110503179B (en) * | 2018-05-18 | 2024-03-01 | 上海寒武纪信息科技有限公司 | Calculation method and related product |
CN108959180B (en) * | 2018-06-15 | 2022-04-22 | 北京探境科技有限公司 | Data processing method and system |
CN111275197B (en) * | 2018-12-05 | 2023-11-10 | 上海寒武纪信息科技有限公司 | Operation method, device, computer equipment and storage medium |
CN111222632B (en) * | 2018-11-27 | 2023-06-30 | 中科寒武纪科技股份有限公司 | Computing device, computing method and related product |
CN111047024B (en) * | 2018-10-12 | 2023-05-23 | 上海寒武纪信息科技有限公司 | Computing device and related product |
EP4009185A1 (en) | 2018-10-18 | 2022-06-08 | Shanghai Cambricon Information Technology Co., Ltd | Network-on-chip data processing method and device |
CN111079908B (en) * | 2018-10-18 | 2024-02-13 | 上海寒武纪信息科技有限公司 | Network-on-chip data processing method, storage medium, computer device and apparatus |
CN109542513B (en) * | 2018-11-21 | 2023-04-21 | 山东浪潮科学研究院有限公司 | Convolutional neural network instruction data storage system and method |
CN111857828B (en) * | 2019-04-25 | 2023-03-14 | 安徽寒武纪信息科技有限公司 | Processor operation method and device and related product |
WO2020220935A1 (en) | 2019-04-27 | 2020-11-05 | 中科寒武纪科技股份有限公司 | Operation apparatus |
US11841822B2 (en) | 2019-04-27 | 2023-12-12 | Cambricon Technologies Corporation Limited | Fractal calculating device and method, integrated circuit and board card |
CN110780921B (en) * | 2019-08-30 | 2023-09-26 | 腾讯科技(深圳)有限公司 | Data processing method and device, storage medium and electronic device |
CN111047036B (en) * | 2019-12-09 | 2023-11-14 | Oppo广东移动通信有限公司 | Neural network processor, chip and electronic equipment |
CN111325321B (en) * | 2020-02-13 | 2023-08-29 | 中国科学院自动化研究所 | Brain-like computing system based on multi-neural network fusion and execution method of instruction set |
CN113361679B (en) * | 2020-03-05 | 2023-10-17 | 华邦电子股份有限公司 | Memory device and method of operating the same |
CN116841614B (en) * | 2023-05-29 | 2024-03-15 | 进迭时空(杭州)科技有限公司 | Sequential vector scheduling method under disordered access mechanism |
CN117807082B (en) * | 2023-12-20 | 2024-09-27 | 中科驭数(北京)科技有限公司 | Hash processing method, device, equipment and computer readable storage medium |
CN118467136A (en) * | 2024-05-30 | 2024-08-09 | 上海交通大学 | Calculation and storage method and system suitable for large language model sparse reasoning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SU980097A1 (en) * | 1981-06-05 | 1982-12-07 | Предприятие П/Я М-5769 | Device for control of scratchpad buffer storage of multiprocessor electronic computer |
CN1105138A (en) * | 1992-10-30 | 1995-07-12 | International Business Machines Corporation | Register architecture for a super scalar computer |
CN1133452A (en) * | 1994-10-13 | 1996-10-16 | Beijing Duosi Technology Industrial Park Co., Ltd. | Macroinstruction set symmetrical parallel system structure microprocessor |
CN101504599A (en) * | 2009-03-16 | 2009-08-12 | Xidian University | Special instruction set micro-processing system suitable for digital signal processing application |
CN101739235A (en) * | 2008-11-26 | 2010-06-16 | Institute of Microelectronics of Chinese Academy of Sciences | Processor device for seamless mixing 32-bit DSP and general RISC CPU |
CN103348318A (en) * | 2011-02-07 | 2013-10-09 | Arm Limited | Controlling the execution of adjacent instructions that are dependent upon a same data condition |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6961719B1 (en) * | 2002-01-07 | 2005-11-01 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Hybrid neural network and support vector machine method for optimization |
SI21200A (en) * | 2002-03-27 | 2003-10-31 | Jože Balič | The CNC control unit for controlling processing centres with learning ability |
CN1331092C (en) * | 2004-05-17 | 2007-08-08 | Institute of Semiconductors, Chinese Academy of Sciences | Special purpose neural net computer system for pattern recognition and application method |
-
2016
- 2016-04-29 CN CN201610281291.5A patent/CN107329936A/en active Pending
- 2016-05-13 WO PCT/CN2016/082015 patent/WO2017185418A1/en active Application Filing
Non-Patent Citations (3)
Title |
---|
CHRISTOPHER H. CHOU et al.: "VEGAS: Soft Vector Processor with Scratchpad Memory", FPGA '11 Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays * |
LEISHANGWEN: "Write Your Own CPU, Stage 5 (1): Pipeline Hazards", HTTPS://BLOG.CSDN.NET/LEISHANGWEN/ARTICLE/DETAILS/38298787 * |
Zhang Zhiyuan, Zhang Yaohui: "Introduction to Information", 30 September 2013, Xidian University Press * |
Cited By (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019104695A1 (en) * | 2017-11-30 | 2019-06-06 | 深圳市大疆创新科技有限公司 | Arithmetic device for neural network, chip, equipment and related method |
CN108701015A (en) * | 2017-11-30 | 2018-10-23 | 深圳市大疆创新科技有限公司 | For the arithmetic unit of neural network, chip, equipment and correlation technique |
CN109871952A (en) * | 2017-12-01 | 2019-06-11 | 阿比特电子科技有限公司 | Electronic device, accelerator, the accelerated method of neural network and acceleration system |
CN109919308A (en) * | 2017-12-13 | 2019-06-21 | 腾讯科技(深圳)有限公司 | A kind of neural network model dispositions method, prediction technique and relevant device |
US12020142B2 (en) | 2017-12-13 | 2024-06-25 | Tencent Technology (Shenzhen) Company Limited | Neural network model deployment method, prediction method and related device |
CN109919308B (en) * | 2017-12-13 | 2022-11-11 | 腾讯科技(深圳)有限公司 | Neural network model deployment method, prediction method and related equipment |
CN112230994A (en) * | 2017-12-15 | 2021-01-15 | 安徽寒武纪信息科技有限公司 | Calculation method and related product |
CN108108189A (en) * | 2017-12-15 | 2018-06-01 | 北京中科寒武纪科技有限公司 | A kind of computational methods and Related product |
US11263007B2 (en) | 2017-12-29 | 2022-03-01 | Nationz Technologies Inc. | Convolutional neural network hardware acceleration device, convolutional calculation method, and storage medium |
WO2019127731A1 (en) * | 2017-12-29 | 2019-07-04 | 国民技术股份有限公司 | Convolutional neural network hardware acceleration device, convolutional calculation method and storage medium |
CN108197705A (en) * | 2017-12-29 | 2018-06-22 | 国民技术股份有限公司 | Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium |
CN108416431B (en) * | 2018-01-19 | 2021-06-01 | 上海兆芯集成电路有限公司 | Neural network microprocessor and macroinstruction processing method |
CN108416431A (en) * | 2018-01-19 | 2018-08-17 | 上海兆芯集成电路有限公司 | Neural network microprocessor and macro instruction processing method |
CN109189474B (en) * | 2018-02-05 | 2023-08-29 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing vector addition instruction |
CN109062612B (en) * | 2018-02-05 | 2023-06-27 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing plane rotation instruction |
CN109189474A (en) * | 2018-02-05 | 2019-01-11 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing vector adduction instruction |
CN109165041A (en) * | 2018-02-05 | 2019-01-08 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing vector norm instruction |
CN109117186A (en) * | 2018-02-05 | 2019-01-01 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing Outer Product of Vectors instruction |
CN109032669A (en) * | 2018-02-05 | 2018-12-18 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing the instruction of vector minimum value |
CN109086076A (en) * | 2018-02-05 | 2018-12-25 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing dot product instruction |
CN109062612A (en) * | 2018-02-05 | 2018-12-21 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing Plane Rotation instruction |
CN109165041B (en) * | 2018-02-05 | 2023-06-30 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing vector norm instruction |
CN109101273A (en) * | 2018-02-05 | 2018-12-28 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing vector maximization instruction |
US11836497B2 (en) | 2018-02-05 | 2023-12-05 | Shanghai Cambricon Information Technology Co., Ltd | Operation module and method thereof |
CN109086076B (en) * | 2018-02-05 | 2023-08-25 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing vector dot product instruction |
CN109101273B (en) * | 2018-02-05 | 2023-08-25 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing vector maximum value instruction |
CN109032669B (en) * | 2018-02-05 | 2023-08-29 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing vector minimum value instruction |
WO2019165940A1 (en) * | 2018-02-27 | 2019-09-06 | 上海寒武纪信息科技有限公司 | Integrated circuit chip apparatus, board card and related product |
CN110276447A (en) * | 2018-03-14 | 2019-09-24 | 上海寒武纪信息科技有限公司 | A kind of computing device and method |
CN108520296B (en) * | 2018-03-20 | 2020-05-15 | 福州瑞芯微电子股份有限公司 | Deep learning chip-based dynamic cache allocation method and device |
CN108520296A (en) * | 2018-03-20 | 2018-09-11 | 福州瑞芯微电子股份有限公司 | A kind of method and apparatus based on the distribution of deep learning chip dynamic cache |
CN108764470A (en) * | 2018-05-18 | 2018-11-06 | 中国科学院计算技术研究所 | A kind of processing method of artificial neural network operation |
CN110647973A (en) * | 2018-06-27 | 2020-01-03 | 北京中科寒武纪科技有限公司 | Operation method and related method and product |
CN110147222A (en) * | 2018-09-18 | 2019-08-20 | 北京中科寒武纪科技有限公司 | Arithmetic unit and method |
CN111078286A (en) * | 2018-10-19 | 2020-04-28 | 上海寒武纪信息科技有限公司 | Data communication method, computing system and storage medium |
CN111079911A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111078286B (en) * | 2018-10-19 | 2023-09-01 | 上海寒武纪信息科技有限公司 | Data communication method, computing system and storage medium |
CN111079911B (en) * | 2018-10-19 | 2021-02-09 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN109615059A (en) * | 2018-11-06 | 2019-04-12 | 海南大学 | Method and system for edge padding and filter dilation operations in convolutional neural networks |
CN111258634A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Data selection device, data processing method, chip and electronic equipment |
CN111260045B (en) * | 2018-11-30 | 2022-12-02 | 上海寒武纪信息科技有限公司 | Decoder and atomic instruction analysis method |
CN111260045A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Decoder and atomic instruction analysis method |
CN111353591A (en) * | 2018-12-20 | 2020-06-30 | 中科寒武纪科技股份有限公司 | Computing device and related product |
CN111860798A (en) * | 2019-04-27 | 2020-10-30 | 中科寒武纪科技股份有限公司 | Operation method, device and related product |
CN110673786B (en) * | 2019-09-03 | 2020-11-10 | 浪潮电子信息产业股份有限公司 | Data caching method and device |
CN110673786A (en) * | 2019-09-03 | 2020-01-10 | 浪潮电子信息产业股份有限公司 | Data caching method and device |
CN110941584A (en) * | 2019-11-19 | 2020-03-31 | 中科寒武纪科技股份有限公司 | Operation engine and data operation method |
CN111027690B (en) * | 2019-11-26 | 2023-08-04 | 陈子祺 | Combined processing device, chip and method for performing deterministic inference |
CN111027690A (en) * | 2019-11-26 | 2020-04-17 | 陈子祺 | Combined processing device, chip and method for executing deterministic inference |
CN111126583A (en) * | 2019-12-23 | 2020-05-08 | 中国电子科技集团公司第五十八研究所 | Universal neural network accelerator |
CN111898752A (en) * | 2020-08-03 | 2020-11-06 | 乐鑫信息科技(上海)股份有限公司 | Apparatus and method for performing LSTM neural network operations |
CN112348179A (en) * | 2020-11-26 | 2021-02-09 | 湃方科技(天津)有限责任公司 | Efficient convolutional neural network operation instruction set architecture, device and server |
CN112348179B (en) * | 2020-11-26 | 2023-04-07 | 湃方科技(天津)有限责任公司 | Efficient convolutional neural network operation instruction set architecture construction method and device, and server |
WO2023123453A1 (en) * | 2021-12-31 | 2023-07-06 | 华为技术有限公司 | Operation acceleration processing method, operation accelerator use method, and operation accelerator |
CN115826910A (en) * | 2023-02-07 | 2023-03-21 | 成都申威科技有限责任公司 | Vector fixed-point ALU processing system |
CN117992396A (en) * | 2024-03-29 | 2024-05-07 | 深存科技(无锡)有限公司 | Stream tensor processor |
CN117992396B (en) * | 2024-03-29 | 2024-05-28 | 深存科技(无锡)有限公司 | Stream tensor processor |
Also Published As
Publication number | Publication date |
---|---|
WO2017185418A1 (en) | 2017-11-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107329936A (en) | Apparatus and method for performing neural network operations and matrix/vector operations | |
CN107688854A (en) | Arithmetic unit, method and device supporting operands of different bit widths | |
CN106991077A (en) | Matrix computation device | |
Fang et al. | swDNN: A library for accelerating deep learning applications on Sunway TaihuLight |
CN108197705A (en) | Convolutional neural network hardware accelerator, convolution calculation method, and storage medium | |
CN109062606A (en) | Machine learning processor and method for executing vector scaling instructions using the processor | |
CN106990940A (en) | Vector computation device | |
CN106991478A (en) | Apparatus and method for performing artificial neural network reverse training | |
CN107578098A (en) | Neural network processor based on systolic arrays | |
CN106970896A (en) | Vectorization implementation method of two-dimensional matrix convolution for vector processors | |
CN106991476A (en) | Apparatus and method for performing artificial neural network forward operation | |
CN107563497A (en) | Computing device and method | |
CN108009126A (en) | Computation method and related product | |
CN107305538A (en) | Matrix operation device and method | |
CN103955446B (en) | DSP-chip-based FFT computing method with variable length | |
CN109542830A (en) | Data processing system and data processing method | |
CN109359730A (en) | Neural network processor for fixed output paradigm Winograd convolution | |
CN113010213B (en) | Simplified instruction set storage and calculation integrated neural network coprocessor based on resistance change memristor | |
CN108446534A (en) | Select the method, apparatus and computer readable storage medium of neural network hyper parameter | |
CN107193761A (en) | The method and apparatus of queue priority arbitration | |
CN107305486A (en) | Computing device for maxout layers of a neural network | |
CN107315567A (en) | Apparatus and method for performing vector maximum/minimum operations | |
EP3933703B1 (en) | Dynamic loading neural network inference at dram/on-bus sram/serial flash for power optimization | |
CN108037908A (en) | Computation method and related product | |
CN107688466A (en) | Arithmetic device and operation method thereof | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: Room 644, Comprehensive Research Building, No. 6 South Road, Haidian District Academy of Sciences, Beijing 100190
Applicant after: Zhongke Cambrian Technology Co., Ltd.
Address before: Room 644, Scientific Research Complex, No. 6 South Road, Academy of Sciences, Haidian District, Beijing 100190
Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.