CN107329936A - Apparatus and method for performing neural network operations and matrix/vector operations - Google Patents
Apparatus and method for performing neural network operations and matrix/vector operations
- Publication number
- CN107329936A CN107329936A CN201610281291.5A CN201610281291A CN107329936A CN 107329936 A CN107329936 A CN 107329936A CN 201610281291 A CN201610281291 A CN 201610281291A CN 107329936 A CN107329936 A CN 107329936A
- Authority
- CN
- China
- Prior art keywords
- instruction
- matrix
- vector
- computing
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
- G06F9/30134—Register stacks; shift registers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
An apparatus and method for performing neural network operations and matrix/vector operations. The apparatus includes a storage unit, a register unit, a control unit, an arithmetic unit, and a scratchpad memory. The neuron/matrix/vector data involved in a computation are staged in the scratchpad memory, so that data of different widths can be supported more flexibly and efficiently during computation, improving the execution performance of computing tasks. The customized neural network operation and matrix/vector operation modules of the present invention implement a variety of neural network and matrix/vector operations more efficiently, further improving the execution performance of computing tasks; in addition, the instructions used by the present invention take the form of very long instruction words.
Description
Technical field
The present invention relates to the technical field of neural network operations, and more particularly to an apparatus and method for performing neural network operations and matrix/vector operations.
Background art
Artificial neural networks (ANNs), or neural networks (NNs) for short, are algorithmic mathematical models that imitate the behavioral characteristics of animal neural networks and perform distributed, parallel information processing. Depending on the complexity of the system, such a network achieves the purpose of processing information by adjusting the interconnections among a large number of internal nodes. At present, neural networks have seen tremendous development in many fields such as intelligent control and machine learning. Because a neural network is an algorithmic mathematical model involving a large number of mathematical operations, how to perform neural network operations quickly and accurately is a problem that urgently needs to be solved.
Summary of the invention
In view of this, an object of the present invention is to provide an apparatus and method for performing neural network operations and matrix/vector operations, so as to realize efficient neural network and matrix/vector computation.
To achieve this object, as one aspect of the present invention, the invention provides an apparatus for performing neural network operations and matrix/vector operations, comprising a storage unit, a register unit, a control unit, an arithmetic unit, and a scratchpad memory, wherein:
the storage unit stores neurons/matrices/vectors;
the register unit stores neuron addresses/matrix addresses/vector addresses, wherein a neuron address is the address at which a neuron is stored in the storage unit, a matrix address is the address at which a matrix is stored in the storage unit, and a vector address is the address at which a vector is stored in the storage unit;
the control unit performs decoding operations and controls each unit module according to the instructions it reads;
the arithmetic unit obtains neuron addresses/matrix addresses/vector addresses from the register unit according to an instruction, obtains the corresponding neurons/matrices/vectors from the storage unit according to those addresses, and performs computation on the neurons/matrices/vectors thus obtained and/or on data carried in the instruction, to produce an operation result;
characterized in that the neuron/matrix/vector data involved in the arithmetic unit's computation are staged in the scratchpad memory, from which the arithmetic unit reads them when needed.
The scratchpad memory can support neuron/matrix/vector data of different sizes.
The register unit is a scalar register file, which provides the scalar registers needed during computation.
The arithmetic unit includes a vector multiplication component, an accumulation component, and a scalar multiplication component; and
the arithmetic unit is responsible for the device's neural network/matrix/vector operations, including: convolutional neural network forward operations, convolutional neural network training operations, neural network pooling operations, fully connected neural network forward operations, fully connected neural network training operations, batch normalization operations, RBM neural network operations, matrix-vector multiplication, matrix-matrix addition/subtraction, vector outer product, vector inner product, elementwise vector arithmetic, vector logic operations, vector transcendental functions, vector comparison, vector maximum/minimum, vector circular shift, and generation of random vectors obeying a given distribution.
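As a rough picture of how the three components named above (vector multiplication, accumulation, scalar multiplication) could compose one of the listed operations, the following Python sketch builds a matrix-vector multiplication from them. The function names and the software framing are illustrative assumptions, not the patented hardware:

```python
# Hypothetical software model of the three arithmetic components named in
# the claim: vector multiplication, accumulation, and scalar multiplication.

def vector_multiply(a, b):
    """Elementwise product of two equal-length vectors."""
    assert len(a) == len(b)
    return [x * y for x, y in zip(a, b)]

def accumulate(v):
    """Sum-reduce a vector to a scalar."""
    total = 0
    for x in v:
        total += x
    return total

def scalar_multiply(s, v):
    """Scale a vector by a scalar."""
    return [s * x for x in v]

def matrix_vector_multiply(m, v, scale=1.0):
    """Matrix-vector product built from the components above: each output
    element is accumulate(vector_multiply(row, v)), optionally scaled."""
    return scalar_multiply(scale, [accumulate(vector_multiply(row, v)) for row in m])

# Example: a 2x3 matrix times a length-3 vector.
result = matrix_vector_multiply([[1, 2, 3], [4, 5, 6]], [1, 0, 2])  # -> [7.0, 16.0]
```

The same three primitives cover several other listed operations, e.g. a vector inner product is `accumulate(vector_multiply(a, b))`.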
The device further comprises an instruction cache unit for storing pending operation instructions; the instruction cache unit is preferably a reorder buffer. The device further comprises an instruction queue, which buffers decoded instructions in order and sends them to the dependency processing unit.
The device further comprises a dependency processing unit and a store queue. Before the arithmetic unit obtains an operation instruction, the dependency processing unit judges whether the operation instruction accesses the same neuron/matrix/vector storage address as the previous operation instruction; if so, the operation instruction is stored in the store queue, and after the previous operation instruction finishes, the operation instruction in the store queue is provided to the arithmetic unit; otherwise, the operation instruction is provided directly to the arithmetic unit. The store queue stores instructions that have data dependencies on earlier instructions, and submits an instruction after its dependency is eliminated.
The instruction set of the device uses a Load/Store architecture, so the arithmetic unit does not operate on data in memory; and
the instruction set of the device preferably uses a very long instruction word (VLIW) architecture, and preferably uses fixed-length instructions.
An operation instruction executed by the arithmetic unit includes at least one opcode and at least three operands; the opcode indicates the function of the operation instruction, and the arithmetic unit performs different operations by recognizing one or more opcodes; the operands indicate the data information of the operation instruction, where the data information is an immediate value or a register number.
Preferably, when the operation instruction is a neural network operation instruction, it includes at least one opcode and 16 operands;
preferably, when the operation instruction is a matrix-matrix operation instruction, it includes at least one opcode and at least four operands;
preferably, when the operation instruction is a vector operation instruction, it includes at least one opcode and at least three operands;
preferably, when the operation instruction is a matrix-vector operation instruction, it includes at least one opcode and at least six operands.
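The operand-count rules above can be summarized in a small checker. The sketch below is illustrative Python under assumed names (`check_instruction`, the kind labels); it is not the patent's actual instruction encoding:

```python
# Illustrative sketch of the operand-count rules stated above.
# The kind labels and the function name are assumptions for illustration.

MIN_OPERANDS = {
    "neural_network": 16,  # exactly 16 operands
    "matrix_matrix": 4,    # at least four
    "vector": 3,           # at least three
    "matrix_vector": 6,    # at least six
}

def check_instruction(kind, opcodes, operands):
    """Return True if the instruction has at least one opcode and the
    operand count required for its kind; each operand is either an
    immediate value or a register number."""
    if len(opcodes) < 1:
        return False
    needed = MIN_OPERANDS[kind]
    if kind == "neural_network":
        return len(operands) == needed
    return len(operands) >= needed
```

For example, a vector instruction with one opcode and three register-number operands passes the check, while a matrix-vector instruction with only two operands does not.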
As another aspect of the present invention, the invention also provides an apparatus for performing neural network operations and matrix/vector operations, characterized by comprising:
a fetch module, which takes the next instruction to be executed out of the instruction sequence and passes it to the decode module;
a decode module, which decodes the instruction and passes the decoded instruction to the instruction queue;
an instruction queue, which buffers the instructions decoded by the decode module in order and sends them to the dependency processing unit;
a scalar register file, which provides scalar registers for computation;
a dependency processing unit, which judges whether there is a data dependency between the current instruction and the previous instruction, and if so, places the current instruction in the store queue;
a store queue, which caches a current instruction that has a data dependency on the previous instruction, and issues the current instruction once its dependency on the previous instruction has been eliminated;
a reorder buffer, in which an instruction is cached while it executes; after the instruction finishes, if it is the earliest uncommitted instruction in the reorder buffer, it is committed;
an arithmetic unit, which executes all neural network operations and matrix/vector operations;
a scratchpad memory, which stages the neuron/matrix/vector data involved in the arithmetic unit's computation, from which the arithmetic unit reads when needed; the scratchpad memory is preferably able to support data of different sizes;
an IO memory-access module, which directly accesses the scratchpad memory and is responsible for reading data from and writing data to it.
As yet another aspect of the invention, the invention also provides a method for executing neural network operation and matrix/vector instructions, characterized by comprising the following steps:
Step S1: the fetch module takes out a neural network operation and matrix/vector instruction and sends it to the decode module;
Step S2: the decode module decodes the instruction and sends it to the instruction queue;
Step S3: within the decode module, the instruction is sent to the instruction receiving module;
Step S4: the instruction receiving module sends the instruction to the micro-instruction decode module for micro-instruction decoding;
Step S5: the micro-instruction decode module obtains the instruction's neural network operation opcode and neural network operation operands from the scalar register file, decodes the instruction into micro-instructions that control each functional component, and sends them to the micro-instruction issue queue;
Step S6: after the needed data are obtained, the instruction is sent to the dependency processing unit, which analyzes whether the instruction has a data dependency on previously issued, not-yet-completed instructions; if so, the instruction must wait in the store queue until it no longer has any data dependency on previously issued, not-yet-completed instructions;
Step S7: the micro-instructions corresponding to the instruction are sent to the arithmetic unit;
Step S8: the arithmetic unit fetches the needed data from the scratchpad memory according to the addresses and sizes of the required data, and then completes the neural network operation and/or matrix/vector operation corresponding to the instruction.
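Steps S1 through S8 can be pictured as a minimal software pipeline. The following Python sketch is a hypothetical model: the instruction tuple, the dict-based scratchpad, and the `VADD` operation are invented for illustration and are not the patented hardware:

```python
# Hypothetical walk-through of steps S1-S8 for one instruction.
# The instruction format and the add operation are invented for
# illustration; the real device issues micro-instructions to hardware.

scratchpad = {0x00: [1.0, 2.0, 3.0], 0x10: [4.0, 5.0, 6.0]}  # addr -> data
in_flight_writes = set()   # addresses written by earlier, unfinished instructions

def execute(instr):
    # S1-S5: fetch, decode, and break the instruction into "micro-ops"
    # (here just the opcode and resolved operand addresses).
    opcode, src1, src2, dst = instr
    # S6: dependency check - an instruction whose source is still being
    # written by an earlier instruction would wait in the store queue
    # (here we simply return None instead of waiting).
    if src1 in in_flight_writes or src2 in in_flight_writes:
        return None
    # S7-S8: the arithmetic unit reads operands from the scratchpad by
    # address, computes, and writes the result back.
    a, b = scratchpad[src1], scratchpad[src2]
    if opcode == "VADD":
        scratchpad[dst] = [x + y for x, y in zip(a, b)]
    return scratchpad[dst]

result = execute(("VADD", 0x00, 0x10, 0x20))  # -> [5.0, 7.0, 9.0]
```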
Based on the above technical solutions, the neural network operation and matrix/vector operation apparatus and method of the present invention have the following beneficial effects: the data involved in computation are staged in a scratchpad memory, so that data of different widths can be supported more flexibly and efficiently during neural network and matrix/vector operations; meanwhile, the customized neural network operation and matrix/vector operation modules implement a variety of neural network and matrix/vector operations more efficiently, improving the execution performance of computing tasks. The instructions used by the present invention take the form of very long instruction words.
Brief description of the drawings
Fig. 1 is a schematic diagram of the structure of the apparatus for performing neural network operations and matrix/vector operations of the present invention;
Fig. 2 is a schematic diagram of the format of the instruction set of the present invention;
Fig. 3 is a schematic diagram of the format of the neural network operation instruction of the present invention;
Fig. 4 is a schematic diagram of the format of the matrix operation instruction of the present invention;
Fig. 5 is a schematic diagram of the format of the vector operation instruction of the present invention;
Fig. 6 is a schematic diagram of the format of the matrix-vector operation instruction of the present invention;
Fig. 7 is a schematic diagram of the structure of a neural network operation and matrix/vector operation apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the structure of the decode module in a neural network operation and matrix/vector operation apparatus according to an embodiment of the present invention;
Fig. 9 is a flowchart of a neural network operation and matrix/vector operation apparatus according to an embodiment of the present invention executing a neural network operation and matrix/vector instruction.
Detailed description of the embodiments
The invention discloses an apparatus for neural network operations and matrix/vector operations, comprising a storage unit, a register unit, a control unit, and an arithmetic unit. The storage unit stores neurons/matrices/vectors; the register unit stores the addresses and other parameters of the stored neurons/matrices/vectors; the control unit performs decoding operations and controls each module according to the instructions it reads; the arithmetic unit, according to a neural network operation or matrix/vector operation instruction, obtains the neuron/matrix/vector addresses and other parameters from the instruction or from the register unit, then obtains the corresponding neurons/matrices/vectors from the storage unit according to those addresses, and then performs computation on the obtained neurons/matrices/vectors to produce an operation result. By staging the neuron/matrix/vector data involved in computation in a scratchpad memory, the present invention can support data of different widths more flexibly and efficiently during computation, improving the execution performance of computing tasks.
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in more detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the structure of the neural network operation and matrix/vector operation apparatus of the present invention. As shown in Fig. 1, the apparatus includes:
a storage unit, for storing neurons/matrices/vectors. In one embodiment, the storage unit may be a scratchpad memory capable of supporting neuron/matrix/vector data of different sizes. The present invention stages the necessary computation data in the scratchpad memory, so that the apparatus can support data of different widths more flexibly and efficiently during neural network and matrix/vector computation.
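A scratchpad that supports data of different sizes can be pictured as one flat buffer addressed by (start address, length). The Python model below is an illustrative assumption (a simple bump allocator), not the hardware design:

```python
# Toy model of a scratchpad that holds variably sized neuron/matrix/vector
# data in one flat buffer. The allocator and its API are illustrative.

class Scratchpad:
    def __init__(self, capacity):
        self.buf = [0.0] * capacity
        self.next_free = 0

    def alloc(self, data):
        """Copy `data` into the scratchpad; return its start address."""
        addr = self.next_free
        if addr + len(data) > len(self.buf):
            raise MemoryError("scratchpad full")
        self.buf[addr:addr + len(data)] = data
        self.next_free += len(data)
        return addr

    def read(self, addr, size):
        """Read `size` elements starting at `addr` - an address plus a
        size is all the arithmetic unit needs to fetch its operands."""
        return self.buf[addr:addr + size]

sp = Scratchpad(64)
vec_addr = sp.alloc([1.0, 2.0, 3.0])        # a length-3 vector
mat_addr = sp.alloc([1.0, 0.0, 0.0, 1.0])   # a 2x2 matrix, row-major
```

Because data items are addressed by (start, length) rather than fixed-width slots, a length-3 vector and a 2x2 matrix coexist in the same buffer.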
a register unit, for storing neuron/matrix/vector addresses, wherein a neuron address is the address at which a neuron is stored in the storage unit, a matrix address is the address at which a matrix is stored in the storage unit, and a vector address is the address at which a vector is stored in the storage unit. In one embodiment, the register unit may be a scalar register file providing the scalar registers needed during computation; a scalar register holds not only neuron/matrix/vector addresses but also scalar data. When an operation involves both a matrix/vector and a scalar, the arithmetic unit obtains not only the matrix/vector address from the register unit but also the corresponding scalar.
a control unit, for controlling the behavior of each module in the apparatus. In one embodiment, the control unit reads a ready instruction, decodes it into several micro-instructions, and sends them to the other modules in the apparatus, which perform the corresponding operations according to the micro-instructions they receive.
an arithmetic unit, for obtaining the various neural network operation and matrix/vector operation instructions, obtaining neuron/matrix/vector addresses from the register unit according to an instruction, then obtaining the corresponding neurons/matrices/vectors from the storage unit according to those addresses, then performing computation on the obtained neurons/matrices/vectors to produce an operation result, and storing the result in the storage unit. The neural network operation and matrix/vector arithmetic unit includes a vector multiplication component, an accumulation component, and a scalar multiplication component. The arithmetic unit is responsible for the device's neural network/matrix/vector operations, including but not limited to: convolutional neural network forward operations, convolutional neural network training operations, neural network pooling operations, fully connected neural network forward operations, fully connected neural network training operations, batch normalization operations, RBM neural network operations, matrix-vector multiplication, matrix-matrix addition/subtraction, vector outer product (tensor) operations, vector inner product operations, elementwise vector arithmetic, vector logic operations, vector transcendental functions, vector comparison, vector maximum/minimum, vector circular shift, and generation of random vectors obeying a given distribution. Operation instructions are sent to the arithmetic unit for execution.
According to an embodiment of the present invention, the apparatus further includes an instruction cache unit for storing pending operation instructions. While an instruction executes, it is also buffered in the instruction cache unit; after the instruction finishes, if it is the earliest uncommitted instruction in the instruction cache unit, it is committed, and once committed, the changes the instruction makes to the device state cannot be undone. In one embodiment, the instruction cache unit may be a reorder buffer.
According to an embodiment of the present invention, the apparatus further includes an instruction queue. Considering that different instructions may have dependencies on the registers they use, the instruction queue stores the decoded neural network operation and matrix/vector operation instructions in order, and issues an instruction once its dependencies are satisfied.
According to an embodiment of the present invention, the apparatus further includes a dependency processing unit. Before the arithmetic unit obtains an instruction, the dependency processing unit judges whether the operation instruction accesses the same neuron/matrix/vector storage address as the previous operation instruction; if so, the operation instruction is stored in the store queue, and after the previous operation instruction finishes, the operation instruction in the store queue is provided to the arithmetic unit; otherwise, the operation instruction is provided directly to the arithmetic unit. Specifically, when operation instructions access the scratchpad memory, earlier and later instructions may access the same memory space; to ensure the correctness of execution results, if the current instruction is detected to have a dependency on the data of an earlier instruction, it must wait in the store queue until the dependency is eliminated.
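The address-overlap test performed by the dependency processing unit might be sketched as follows. This is hypothetical Python; the real unit compares address ranges in hardware, and the function names are invented:

```python
# Illustrative model of the dependency processing unit: an instruction
# whose scratchpad address range overlaps that of the previous instruction
# is diverted to a store queue until the previous instruction completes.

from collections import deque

def ranges_overlap(a_start, a_len, b_start, b_len):
    """True if the half-open ranges [a_start, a_start+a_len) and
    [b_start, b_start+b_len) share any address."""
    return a_start < b_start + b_len and b_start < a_start + a_len

store_queue = deque()

def dispatch(instr, prev):
    """instr/prev are (start_address, length) scratchpad accesses; return
    where the instruction goes: the arithmetic unit, or the store queue."""
    if prev is not None and ranges_overlap(instr[0], instr[1], prev[0], prev[1]):
        store_queue.append(instr)
        return "store_queue"
    return "arithmetic_unit"
```

For example, an access at addresses 0-15 conflicts with a previous access at 8-23 and is queued, but not with one at 16-31.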
According to an embodiment of the present invention, the apparatus further includes an input/output unit for storing neurons/matrices/vectors into the storage unit and for obtaining operation results from the storage unit. The input/output unit can directly access the storage unit and is responsible for reading data from and writing data to memory.
According to an embodiment of the present invention, the instruction set of the apparatus uses a Load/Store architecture: the arithmetic unit does not operate on data in memory. The instruction set uses a very long instruction word architecture; by configuring the instructions differently, it can carry out complex neural network operations as well as simple matrix/vector operations. In addition, the instruction set uses fixed-length instructions, so that the neural network operation and matrix/vector operation apparatus of the present invention can fetch the next instruction during the decode stage of the current instruction.
Fig. 2 is a schematic diagram of the format of an operation instruction of the present invention. As shown in Fig. 2, an operation instruction includes at least one opcode and at least three operands, wherein the opcode indicates the function of the operation instruction (the arithmetic unit performs different operations by recognizing one or more opcodes), and the operands indicate the data information of the operation instruction, where the data information may be an immediate value or a register number. For example, to obtain a matrix, the matrix's start address and length can be obtained from the corresponding register according to the register number, and the matrix stored at the corresponding address is then obtained from the storage unit according to the start address and length.
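The matrix-fetch example just described (register number, then start address and length, then memory read) can be sketched as follows; the register-file layout, register number, and addresses below are invented for illustration:

```python
# Sketch of resolving a register-number operand into a matrix:
# the register holds (start_address, length), and the storage unit
# is indexed by address.

register_file = {
    3: (0x100, 4),   # register r3: matrix starts at 0x100, 4 elements
}
storage_unit = {0x100 + i: float(v) for i, v in enumerate([1, 2, 3, 4])}

def fetch_matrix(register_number):
    start, length = register_file[register_number]           # read the register
    return [storage_unit[start + i] for i in range(length)]  # read the matrix

m = fetch_matrix(3)  # -> [1.0, 2.0, 3.0, 4.0]
```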
Fig. 3 is a schematic diagram of the format of the neural network operation instruction of the present invention. As shown in Fig. 3, a neural network operation instruction includes at least one opcode and 16 operands, wherein the opcode indicates the function of the neural network operation instruction, the arithmetic unit performs different neural network operations by recognizing one or more opcodes, and the operands indicate the data information of the neural network operation instruction, where the data information may be an immediate value or a register number.
Fig. 4 is a schematic diagram of the format of the matrix operation instruction of the present invention. As shown in Fig. 4, a matrix operation instruction includes at least one opcode and at least four operands, wherein the opcode indicates the function of the matrix operation instruction, the arithmetic unit performs different matrix operations by recognizing one or more opcodes, and the operands indicate the data information of the matrix operation instruction, where the data information may be an immediate value or a register number.
Fig. 5 is a schematic diagram of the format of the vector operation instruction of the present invention. As shown in Fig. 5, a vector operation instruction includes at least one opcode and at least three operands, wherein the opcode indicates the function of the vector operation instruction, the arithmetic unit performs different vector operations by recognizing one or more opcodes, and the operands indicate the data information of the vector operation instruction, where the data information may be an immediate value or a register number.
Fig. 6 is a schematic diagram of the format of the matrix-vector operation instruction of the present invention. As shown in Fig. 6, a matrix-vector operation instruction includes at least one opcode and at least six operands, wherein the opcode indicates the function of the matrix-vector operation instruction, the arithmetic unit performs different matrix-vector operations by recognizing one or more opcodes, and the operands indicate the data information of the matrix-vector operation instruction, where the data information may be an immediate value or a register number.
Fig. 7 is a schematic diagram of the structure of a neural network operation and matrix/vector operation apparatus according to an embodiment of the present invention. As shown in Fig. 7, the apparatus includes a fetch module, a decode module, an instruction queue, a scalar register file, a dependency processing unit, a store queue, a reorder buffer, an arithmetic unit, a scratchpad memory, and an IO memory-access module.
The fetch module is responsible for taking the next instruction to be executed out of the instruction sequence and passing it to the decode module.
The decode module is responsible for decoding instructions and passing the decoded instructions to the instruction queue. As shown in Fig. 8, the decode module includes an instruction receiving module, a micro-instruction decode module, a micro-instruction queue, and a micro-instruction issue module. The instruction receiving module receives the instructions obtained by the fetch module; the micro-instruction decode module decodes the instructions received by the instruction receiving module into micro-instructions that control each functional component; the micro-instruction queue holds the micro-instructions sent from the micro-instruction decode module; and the micro-instruction issue module issues the micro-instructions to each functional component.
The instruction queue buffers the decoded instructions in order and sends them to the dependency processing unit.
The scalar register file provides the scalar registers the apparatus needs during computation.
The dependency processing unit handles possible storage dependencies between an instruction and the previous instruction. A matrix operation instruction accesses the scratchpad memory, and earlier and later instructions may access the same memory space. To ensure the correctness of instruction execution results, if the current instruction is detected to have a dependency on the data of earlier instructions, it must wait in the store queue until the dependency is eliminated.
The store queue is an ordered queue: an instruction that has a data dependency on earlier instructions is held in this queue until the dependency is eliminated, after which the instruction is submitted.
The reorder buffer also caches each instruction while it executes. When an instruction has finished executing, if it is also the earliest uncommitted instruction in the reorder buffer, it is committed; once committed, the changes that this instruction makes to the device state can no longer be undone. The instructions in the reorder buffer act as placeholders: when the first instruction it holds has a data dependency, that instruction is not committed (released). Although further instructions keep arriving, only a limited number can be accepted (bounded by the size of the reorder buffer); the whole computation proceeds smoothly only once the first instruction is committed.
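The in-order commit behavior described above can be sketched as a toy model; the class and method names are invented for illustration and do not come from the patent:

```python
from collections import deque

class ReorderBuffer:
    """Toy reorder buffer: instructions enter in program order, may
    finish out of order, but commit strictly from the head."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()   # each entry: [name, finished?]

    def issue(self, name):
        if len(self.entries) >= self.capacity:
            return False         # buffer full: the front end must stall
        self.entries.append([name, False])
        return True

    def mark_done(self, name):
        for entry in self.entries:
            if entry[0] == name:
                entry[1] = True

    def commit(self):
        """Commit (and return) the run of finished instructions at the
        head; anything behind an unfinished head keeps waiting."""
        committed = []
        while self.entries and self.entries[0][1]:
            committed.append(self.entries.popleft()[0])
        return committed
```

Note how instructions that finish early still occupy their slot until everything ahead of them has committed, which is exactly the placeholder role the text describes.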
The arithmetic unit is responsible for all neural network operations and matrix/vector operations of the device, including but not limited to: convolutional neural network forward operations, convolutional neural network training operations, neural network pooling operations, fully connected neural network forward operations, fully connected neural network training operations, batch normalization operations, RBM neural network operations, matrix-vector multiplication, matrix-matrix addition/subtraction, vector outer product (tensor) operations, vector inner product operations, vector arithmetic operations, vector logic operations, vector transcendental function operations, vector comparison operations, vector maximum/minimum operations, vector circular shift operations, and generation of random vectors obeying a given distribution. Operation instructions are sent to the arithmetic unit for execution;
The scratchpad memory is the device's dedicated temporary data store and can support data of different sizes;
The IO memory-access module directly accesses the scratchpad memory and is responsible for reading data from it and writing data to it.
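The scratchpad's role — a flat, explicitly addressed on-chip buffer rather than a hardware-managed cache, so data of any size can be staged by (address, size) — can be illustrated with a small model (the class and method names are illustrative assumptions):

```python
class Scratchpad:
    """Toy software-managed on-chip buffer: every access names an
    explicit address and length, so variable-size neuron/matrix/vector
    data can be staged for the arithmetic unit."""
    def __init__(self, size_bytes):
        self.mem = bytearray(size_bytes)

    def write(self, addr, data):
        assert addr + len(data) <= len(self.mem), "out of scratchpad range"
        self.mem[addr:addr + len(data)] = data

    def read(self, addr, size):
        assert addr + size <= len(self.mem), "out of scratchpad range"
        return bytes(self.mem[addr:addr + size])
```

The IO memory-access module would play the role of `write` (loading operands in) and `read` (moving results out), while the arithmetic unit reads operands at the addresses carried by the instruction.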
Fig. 9 is a flowchart of the arithmetic device of one embodiment of the present invention executing a neural network operation and matrix/vector operation instruction. As shown in Fig. 9, the process of executing a neural network operation and matrix/vector instruction includes:
S1, the instruction-fetch module fetches the neural network operation and matrix/vector instruction and sends it to the decoding module.
S2, the decoding module decodes the instruction and sends it to the instruction queue.
S3, within the decoding module, the instruction is sent to the instruction receiving module.
S4, the instruction receiving module sends the instruction to the micro-instruction decoding module for micro-instruction decoding.
S5, the micro-instruction decoding module obtains the instruction's neural network operation opcode and neural network operation operands from the scalar register file, and at the same time decodes the instruction into the micro-instructions that control the individual functional units and sends them to the micro-instruction transmitting queue.
S6, after the required data are obtained, the instruction is sent to the dependency processing unit. The dependency processing unit analyzes whether the instruction has a data dependency on any preceding instruction that has not yet finished executing. If so, the instruction must wait in the store queue until it no longer has a data dependency on any unfinished preceding instruction.
S7, once no dependency remains, the micro-instructions corresponding to this neural network operation and matrix/vector instruction are sent to the functional units such as the arithmetic unit.
S8, the arithmetic unit fetches the required data from the scratchpad memory according to the address and size of the needed data, and then completes the neural network operation and matrix/vector operation in the arithmetic unit.
S9, after the operation completes, the output data are written back to the specified address in the scratchpad memory, and at the same time the instruction in the reorder buffer is committed.
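Steps S1–S9 above form a linear pipeline for each instruction. The sketch below strings the stages together for a single instruction; every name (the stage callables, the instruction's dictionary keys) is an illustrative assumption standing in for the patent's hardware modules:

```python
def execute_instruction(instr, decoder, dep_unit, alu, scratchpad, rob):
    """Toy walk through S1-S9 for one already-fetched instruction."""
    micro_ops = decoder(instr)                    # S2-S5: decode to micro-ops
    dep_unit.wait_until_clear(instr)              # S6: resolve dependencies
    addr, size = instr["src_addr"], instr["src_size"]
    data = scratchpad.read(addr, size)            # S8: fetch operands
    result = alu(micro_ops, data)                 # S8: compute
    scratchpad.write(instr["dst_addr"], result)   # S9: write back
    rob.commit(instr)                             # S9: commit in order
    return result
```

A driver would call this once per instruction leaving the instruction queue; the dependency wait is what serializes instructions that touch overlapping scratchpad regions.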
In summary, the present invention discloses a device and method for neural network operations and matrix/vector operations which, together with the corresponding instructions, can effectively handle the large volume of neural network algorithms and matrix/vector computation in the current computing field. Compared with existing conventional solutions, the present invention has the advantages of being easy to use, supporting flexibly configurable neural network and matrix/vector scales, and providing ample on-chip caching.
The specific embodiments described above further illustrate the objects, technical solutions, and beneficial effects of the present invention. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (10)
1. A device for performing neural network operations and matrix/vector operations, comprising a storage unit, a register unit, a control unit, an arithmetic unit, and a scratchpad memory, wherein:
the storage unit is configured to store neurons/matrices/vectors;
the register unit is configured to store neuron addresses/matrix addresses/vector addresses, wherein a neuron address is the address at which a neuron is stored in the storage unit, a matrix address is the address at which a matrix is stored in the storage unit, and a vector address is the address at which a vector is stored in the storage unit;
the control unit is configured to perform the decoding operation and to control each module according to the read instruction;
the arithmetic unit is configured to obtain the neuron address/matrix address/vector address from the register unit according to an instruction, to obtain the corresponding neuron/matrix/vector from the storage unit according to that address, and to perform an operation on the neuron/matrix/vector thus obtained and/or the data carried in the instruction, obtaining an operation result;
characterized in that the neuron/matrix/vector data participating in the computation of the arithmetic unit are temporarily stored in the scratchpad memory, from which the arithmetic unit reads them when needed.
2. The device for performing neural network operations and matrix/vector operations of claim 1, characterized in that the scratchpad memory can support neuron/matrix/vector data of different sizes.
3. The device for performing neural network operations and matrix/vector operations of claim 1, characterized in that the register unit is a scalar register file, which provides the scalar registers required during computation.
4. The device for performing neural network operations and matrix/vector operations of claim 1, characterized in that the arithmetic unit includes a vector multiplication component, an accumulation component, and a scalar multiplication component; and
the arithmetic unit is responsible for the device's neural network/matrix/vector operations, including convolutional neural network forward operations, convolutional neural network training operations, neural network pooling operations, fully connected neural network forward operations, fully connected neural network training operations, batch normalization operations, RBM neural network operations, matrix-vector multiplication, matrix-matrix addition/subtraction, vector outer product operations, vector inner product operations, vector arithmetic operations, vector logic operations, vector transcendental function operations, vector comparison operations, vector maximum/minimum operations, vector circular shift operations, and generation of random vectors obeying a given distribution.
5. The device for performing neural network operations and matrix/vector operations of claim 1, characterized in that the device further comprises an instruction cache unit for storing operation instructions awaiting execution, the instruction cache unit preferably being a reorder buffer; and
the device further comprises an instruction queue for sequentially buffering decoded instructions and sending them to the dependency processing unit.
6. The device for performing neural network operations and matrix/vector operations of claim 5, characterized in that the device further comprises a dependency processing unit and a store queue; the dependency processing unit is configured to judge, before the arithmetic unit obtains an operation instruction, whether the operation instruction accesses the same neuron/matrix/vector storage address as the preceding operation instruction; if so, the operation instruction is stored in the store queue; otherwise, the operation instruction is supplied directly to the arithmetic unit, and after the preceding operation instruction finishes executing, the operation instruction in the store queue is supplied to the arithmetic unit; the store queue is configured to store instructions that have a data dependency on preceding instructions and to commit each such instruction after its dependency is resolved.
7. The device for performing neural network operations and matrix/vector operations of claim 1, characterized in that the instruction set of the device uses a load/store structure, and the arithmetic unit does not operate on data in memory; and
the instruction set of the device preferably uses a VLIW structure and preferably uses fixed-length instructions.
8. The device for performing neural network operations and matrix/vector operations of claim 1, characterized in that the operation instruction executed by the arithmetic unit includes at least one opcode and at least 3 operands; the opcode indicates the function of the operation instruction, and the arithmetic unit performs different operations by recognizing one or more opcodes; the operands indicate the data information of the operation instruction, the data information being an immediate value or a register number.
Preferably, when the operation instruction is a neural network operation instruction, the neural network operation instruction includes at least one opcode and 16 operands;
preferably, when the operation instruction is a matrix-matrix operation instruction, the matrix-matrix operation instruction includes at least one opcode and at least 4 operands;
preferably, when the operation instruction is a vector operation instruction, the vector operation instruction includes at least one opcode and at least 3 operands;
preferably, when the operation instruction is a matrix-vector operation instruction, the matrix-vector operation instruction includes at least one opcode and at least 6 operands.
9. A device for performing neural network operations and matrix/vector operations, characterized by comprising:
an instruction-fetch module for fetching the next instruction to be executed from the instruction sequence and transmitting it to the decoding module;
a decoding module for decoding the instruction and transmitting the decoded instruction to the instruction queue;
an instruction queue for sequentially buffering the instructions decoded by the decoding module and sending them to the dependency processing unit;
a scalar register file for providing scalar registers for the computation;
a dependency processing unit for judging whether the current instruction has a data dependency on the preceding instruction and, if so, storing the current instruction in the store queue;
a store queue for caching a current instruction that has a data dependency on the preceding instruction, and issuing the current instruction after its dependency on the preceding instruction is resolved;
a reorder buffer for caching each instruction while it executes, judging after an instruction has finished executing whether it is the earliest uncommitted instruction in the reorder buffer and, if so, committing the instruction;
an arithmetic unit for performing all neural network operations and matrix/vector operations;
a scratchpad memory for temporarily storing the neuron/matrix/vector data participating in the computation of the arithmetic unit, from which the arithmetic unit reads when needed, the scratchpad memory preferably being able to support data of different sizes; and
an IO memory-access module for directly accessing the scratchpad memory and being responsible for reading data from it and writing data to it.
10. A method for executing a neural network operation and matrix/vector instruction, characterized by comprising the following steps:
step S1, an instruction-fetch module fetches a neural network operation and matrix/vector instruction and sends the instruction to a decoding module;
step S2, the decoding module decodes the instruction and sends the instruction to an instruction queue;
step S3, within the decoding module, the instruction is sent to an instruction receiving module;
step S4, the instruction receiving module sends the instruction to a micro-instruction decoding module for micro-instruction decoding;
step S5, the micro-instruction decoding module obtains the instruction's neural network operation opcode and neural network operation operands from a scalar register file, and at the same time decodes the instruction into micro-instructions that control the individual functional units and sends them to a micro-instruction transmitting queue;
step S6, after the required data are obtained, the instruction is sent to a dependency processing unit; the dependency processing unit analyzes whether the instruction has a data dependency on any preceding instruction that has not finished executing; if so, the instruction waits in a store queue until it no longer has a data dependency on any unfinished preceding instruction;
step S7, the micro-instructions corresponding to the instruction are sent to an arithmetic unit;
step S8, the arithmetic unit fetches the required data from a scratchpad memory according to the address and size of the needed data, and then completes the neural network operation and/or matrix/vector operation corresponding to the instruction in the arithmetic unit.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610281291.5A CN107329936A (en) | 2016-04-29 | 2016-04-29 | A kind of apparatus and method for performing neural network computing and matrix/vector computing |
PCT/CN2016/082015 WO2017185418A1 (en) | 2016-04-29 | 2016-05-13 | Device and method for performing neural network computation and matrix/vector computation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610281291.5A CN107329936A (en) | 2016-04-29 | 2016-04-29 | A kind of apparatus and method for performing neural network computing and matrix/vector computing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107329936A true CN107329936A (en) | 2017-11-07 |
Family
ID=60161583
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610281291.5A Pending CN107329936A (en) | 2016-04-29 | 2016-04-29 | A kind of apparatus and method for performing neural network computing and matrix/vector computing |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107329936A (en) |
WO (1) | WO2017185418A1 (en) |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108108189A (en) * | 2017-12-15 | 2018-06-01 | 北京中科寒武纪科技有限公司 | A kind of computational methods and Related product |
CN108197705A (en) * | 2017-12-29 | 2018-06-22 | 国民技术股份有限公司 | Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium |
CN108416431A (en) * | 2018-01-19 | 2018-08-17 | 上海兆芯集成电路有限公司 | Neural network microprocessor and macro instruction processing method |
CN108520296A (en) * | 2018-03-20 | 2018-09-11 | 福州瑞芯微电子股份有限公司 | A kind of method and apparatus based on the distribution of deep learning chip dynamic cache |
CN108701015A (en) * | 2017-11-30 | 2018-10-23 | 深圳市大疆创新科技有限公司 | For the arithmetic unit of neural network, chip, equipment and correlation technique |
CN108764470A (en) * | 2018-05-18 | 2018-11-06 | 中国科学院计算技术研究所 | A kind of processing method of artificial neural network operation |
CN109032669A (en) * | 2018-02-05 | 2018-12-18 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing the instruction of vector minimum value |
CN109615059A (en) * | 2018-11-06 | 2019-04-12 | 海南大学 | Edge filling and filter dilation operation method and system in a kind of convolutional neural networks |
CN109871952A (en) * | 2017-12-01 | 2019-06-11 | 阿比特电子科技有限公司 | Electronic device, accelerator, the accelerated method of neural network and acceleration system |
CN109919308A (en) * | 2017-12-13 | 2019-06-21 | 腾讯科技(深圳)有限公司 | A kind of neural network model dispositions method, prediction technique and relevant device |
CN110147222A (en) * | 2018-09-18 | 2019-08-20 | 北京中科寒武纪科技有限公司 | Arithmetic unit and method |
WO2019165940A1 (en) * | 2018-02-27 | 2019-09-06 | 上海寒武纪信息科技有限公司 | Integrated circuit chip apparatus, board card and related product |
CN110276447A (en) * | 2018-03-14 | 2019-09-24 | 上海寒武纪信息科技有限公司 | A kind of computing device and method |
CN110647973A (en) * | 2018-06-27 | 2020-01-03 | 北京中科寒武纪科技有限公司 | Operation method and related method and product |
CN110673786A (en) * | 2019-09-03 | 2020-01-10 | 浪潮电子信息产业股份有限公司 | Data caching method and device |
CN110941584A (en) * | 2019-11-19 | 2020-03-31 | 中科寒武纪科技股份有限公司 | Operation engine and data operation method |
CN111027690A (en) * | 2019-11-26 | 2020-04-17 | 陈子祺 | Combined processing device, chip and method for executing deterministic inference |
CN111079911A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111078286A (en) * | 2018-10-19 | 2020-04-28 | 上海寒武纪信息科技有限公司 | Data communication method, computing system and storage medium |
CN111126583A (en) * | 2019-12-23 | 2020-05-08 | 中国电子科技集团公司第五十八研究所 | Universal neural network accelerator |
CN111258634A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Data selection device, data processing method, chip and electronic equipment |
CN111260045A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Decoder and atomic instruction analysis method |
CN111353591A (en) * | 2018-12-20 | 2020-06-30 | 中科寒武纪科技股份有限公司 | Computing device and related product |
CN111860798A (en) * | 2019-04-27 | 2020-10-30 | 中科寒武纪科技股份有限公司 | Operation method, device and related product |
CN111898752A (en) * | 2020-08-03 | 2020-11-06 | 乐鑫信息科技(上海)股份有限公司 | Apparatus and method for performing LSTM neural network operations |
CN112348179A (en) * | 2020-11-26 | 2021-02-09 | 湃方科技(天津)有限责任公司 | Efficient convolutional neural network operation instruction set architecture, device and server |
CN115826910A (en) * | 2023-02-07 | 2023-03-21 | 成都申威科技有限责任公司 | Vector fixed point ALU processing system |
WO2023123453A1 (en) * | 2021-12-31 | 2023-07-06 | 华为技术有限公司 | Operation acceleration processing method, operation accelerator use method, and operation accelerator |
CN117992396A (en) * | 2024-03-29 | 2024-05-07 | 深存科技(无锡)有限公司 | Stream tensor processor |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109754061B (en) * | 2017-11-07 | 2023-11-24 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related product |
CN109754062B (en) * | 2017-11-07 | 2024-05-14 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related product |
CN108037908B (en) * | 2017-12-15 | 2021-02-09 | 中科寒武纪科技股份有限公司 | Calculation method and related product |
CN108416422B (en) * | 2017-12-29 | 2024-03-01 | 国民技术股份有限公司 | FPGA-based convolutional neural network implementation method and device |
JP6846534B2 (en) * | 2018-02-13 | 2021-03-24 | Shanghai Cambricon Information Technology Co., Ltd. | Arithmetic logic unit and calculation method |
US12073215B2 (en) | 2018-02-13 | 2024-08-27 | Shanghai Cambricon Information Technology Co., Ltd | Computing device with a conversion unit to convert data values between various sizes of fixed-point and floating-point data |
CN111767997B (en) * | 2018-02-27 | 2023-08-29 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related products |
CN111767998B (en) * | 2018-02-27 | 2024-05-14 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related products |
CN110472734B (en) * | 2018-05-11 | 2024-03-29 | 上海寒武纪信息科技有限公司 | Computing device and related product |
CN110503179B (en) * | 2018-05-18 | 2024-03-01 | 上海寒武纪信息科技有限公司 | Calculation method and related product |
CN108959180B (en) * | 2018-06-15 | 2022-04-22 | 北京探境科技有限公司 | Data processing method and system |
CN111275197B (en) * | 2018-12-05 | 2023-11-10 | 上海寒武纪信息科技有限公司 | Operation method, device, computer equipment and storage medium |
CN111222632B (en) * | 2018-11-27 | 2023-06-30 | 中科寒武纪科技股份有限公司 | Computing device, computing method and related product |
CN111047024B (en) * | 2018-10-12 | 2023-05-23 | 上海寒武纪信息科技有限公司 | Computing device and related product |
EP4009185A1 (en) | 2018-10-18 | 2022-06-08 | Shanghai Cambricon Information Technology Co., Ltd | Network-on-chip data processing method and device |
CN111079908B (en) * | 2018-10-18 | 2024-02-13 | 上海寒武纪信息科技有限公司 | Network-on-chip data processing method, storage medium, computer device and apparatus |
CN109542513B (en) * | 2018-11-21 | 2023-04-21 | 山东浪潮科学研究院有限公司 | Convolutional neural network instruction data storage system and method |
CN111857828B (en) * | 2019-04-25 | 2023-03-14 | 安徽寒武纪信息科技有限公司 | Processor operation method and device and related product |
WO2020220935A1 (en) | 2019-04-27 | 2020-11-05 | 中科寒武纪科技股份有限公司 | Operation apparatus |
US11841822B2 (en) | 2019-04-27 | 2023-12-12 | Cambricon Technologies Corporation Limited | Fractal calculating device and method, integrated circuit and board card |
CN110780921B (en) * | 2019-08-30 | 2023-09-26 | 腾讯科技(深圳)有限公司 | Data processing method and device, storage medium and electronic device |
CN111047036B (en) * | 2019-12-09 | 2023-11-14 | Oppo广东移动通信有限公司 | Neural network processor, chip and electronic equipment |
CN111325321B (en) * | 2020-02-13 | 2023-08-29 | 中国科学院自动化研究所 | Brain-like computing system based on multi-neural network fusion and execution method of instruction set |
CN113361679B (en) * | 2020-03-05 | 2023-10-17 | 华邦电子股份有限公司 | Memory device and method of operating the same |
CN116841614B (en) * | 2023-05-29 | 2024-03-15 | 进迭时空(杭州)科技有限公司 | Sequential vector scheduling method under disordered access mechanism |
CN117807082B (en) * | 2023-12-20 | 2024-09-27 | 中科驭数(北京)科技有限公司 | Hash processing method, device, equipment and computer readable storage medium |
CN118467136A (en) * | 2024-05-30 | 2024-08-09 | 上海交通大学 | Calculation and storage method and system suitable for large language model sparse reasoning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SU980097A1 (en) * | 1981-06-05 | 1982-12-07 | Предприятие П/Я М-5769 | Device for control of scratchpad buffer storage of multiprocessor electronic computer |
CN1105138A (en) * | 1992-10-30 | 1995-07-12 | International Business Machines Corporation | Register architecture for a super scalar computer |
CN1133452A (en) * | 1994-10-13 | 1996-10-16 | Beijing Duosi Technology Industrial Park Co., Ltd. | Macroinstruction set symmetrical parallel system structure microprocessor |
CN101504599A (en) * | 2009-03-16 | 2009-08-12 | Xidian University | Special instruction set micro-processing system suitable for digital signal processing application |
CN101739235A (en) * | 2008-11-26 | 2010-06-16 | Institute of Microelectronics of Chinese Academy of Sciences | Processor device for seamless mixing 32-bit DSP and general RISC CPU |
CN103348318A (en) * | 2011-02-07 | 2013-10-09 | Arm Limited | Controlling the execution of adjacent instructions that are dependent upon a same data condition |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6961719B1 (en) * | 2002-01-07 | 2005-11-01 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Hybrid neural network and support vector machine method for optimization |
SI21200A (en) * | 2002-03-27 | 2003-10-31 | Jože Balič | The CNC control unit for controlling processing centres with learning ability |
CN1331092C (en) * | 2004-05-17 | 2007-08-08 | Institute of Semiconductors, Chinese Academy of Sciences | Special purpose neural net computer system for pattern recognition and application method |
-
2016
- 2016-04-29 CN CN201610281291.5A patent/CN107329936A/en active Pending
- 2016-05-13 WO PCT/CN2016/082015 patent/WO2017185418A1/en active Application Filing
Non-Patent Citations (3)
Title |
---|
CHRISTOPHER H. CHOU et al.: "VEGAS: Soft Vector Processor with Scratchpad Memory", FPGA '11 Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays * |
LEISHANGWEN: "Write Your Own CPU, Stage 5 (1): Pipeline Hazards", HTTPS://BLOG.CSDN.NET/LEISHANGWEN/ARTICLE/DETAILS/38298787 * |
Zhang Zhiyuan, Zhang Yaohui: "Introduction to Information", 30 September 2013, Xidian University Press * |
Cited By (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019104695A1 (en) * | 2017-11-30 | 2019-06-06 | 深圳市大疆创新科技有限公司 | Arithmetic device for neural network, chip, equipment and related method |
CN108701015A (en) * | 2017-11-30 | 2018-10-23 | 深圳市大疆创新科技有限公司 | For the arithmetic unit of neural network, chip, equipment and correlation technique |
CN109871952A (en) * | 2017-12-01 | 2019-06-11 | 阿比特电子科技有限公司 | Electronic device, accelerator, the accelerated method of neural network and acceleration system |
CN109919308A (en) * | 2017-12-13 | 2019-06-21 | 腾讯科技(深圳)有限公司 | A kind of neural network model dispositions method, prediction technique and relevant device |
US12020142B2 (en) | 2017-12-13 | 2024-06-25 | Tencent Technology (Shenzhen) Company Limited | Neural network model deployment method, prediction method and related device |
CN109919308B (en) * | 2017-12-13 | 2022-11-11 | 腾讯科技(深圳)有限公司 | Neural network model deployment method, prediction method and related equipment |
CN112230994A (en) * | 2017-12-15 | 2021-01-15 | 安徽寒武纪信息科技有限公司 | Calculation method and related product |
CN108108189A (en) * | 2017-12-15 | 2018-06-01 | 北京中科寒武纪科技有限公司 | A kind of computational methods and Related product |
US11263007B2 (en) | 2017-12-29 | 2022-03-01 | Nationz Technologies Inc. | Convolutional neural network hardware acceleration device, convolutional calculation method, and storage medium |
WO2019127731A1 (en) * | 2017-12-29 | 2019-07-04 | 国民技术股份有限公司 | Convolutional neural network hardware acceleration device, convolutional calculation method and storage medium |
CN108197705A (en) * | 2017-12-29 | 2018-06-22 | 国民技术股份有限公司 | Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium |
CN108416431B (en) * | 2018-01-19 | 2021-06-01 | 上海兆芯集成电路有限公司 | Neural network microprocessor and macroinstruction processing method |
CN108416431A (en) * | 2018-01-19 | 2018-08-17 | 上海兆芯集成电路有限公司 | Neural network microprocessor and macro instruction processing method |
CN109189474B (en) * | 2018-02-05 | 2023-08-29 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing vector addition instruction |
CN109062612B (en) * | 2018-02-05 | 2023-06-27 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing plane rotation instruction |
CN109189474A (en) * | 2018-02-05 | 2019-01-11 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing vector adduction instruction |
CN109165041A (en) * | 2018-02-05 | 2019-01-08 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing vector norm instruction |
CN109117186A (en) * | 2018-02-05 | 2019-01-01 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing Outer Product of Vectors instruction |
CN109032669A (en) * | 2018-02-05 | 2018-12-18 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing the instruction of vector minimum value |
CN109086076A (en) * | 2018-02-05 | 2018-12-25 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing dot product instruction |
CN109062612A (en) * | 2018-02-05 | 2018-12-21 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing Plane Rotation instruction |
CN109165041B (en) * | 2018-02-05 | 2023-06-30 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing vector norm instruction |
CN109101273A (en) * | 2018-02-05 | 2018-12-28 | 上海寒武纪信息科技有限公司 | Processing with Neural Network device and its method for executing vector maximization instruction |
US11836497B2 (en) | 2018-02-05 | 2023-12-05 | Shanghai Cambricon Information Technology Co., Ltd | Operation module and method thereof |
CN109086076B (en) * | 2018-02-05 | 2023-08-25 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing vector dot product instruction |
CN109101273B (en) * | 2018-02-05 | 2023-08-25 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing vector maximum value instruction |
CN109032669B (en) * | 2018-02-05 | 2023-08-29 | 上海寒武纪信息科技有限公司 | Neural network processing device and method for executing vector minimum value instruction |
WO2019165940A1 (en) * | 2018-02-27 | 2019-09-06 | 上海寒武纪信息科技有限公司 | Integrated circuit chip apparatus, board card and related product |
CN110276447A (en) * | 2018-03-14 | 2019-09-24 | 上海寒武纪信息科技有限公司 | A kind of computing device and method |
CN108520296B (en) * | 2018-03-20 | 2020-05-15 | 福州瑞芯微电子股份有限公司 | Deep learning chip-based dynamic cache allocation method and device |
CN108520296A (en) * | 2018-03-20 | 2018-09-11 | 福州瑞芯微电子股份有限公司 | A kind of method and apparatus based on the distribution of deep learning chip dynamic cache |
CN108764470A (en) * | 2018-05-18 | 2018-11-06 | 中国科学院计算技术研究所 | A kind of processing method of artificial neural network operation |
CN110647973A (en) * | 2018-06-27 | 2020-01-03 | 北京中科寒武纪科技有限公司 | Operation method and related method and product |
CN110147222A (en) * | 2018-09-18 | 2019-08-20 | 北京中科寒武纪科技有限公司 | Arithmetic unit and method |
CN111078286A (en) * | 2018-10-19 | 2020-04-28 | 上海寒武纪信息科技有限公司 | Data communication method, computing system and storage medium |
CN111079911A (en) * | 2018-10-19 | 2020-04-28 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN111078286B (en) * | 2018-10-19 | 2023-09-01 | 上海寒武纪信息科技有限公司 | Data communication method, computing system and storage medium |
CN111079911B (en) * | 2018-10-19 | 2021-02-09 | 中科寒武纪科技股份有限公司 | Operation method, system and related product |
CN109615059A (en) * | 2018-11-06 | 2019-04-12 | 海南大学 | Method and system for edge padding and filter dilation operations in convolutional neural networks |
CN111258634A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Data selection device, data processing method, chip and electronic equipment |
CN111260045B (en) * | 2018-11-30 | 2022-12-02 | 上海寒武纪信息科技有限公司 | Decoder and atomic instruction analysis method |
CN111260045A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Decoder and atomic instruction analysis method |
CN111353591A (en) * | 2018-12-20 | 2020-06-30 | 中科寒武纪科技股份有限公司 | Computing device and related product |
CN111860798A (en) * | 2019-04-27 | 2020-10-30 | 中科寒武纪科技股份有限公司 | Operation method, device and related product |
CN110673786B (en) * | 2019-09-03 | 2020-11-10 | 浪潮电子信息产业股份有限公司 | Data caching method and device |
CN110673786A (en) * | 2019-09-03 | 2020-01-10 | 浪潮电子信息产业股份有限公司 | Data caching method and device |
CN110941584A (en) * | 2019-11-19 | 2020-03-31 | 中科寒武纪科技股份有限公司 | Operation engine and data operation method |
CN111027690B (en) * | 2019-11-26 | 2023-08-04 | 陈子祺 | Combined processing device, chip and method for performing deterministic inference |
CN111027690A (en) * | 2019-11-26 | 2020-04-17 | 陈子祺 | Combined processing device, chip and method for executing deterministic inference |
CN111126583A (en) * | 2019-12-23 | 2020-05-08 | 中国电子科技集团公司第五十八研究所 | Universal neural network accelerator |
CN111898752A (en) * | 2020-08-03 | 2020-11-06 | 乐鑫信息科技(上海)股份有限公司 | Apparatus and method for performing LSTM neural network operations |
CN112348179A (en) * | 2020-11-26 | 2021-02-09 | 湃方科技(天津)有限责任公司 | Efficient convolutional neural network operation instruction set architecture, device and server |
CN112348179B (en) * | 2020-11-26 | 2023-04-07 | 湃方科技(天津)有限责任公司 | Efficient convolutional neural network operation instruction set architecture construction method and device, and server |
WO2023123453A1 (en) * | 2021-12-31 | 2023-07-06 | 华为技术有限公司 | Operation acceleration processing method, operation accelerator use method, and operation accelerator |
CN115826910A (en) * | 2023-02-07 | 2023-03-21 | 成都申威科技有限责任公司 | Vector fixed-point ALU processing system |
CN117992396A (en) * | 2024-03-29 | 2024-05-07 | 深存科技(无锡)有限公司 | Stream tensor processor |
CN117992396B (en) * | 2024-03-29 | 2024-05-28 | 深存科技(无锡)有限公司 | Stream tensor processor |
Also Published As
Publication number | Publication date |
---|---|
WO2017185418A1 (en) | 2017-11-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107329936A (en) | Apparatus and method for performing neural network operations and matrix/vector operations | |
CN107688854A (en) | Arithmetic unit, method and device supporting operands of different bit widths | |
CN106991077A (en) | Matrix computation device | |
Fang et al. | swDNN: A library for accelerating deep learning applications on Sunway TaihuLight |
CN108197705A (en) | Convolutional neural network hardware accelerator, convolution calculation method, and storage medium | |
CN109062606A (en) | Machine learning processor and method for executing vector scaling instructions using the processor | |
CN106990940A (en) | Vector computation device | |
CN106991478A (en) | Apparatus and method for performing artificial neural network reverse training | |
CN107578098A (en) | Neural network processor based on systolic arrays | |
CN106970896A (en) | Vectorization implementation method of two-dimensional matrix convolution for vector processors | |
CN106991476A (en) | Apparatus and method for performing artificial neural network forward operation | |
CN107563497A (en) | Computing device and method | |
CN108009126A (en) | Computation method and related product | |
CN107305538A (en) | Matrix operation device and method | |
CN103955446B (en) | DSP-chip-based FFT computing method with variable length | |
CN109542830A (en) | Data processing system and data processing method | |
CN109359730A (en) | Neural network processor for fixed output paradigm Winograd convolution | |
CN113010213B (en) | Simplified instruction set storage and calculation integrated neural network coprocessor based on resistance change memristor | |
CN108446534A (en) | Select the method, apparatus and computer readable storage medium of neural network hyper parameter | |
CN107193761A (en) | The method and apparatus of queue priority arbitration | |
CN107305486A (en) | Computing device for maxout layers of a neural network | |
CN107315567A (en) | Apparatus and method for performing vector maximum/minimum operations | |
EP3933703B1 (en) | Dynamic loading neural network inference at dram/on-bus sram/serial flash for power optimization | |
CN108037908A (en) | Computation method and related product | |
CN107688466A (en) | Arithmetic device and operation method thereof | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: Room 644, Comprehensive Research Building, No. 6 South Road, Haidian District Academy of Sciences, Beijing 100190
Applicant after: Zhongke Cambrian Technology Co., Ltd.
Address before: Room 644, Scientific Research Complex, No. 6 South Road, Academy of Sciences, Haidian District, Beijing 100190
Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.