
CN107329936A - Apparatus and method for performing neural network operations and matrix/vector operations - Google Patents

Apparatus and method for performing neural network operations and matrix/vector operations Download PDF

Info

Publication number
CN107329936A
CN107329936A (application CN201610281291.5A)
Authority
CN
China
Prior art keywords
instruction
matrix
vector
computing
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610281291.5A
Other languages
Chinese (zh)
Inventor
陶劲桦
陈天石
陈云霁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Beijing Zhongke Cambrian Technology Co Ltd
Original Assignee
Beijing Zhongke Cambrian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Cambrian Technology Co Ltd filed Critical Beijing Zhongke Cambrian Technology Co Ltd
Priority to CN201610281291.5A priority Critical patent/CN107329936A/en
Priority to PCT/CN2016/082015 priority patent/WO2017185418A1/en
Publication of CN107329936A publication Critical patent/CN107329936A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001Arithmetic instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/3012Organisation of register space, e.g. banked or distributed register file
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/3012Organisation of register space, e.g. banked or distributed register file
    • G06F9/30134Register stacks; shift registers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145Instruction analysis, e.g. decoding, instruction word fields
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Biophysics (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)
  • Advance Control (AREA)

Abstract

An apparatus and method for performing neural network operations and matrix/vector operations. The apparatus includes a storage unit, a register unit, a control unit, an arithmetic unit, and a scratchpad memory. The neuron/matrix/vector data participating in a computation are temporarily stored in the scratchpad memory, so that data of different widths can be supported more flexibly and effectively during the computation, improving the execution performance of computing tasks. The customized neural network and matrix/vector operation modules of the present invention can implement a wide variety of neural network and matrix/vector operations more efficiently, improving the execution performance of computing tasks. In addition, the instructions used by the present invention take the form of very long instruction words (VLIW).

Description

Apparatus and method for performing neural network operations and matrix/vector operations
Technical field
The present invention relates to the technical field of neural network operations, and more specifically to an apparatus and method for performing neural network operations and matrix/vector operations.
Background
An artificial neural network (ANN), or neural network (NN) for short, is an algorithmic and mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Depending on the complexity of the system, such a network processes information by adjusting the interconnections among a large number of internal nodes. At present, neural networks have made great progress in many fields such as intelligent control and machine learning. Because a neural network is an algorithmic/mathematical model involving a large amount of mathematical computation, how to perform neural network operations quickly and accurately is a problem that urgently needs to be solved.
Summary of the invention
In view of this, an object of the present invention is to provide an apparatus and method for performing neural network operations and matrix/vector operations, so as to implement neural network and matrix/vector operations efficiently.
To achieve this object, as one aspect of the present invention, the invention provides an apparatus for performing neural network operations and matrix/vector operations, comprising a storage unit, a register unit, a control unit, an arithmetic unit, and a scratchpad memory, wherein:
the storage unit stores neurons/matrices/vectors;
the register unit stores neuron addresses/matrix addresses/vector addresses, where a neuron address is the address at which a neuron is stored in the storage unit, a matrix address is the address at which a matrix is stored in the storage unit, and a vector address is the address at which a vector is stored in the storage unit;
the control unit performs decoding and controls the other modules according to the instructions it reads;
the arithmetic unit obtains neuron/matrix/vector addresses from the register unit according to an instruction, obtains the corresponding neurons/matrices/vectors from the storage unit according to those addresses, and performs an operation on the obtained neurons/matrices/vectors and/or the data carried in the instruction to produce an operation result;
characterized in that the neuron/matrix/vector data participating in the computation of the arithmetic unit are temporarily stored in the scratchpad memory, from which the arithmetic unit reads them when needed.
Wherein the scratchpad memory can support neuron/matrix/vector data of different sizes.
Wherein the register unit is a scalar register file that provides the scalar registers needed during computation.
Wherein the arithmetic unit includes a vector multiplication component, an accumulation component, and a scalar multiplication component; and
the arithmetic unit is responsible for the neural network/matrix/vector operations of the apparatus, including: convolutional neural network forward operations, convolutional neural network training operations, neural network pooling operations, fully connected neural network forward operations, fully connected neural network training operations, batch normalization operations, RBM neural network operations, matrix-vector multiplication operations, matrix-matrix addition/subtraction operations, vector outer product operations, vector inner product operations, elementary vector arithmetic operations, vector logic operations, vector transcendental function operations, vector comparison operations, vector maximum/minimum operations, vector cyclic shift operations, and generation of random vectors obeying a given distribution.
Wherein the apparatus further includes an instruction cache unit for storing pending operation instructions; the instruction cache unit is preferably a reorder buffer; and
the apparatus further includes an instruction queue, which sequentially buffers decoded instructions and sends them to the dependency processing unit.
Wherein the apparatus further includes a dependency processing unit and a store queue. Before the arithmetic unit obtains an instruction, the dependency processing unit determines whether the operation instruction accesses the same neuron/matrix/vector storage address as the preceding operation instruction; if so, the operation instruction is stored in the store queue, and the operation instruction in the store queue is supplied to the arithmetic unit only after the preceding operation instruction has finished executing; otherwise, the operation instruction is supplied directly to the arithmetic unit. The store queue stores instructions that have a data dependency on earlier instructions, and submits an instruction once its dependency has been eliminated.
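The dependency check and store-queue behavior described above can be sketched in software. The following is a minimal simulation sketch, not the hardware design from the patent: it assumes each instruction declares the scratchpad address ranges it touches as (start, length) pairs, and that any overlap with an unfinished earlier instruction forces the instruction into the store queue.

```python
from collections import deque


def ranges_overlap(a, b):
    """True if two (start, length) address ranges overlap."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


class DependencyUnit:
    """Toy dependency processing unit with a store queue (illustrative only)."""

    def __init__(self):
        self.in_flight = []        # ranges touched by unfinished instructions
        self.store_queue = deque() # instructions waiting on a dependency

    def issue(self, instr):
        """Return 'direct' or 'queued' depending on address conflicts."""
        conflict = any(
            ranges_overlap(r, prev)
            for r in instr["ranges"]
            for prev in self.in_flight
        )
        self.in_flight.extend(instr["ranges"])
        if conflict:
            self.store_queue.append(instr)
            return "queued"
        return "direct"


unit = DependencyUnit()
a = {"op": "MV_MUL", "ranges": [(0, 256)]}
b = {"op": "V_ADD", "ranges": [(128, 64)]}   # overlaps a -> must wait
c = {"op": "V_ADD", "ranges": [(1024, 64)]}  # no overlap -> issue directly
print(unit.issue(a), unit.issue(b), unit.issue(c))  # direct queued direct
```

The opcode names and the range-overlap criterion are assumptions introduced for illustration; the patent only specifies that instructions accessing the same storage address must wait until the dependency is eliminated.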
Wherein the instruction set of the apparatus uses a load/store architecture, and the arithmetic unit does not operate on data in main memory; and
the instruction set of the apparatus preferably uses a very long instruction word (VLIW) structure and preferably uses fixed-length instructions.
Wherein each operation instruction executed by the arithmetic unit includes at least one opcode and at least three operands; the opcode indicates the function of the operation instruction, and the arithmetic unit performs different operations by recognizing one or more opcodes; the operands indicate the data information of the operation instruction, where the data information is an immediate value or a register number.
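The opcode/operand structure above can be made concrete with a small sketch. The encoding below is invented for illustration (the patent does not fix field widths or register-file layout): each operand is either an immediate or a register number that is looked up in the scalar register file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    is_immediate: bool
    value: int  # the immediate value itself, or a register number


@dataclass(frozen=True)
class Instruction:
    opcode: int
    operands: tuple  # at least three Operand entries

    def resolve(self, regfile):
        """Turn each operand into a concrete value via the scalar register file."""
        return [op.value if op.is_immediate else regfile[op.value]
                for op in self.operands]


# reg 0 holds a matrix start address, reg 1 a length; operand 3 is an immediate
regs = {0: 0x1000, 1: 64, 2: 0x2000}
instr = Instruction(opcode=0x21,
                    operands=(Operand(False, 0), Operand(False, 1),
                              Operand(True, 7)))
print(instr.resolve(regs))  # [4096, 64, 7]
```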
Preferably, when the operation instruction is a neural network operation instruction, it includes at least one opcode and 16 operands;
preferably, when the operation instruction is a matrix-matrix operation instruction, it includes at least one opcode and at least four operands;
preferably, when the operation instruction is a vector operation instruction, it includes at least one opcode and at least three operands;
preferably, when the operation instruction is a matrix-vector operation instruction, it includes at least one opcode and at least six operands.
As another aspect of the present invention, the invention also provides an apparatus for performing neural network operations and matrix/vector operations, characterized by comprising:
a fetch module, which fetches the next instruction to be executed from the instruction sequence and passes it to the decode module;
a decode module, which decodes the instruction and passes the decoded instruction to the instruction queue;
an instruction queue, which sequentially buffers the instructions decoded by the decode module and sends them to the dependency processing unit;
a scalar register file, which provides scalar registers for the computation;
a dependency processing unit, which determines whether a data dependency exists between the current instruction and the preceding instruction, and if so stores the current instruction in the store queue;
a store queue, which buffers a current instruction that has a data dependency on the preceding instruction, and issues the current instruction once its dependency on the preceding instruction is eliminated;
a reorder buffer, in which an instruction is cached while it executes; after the instruction finishes, if it is the earliest among the uncommitted instructions in the reorder buffer, the instruction is committed;
an arithmetic unit, which performs all neural network and matrix/vector operations;
a scratchpad memory, which temporarily stores the neuron/matrix/vector data participating in the arithmetic unit's computation, from which the arithmetic unit reads when needed; the scratchpad memory is preferably able to support data of different sizes;
an IO memory-access module, which directly accesses the scratchpad memory and is responsible for reading data from and writing data to it.
As yet another aspect of the present invention, the invention also provides a method for executing a neural network operation and matrix/vector instruction, characterized by comprising the following steps:
Step S1: the fetch module fetches a neural network operation and matrix/vector instruction and sends it to the decode module.
Step S2: the decode module decodes the instruction and sends it to the instruction queue.
Step S3: within the decode module, the instruction is sent to the instruction receiving module.
Step S4: the instruction receiving module sends the instruction to the micro-instruction decode module for micro-instruction decoding.
Step S5: the micro-instruction decode module obtains the instruction's neural network operation opcode and operands from the scalar register file, and at the same time decodes the instruction into micro-instructions controlling each functional component, which it sends to the micro-instruction issue queue.
Step S6: after the required data are obtained, the instruction is sent to the dependency processing unit, which analyzes whether the instruction has a data dependency on any instruction that has not yet finished executing; if so, the instruction must wait in the store queue until it no longer has any data dependency on unfinished earlier instructions.
Step S7: the micro-instructions corresponding to the instruction are sent to the arithmetic unit.
Step S8: the arithmetic unit fetches the required data from the scratchpad memory according to the address and size of the required data, and then completes the neural network operation and/or matrix/vector operation corresponding to the instruction.
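Steps S1 through S8 can be walked through with a toy software simulation. The sketch below is an illustration under invented assumptions (the opcode names, the key-based scratchpad, and the three-operand format are not from the patent): instructions are fetched in order, their fields are decoded, operands are read from the scratchpad, and the result is written back.

```python
# Stand-in for the scratchpad memory: named regions instead of raw addresses.
scratchpad = {"vec_a": [1.0, 2.0, 3.0],
              "vec_b": [4.0, 5.0, 6.0]}

# A two-instruction program: vec_c = vec_a + vec_b; vec_d = vec_c * vec_a.
program = [("V_ADD", "vec_a", "vec_b", "vec_c"),
           ("V_MUL", "vec_c", "vec_a", "vec_d")]


def execute(instr):
    op, src1, src2, dst = instr                  # decode the fields (S4/S5)
    a, b = scratchpad[src1], scratchpad[src2]    # fetch operands (S8)
    scratchpad[dst] = [x + y if op == "V_ADD" else x * y
                       for x, y in zip(a, b)]    # element-wise operation
    return dst


for instr in program:   # fetch instructions in program order (S1)
    execute(instr)      # issue to the arithmetic unit and execute (S7/S8)

print(scratchpad["vec_d"])  # element-wise (a + b) * a -> [5.0, 14.0, 27.0]
```

A real implementation would interpose the dependency check of step S6 between fetch and execute; here the second instruction simply runs after the first has written `vec_c`, so the dependency is trivially satisfied.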
Based on the above technical solution, the neural network and matrix/vector operation apparatus and method of the present invention have the following advantages: the data participating in a computation are temporarily stored in a scratchpad memory, so that data of different widths can be supported more flexibly and effectively during neural network and matrix/vector operations; meanwhile, the customized neural network and matrix/vector operation modules can implement a wide variety of neural network and matrix/vector operations more efficiently, improving the execution performance of computing tasks. The instructions used by the present invention take the form of very long instruction words.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the apparatus for performing neural network operations and matrix/vector operations of the present invention;
Fig. 2 is a schematic diagram of the format of the instruction set of the present invention;
Fig. 3 is a schematic diagram of the format of the neural network operation instruction of the present invention;
Fig. 4 is a schematic diagram of the format of the matrix operation instruction of the present invention;
Fig. 5 is a schematic diagram of the format of the vector operation instruction of the present invention;
Fig. 6 is a schematic diagram of the format of the matrix-vector operation instruction of the present invention;
Fig. 7 is a schematic structural diagram of the neural network and matrix/vector operation apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of the decode module in the neural network and matrix/vector operation apparatus according to an embodiment of the present invention;
Fig. 9 is a flowchart of the execution of a neural network operation and matrix/vector instruction by the neural network and matrix/vector operation apparatus according to an embodiment of the present invention.
Detailed description
The invention discloses an apparatus for neural network operations and matrix/vector operations, comprising a storage unit, a register unit, a control unit, and an arithmetic unit. The storage unit stores neurons/matrices/vectors; the register unit stores the addresses and other parameters of the stored neurons/matrices/vectors; the control unit performs decoding and controls the other modules according to the instructions it reads; the arithmetic unit, according to a neural network or matrix/vector operation instruction, obtains the neuron/matrix/vector addresses and other parameters from the instruction or from the register unit, then obtains the corresponding neurons/matrices/vectors from the storage unit according to those addresses, and performs an operation on the obtained neurons/matrices/vectors to produce an operation result. The present invention temporarily stores the neuron/matrix/vector data participating in a computation in a scratchpad memory, so that data of different widths can be supported more flexibly and effectively during the computation, improving the execution performance of computing tasks.
In order to make the object, technical solutions, and advantages of the present invention clearer, the present invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
Fig. 1 is a schematic structural diagram of the neural network and matrix/vector operation apparatus of the present invention. As shown in Fig. 1, the apparatus includes:
a storage unit, which stores neurons/matrices/vectors; in one embodiment, the storage unit may be a scratchpad memory capable of supporting neuron/matrix/vector data of different sizes; the present invention temporarily stores the necessary computation data in a scratchpad memory, enabling the apparatus to support data of different widths more flexibly and effectively during neural network and matrix/vector computations;
a register unit, which stores neuron/matrix/vector addresses, where a neuron address is the address at which a neuron is stored in the storage unit, a matrix address is the address at which a matrix is stored in the storage unit, and a vector address is the address at which a vector is stored in the storage unit; in one embodiment, the register unit may be a scalar register file providing the scalar registers needed during computation; the scalar registers hold not only neuron/matrix/vector addresses but also scalar data; when an operation involves a matrix/vector and a scalar, the arithmetic unit obtains not only the matrix/vector address from the register unit but also the corresponding scalar;
a control unit, which controls the behavior of the modules in the apparatus; in one embodiment, the control unit reads a ready instruction, decodes it into multiple micro-instructions, and sends them to the other modules in the apparatus, which perform the corresponding operations according to the micro-instructions they receive;
an arithmetic unit, which obtains the various neural network and matrix/vector operation instructions, obtains the neuron/matrix/vector addresses from the register unit according to the instruction, then obtains the corresponding neurons/matrices/vectors from the storage unit according to those addresses, performs an operation on them to produce an operation result, and stores the operation result in the storage unit. The neural network and matrix/vector arithmetic unit includes a vector multiplication component, an accumulation component, and a scalar multiplication component, and is responsible for the apparatus's neural network/matrix/vector operations, including but not limited to: convolutional neural network forward operations, convolutional neural network training operations, neural network pooling operations, fully connected neural network forward operations, fully connected neural network training operations, batch normalization operations, RBM neural network operations, matrix-vector multiplication operations, matrix-matrix addition/subtraction operations, vector outer product (tensor) operations, vector inner product operations, elementary vector arithmetic operations, vector logic operations, vector transcendental function operations, vector comparison operations, vector maximum/minimum operations, vector cyclic shift operations, and generation of random vectors obeying a given distribution. Operation instructions are sent to the arithmetic unit for execution.
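The three arithmetic components named above (vector multiplication, accumulation, scalar multiplication) compose into common primitives such as the vector inner product. The sketch below is an assumption about how they combine, not the patent's datapath:

```python
def vector_multiply(a, b):
    """Vector multiplication component: element-wise product."""
    return [x * y for x, y in zip(a, b)]


def accumulate(v):
    """Accumulation component: reduce a vector to the sum of its elements."""
    total = 0
    for x in v:
        total += x
    return total


def scalar_multiply(s, v):
    """Scalar multiplication component: scale each element by a scalar."""
    return [s * x for x in v]


a, b = [1, 2, 3], [4, 5, 6]
dot = accumulate(vector_multiply(a, b))   # inner product: 4 + 10 + 18
print(dot)                                # 32
print(scalar_multiply(2, a))              # [2, 4, 6]
```

Chaining the element-wise multiplier into the accumulator is the standard way an inner product (and, by extension, matrix-vector multiplication) is built from exactly these components.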
According to an embodiment of the present invention, the apparatus further includes an instruction cache unit for storing pending operation instructions. While an instruction executes, it is simultaneously cached in the instruction cache unit; after an instruction finishes, if it is also the earliest among the uncommitted instructions in the instruction cache unit, the instruction is committed; once committed, the changes the instruction made to the apparatus state cannot be undone. In one embodiment, the instruction cache unit may be a reorder buffer.
According to an embodiment of the present invention, the apparatus further includes an instruction queue, which stores the decoded neural network and matrix/vector operation instructions in order. Considering that dependencies may exist among the registers used by different instructions, the queue buffers decoded instructions and issues each instruction once its dependencies are satisfied.
According to an embodiment of the present invention, the apparatus further includes a dependency processing unit, which, before the arithmetic unit obtains an instruction, determines whether the operation instruction accesses the same neuron/matrix/vector storage address as the preceding operation instruction; if so, the operation instruction is stored in the store queue and is supplied to the arithmetic unit only after the preceding operation instruction has finished executing; otherwise, the operation instruction is supplied directly to the arithmetic unit. Specifically, when operation instructions access the scratchpad memory, successive instructions may access the same memory region. To ensure the correctness of instruction execution results, if the current instruction is detected to have a dependency on the data of an earlier instruction, it must wait in the store queue until the dependency is eliminated.
According to an embodiment of the present invention, the apparatus further includes an input-output unit, which stores neurons/matrices/vectors into the storage unit or retrieves operation results from it. The input-output unit can access the storage unit directly and is responsible for reading data from and writing data to memory.
According to an embodiment of the present invention, the instruction set for the apparatus uses a load/store architecture, so the arithmetic unit does not operate on data in main memory. The instruction set uses a very long instruction word architecture: by configuring the instructions differently, complex neural network operations as well as simple matrix/vector operations can be performed. In addition, the instruction set uses fixed-length instructions, so that the neural network and matrix/vector operation apparatus of the present invention can fetch the next instruction during the decode stage of the previous instruction.
Fig. 2 is a schematic diagram of the format of the operation instruction of the present invention. As shown in Fig. 2, an operation instruction includes at least one opcode and at least three operands, where the opcode indicates the function of the operation instruction (the arithmetic unit performs different operations by recognizing one or more opcodes), and the operands indicate the data information of the operation instruction, where the data information can be an immediate value or a register number. For example, to obtain a matrix, the matrix start address and the matrix length can be obtained from the corresponding registers according to the register numbers, and the matrix stored at the corresponding address in the storage unit is then obtained according to the start address and length.
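The matrix-fetch example in the paragraph above can be sketched directly. All names here are invented for illustration: a flat list stands in for the storage unit, and a dict stands in for the scalar register file.

```python
def load_matrix(regfile, storage, addr_reg, len_reg):
    """Resolve two register-number operands into a matrix read.

    addr_reg holds the register number whose value is the matrix start
    address; len_reg holds the register number whose value is the length.
    """
    start = regfile[addr_reg]    # matrix start address from the register
    length = regfile[len_reg]    # matrix length (number of elements)
    return storage[start:start + length]


storage = list(range(100))       # stand-in for the storage unit
regfile = {3: 10, 4: 6}          # reg 3 -> start address 10, reg 4 -> length 6
print(load_matrix(regfile, storage, 3, 4))  # [10, 11, 12, 13, 14, 15]
```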
Fig. 3 is a schematic diagram of the format of the neural network operation instruction of the present invention. As shown in Fig. 3, a neural network operation instruction includes at least one opcode and 16 operands, where the opcode indicates the function of the neural network operation instruction (the arithmetic unit performs different neural network operations by recognizing one or more opcodes), and the operands indicate the data information of the instruction, where the data information can be an immediate value or a register number.
Fig. 4 is a schematic diagram of the format of the matrix operation instruction of the present invention. As shown in Fig. 4, a matrix operation instruction includes at least one opcode and at least four operands, where the opcode indicates the function of the matrix operation instruction (the arithmetic unit performs different matrix operations by recognizing one or more opcodes), and the operands indicate the data information of the instruction, where the data information can be an immediate value or a register number.
Fig. 5 is a schematic diagram of the format of the vector operation instruction of the present invention. As shown in Fig. 5, a vector operation instruction includes at least one opcode and at least three operands, where the opcode indicates the function of the vector operation instruction (the arithmetic unit performs different vector operations by recognizing one or more opcodes), and the operands indicate the data information of the instruction, where the data information can be an immediate value or a register number.
Fig. 6 is a schematic diagram of the format of the matrix-vector operation instruction of the present invention. As shown in Fig. 6, a matrix-vector operation instruction includes at least one opcode and at least six operands, where the opcode indicates the function of the matrix-vector operation instruction (the arithmetic unit performs different matrix-vector operations by recognizing one or more opcodes), and the operands indicate the data information of the instruction, where the data information can be an immediate value or a register number.
Fig. 7 is the neural network computing and matrix/vector computing as one embodiment of the present invention The structural representation of device, as shown in fig. 7, the device includes fetching module, decoding module, instruction Queue, scalar register heap, dependence processing unit, storage queue, reorder caching, computing Unit, scratch pad memory, IO memory access modules;
Fetching module, the module is responsible for taking out the next instruction that will be performed from command sequence, and The instruction is transmitted to decoding module;
Decoding module, the module is responsible for instructing into row decoding, and instruction after decoding is transmitted into instruction team Row;As shown in figure 8, the decoding module includes:Instruct receiving module, it is microcommand generation module, micro- Instruction queue, microcommand transmitter module;Wherein, instruction receiving module is responsible for receiving to take from fetching module The instruction obtained;Microcommand decoding module will instruct the Instruction decoding that receiving module is obtained into each work(of control The microcommand of energy part;Micro instruction queue is used to deposit the microcommand sent from microcommand decoding module; Microcommand transmitter module is responsible for microcommand being transmitted into each functional part;
The instruction queue buffers decoded instructions in order and sends them to the dependency processing unit.
The scalar register file provides the scalar registers required by the device during computation.
The dependency processing unit handles the storage dependencies that may exist between an instruction being processed and the preceding instruction. A matrix operation instruction accesses the scratchpad memory, and consecutive instructions may access the same memory region. To guarantee the correctness of execution results, if the current instruction is detected to have a dependency on the data of a preceding instruction, it must wait in the storage queue until the dependency is eliminated.
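The dependency check described above amounts to detecting whether the scratchpad region the current instruction accesses overlaps a region accessed by an unfinished earlier instruction. A minimal sketch, assuming half-open (start, length) address regions, which the disclosure does not specify:

```python
# Two scratchpad regions overlap if each starts before the other ends.
def regions_overlap(a_start, a_len, b_start, b_len):
    return a_start < b_start + b_len and b_start < a_start + a_len

def has_dependency(current_reads, pending_writes):
    """current_reads / pending_writes: lists of (start, length) regions.
    True means the current instruction must wait in the storage queue."""
    return any(
        regions_overlap(rs, rl, ws, wl)
        for rs, rl in current_reads
        for ws, wl in pending_writes
    )

pending = [(0x1000, 256)]                  # an earlier instruction writes here
waits = has_dependency([(0x1080, 64)], pending)    # overlapping read -> wait
issues = has_dependency([(0x2000, 64)], pending)   # disjoint read -> may issue
```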
The storage queue is an ordered queue in which instructions that have data dependencies on preceding instructions are held; an instruction is submitted only after its dependencies are eliminated.
In the reorder buffer, instructions are also cached while they execute. When an instruction has finished executing, if it is also the earliest uncommitted instruction in the reorder buffer, the instruction is committed. Once committed, the changes the instruction makes to the device state cannot be undone. Instructions in the reorder buffer act as placeholders: if the first instruction it contains has a data dependency, that instruction is not committed (released). Although many subsequent instructions keep arriving, only some of them can be accepted (limited by the reorder buffer size), and the whole computation can proceed smoothly only after the first instruction is committed.
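The in-order commit rule above — commit an instruction only when it has finished executing and is the oldest uncommitted entry — can be sketched as follows. The class and its capacity handling are illustrative, not the hardware structure of the disclosure.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self, capacity):
        self.capacity = capacity      # limits how many instructions can enter
        self.entries = deque()        # oldest entry at the left

    def issue(self, name):
        if len(self.entries) >= self.capacity:
            return False              # buffer full: later instructions stall
        self.entries.append({"name": name, "done": False})
        return True

    def mark_done(self, name):
        for entry in self.entries:
            if entry["name"] == name:
                entry["done"] = True

    def commit(self):
        """Commit as many of the oldest finished entries as possible, in order."""
        committed = []
        while self.entries and self.entries[0]["done"]:
            committed.append(self.entries.popleft()["name"])
        return committed

rob = ReorderBuffer(capacity=4)
for n in ["I1", "I2", "I3"]:
    rob.issue(n)
rob.mark_done("I2")            # I2 finished, but I1 is older and still pending
first_commit = rob.commit()    # nothing commits: oldest entry not done
rob.mark_done("I1")
second_commit = rob.commit()   # now I1 and I2 commit, in program order
```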
The arithmetic unit is responsible for all neural network operations and matrix/vector operations of the device, including but not limited to: convolutional neural network forward operation, convolutional neural network training, neural network pooling operation, fully connected neural network forward operation, fully connected neural network training, batch normalization operation, RBM neural network operation, matrix-vector multiplication, matrix-matrix addition/subtraction, vector outer product (tensor) operation, vector inner product operation, vector elementary-arithmetic operation, vector logical operation, vector transcendental function operation, vector comparison operation, vector maximum/minimum operation, vector cyclic shift operation, and generation of a random vector obeying a given distribution. Operation instructions are sent to the arithmetic unit for execution.
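As a reference for one operation in the list above, matrix-vector multiplication can be expressed with plain Python lists; the hardware realization in the disclosure is of course different, and this is only a functional sketch.

```python
def matrix_vector_multiply(matrix, vector):
    """Multiply a matrix (list of equal-length rows) by a vector."""
    assert all(len(row) == len(vector) for row in matrix)
    # Each output element is the inner product of one matrix row with the vector.
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

result = matrix_vector_multiply([[1, 2], [3, 4]], [5, 6])
```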
The scratchpad memory is a dedicated temporary storage device for data and is able to support data of different sizes.
The IO memory-access module is used to directly access the scratchpad memory and is responsible for reading data from or writing data to the scratchpad memory.
Fig. 9 is a flowchart of the arithmetic device executing a neural network operation and matrix/vector operation instruction according to an embodiment of the present invention. As shown in Fig. 9, the process of executing a neural network operation and matrix/vector instruction includes:
S1, the instruction fetch module fetches the neural network operation and matrix/vector instruction and sends the instruction to the decoding module.
S2, the decoding module decodes the instruction and sends the instruction to the instruction queue.
S3, within the decoding module, the instruction is sent to the instruction receiving module.
S4, the instruction receiving module sends the instruction to the micro-instruction decoding module for micro-instruction decoding.
S5, the micro-instruction decoding module obtains the neural network operation opcode and neural network operation operands of the instruction from the scalar register file, and at the same time decodes the instruction into the micro-instructions controlling each functional unit, which are sent to the micro-instruction emission queue.
S6, after the required data are obtained, the instruction is sent to the dependency processing unit. The dependency processing unit analyzes whether the instruction has a data dependency on preceding instructions that have not finished executing. If so, the instruction must wait in the storage queue until it no longer has any data dependency on the unfinished preceding instructions.
S7, once no dependency exists, the micro-instructions corresponding to this neural network operation and matrix/vector instruction are sent to functional units such as the arithmetic unit.
S8, the arithmetic unit fetches the needed data from the scratchpad memory according to the address and size of the required data, and then completes the neural network operation and matrix/vector operation in the arithmetic unit.
S9, after the operation completes, the output data are written back to the specified address of the scratchpad memory, and at the same time the instruction in the reorder buffer is committed.
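The steps above, condensed, amount to: decode the instruction fields, wait out any data dependency, read the operands from the scratchpad, compute, and write the result back. A toy end-to-end sketch, in which the opcode "VEC_SCALE2", the operand layout, and the list-based scratchpad are all hypothetical stand-ins for the hardware units:

```python
def execute_instruction(inst, scratchpad, pending_writes):
    # S1-S5: fetch and decode (represented here by unpacking the fields)
    opcode, src_addr, length, dst_addr = inst
    # S6: dependency check against unfinished earlier writes; popping an
    # entry stands in for waiting until that earlier write retires
    while any(src_addr < ws + wl and ws < src_addr + length
              for ws, wl in pending_writes):
        pending_writes.pop(0)
    # S7-S8: the arithmetic unit reads the operands from the scratchpad
    data = scratchpad[src_addr:src_addr + length]
    if opcode == "VEC_SCALE2":       # hypothetical opcode: double each element
        result = [2 * x for x in data]
    else:
        raise NotImplementedError(opcode)
    # S9: write the result back to the specified scratchpad address
    scratchpad[dst_addr:dst_addr + length] = result
    return result

spm = list(range(16))                # toy scratchpad memory
out = execute_instruction(("VEC_SCALE2", 0, 4, 8), spm, pending_writes=[(0, 2)])
```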
In summary, the present invention discloses a device and method for neural network operations and matrix/vector operations. Together with the corresponding instructions, it can well solve the problem of neural network algorithms and large numbers of matrix/vector operations in the current computer field. Compared with existing conventional solutions, the present invention has the advantages of scalable neural network and matrix/vector sizes, configurable instructions, ease of use, and sufficient on-chip caching.
The specific embodiments described above further explain the purpose, technical solution and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A device for performing neural network operations and matrix/vector operations, comprising a storage unit, a register unit, a control unit, an arithmetic unit and a scratchpad memory, wherein:
the storage unit is configured to store neurons/matrices/vectors;
the register unit is configured to store neuron addresses/matrix addresses/vector addresses, wherein a neuron address is the address at which a neuron is stored in the storage unit, a matrix address is the address at which a matrix is stored in the storage unit, and a vector address is the address at which a vector is stored in the storage unit;
the control unit is configured to perform decoding operations and control each unit module according to the read instructions;
the arithmetic unit is configured to obtain neuron addresses/matrix addresses/vector addresses from the register unit according to instructions, obtain the corresponding neurons/matrices/vectors from the storage unit according to the neuron addresses/matrix addresses/vector addresses, and perform operations on the neurons/matrices/vectors thus obtained and/or the data carried in the instructions to obtain operation results;
characterized in that the neuron/matrix/vector data participating in the computation of the arithmetic unit are temporarily stored in the scratchpad memory, from which the arithmetic unit reads the data when needed.
2. The device for performing neural network operations and matrix/vector operations according to claim 1, characterized in that the scratchpad memory is able to support neuron/matrix/vector data of different sizes.
3. The device for performing neural network operations and matrix/vector operations according to claim 1, characterized in that the register unit is a scalar register file, providing the scalar registers required during computation.
4. The device for performing neural network operations and matrix/vector operations according to claim 1, characterized in that the arithmetic unit includes a vector multiplication component, an accumulation component and a scalar multiplication component; and
the arithmetic unit is responsible for the neural network/matrix/vector operations of the device, including convolutional neural network forward operation, convolutional neural network training, neural network pooling operation, fully connected neural network forward operation, fully connected neural network training, batch normalization operation, RBM neural network operation, matrix-vector multiplication, matrix-matrix addition/subtraction, vector outer product operation, vector inner product operation, vector elementary-arithmetic operation, vector logical operation, vector transcendental function operation, vector comparison operation, vector maximum/minimum operation, vector cyclic shift operation, and generation of a random vector obeying a given distribution.
5. The device for performing neural network operations and matrix/vector operations according to claim 1, characterized in that the device further includes an instruction cache unit for storing pending operation instructions; the instruction cache unit is preferably a reorder buffer; and
the device further includes an instruction queue for buffering decoded instructions in order and sending them to the dependency processing unit.
6. The device for performing neural network operations and matrix/vector operations according to claim 5, characterized in that the device further includes a dependency processing unit and a storage queue; the dependency processing unit is configured to judge, before the arithmetic unit obtains an instruction, whether the operation instruction accesses the same neuron/matrix/vector storage address as the previous operation instruction; if so, the operation instruction is stored in the storage queue; otherwise, the operation instruction is directly supplied to the arithmetic unit, and after the previous operation instruction finishes executing, the operation instruction in the storage queue is supplied to the arithmetic unit; the storage queue is configured to store instructions that have data dependencies on preceding instructions, and to submit the instructions after the dependencies are eliminated.
7. The device for performing neural network operations and matrix/vector operations according to claim 1, characterized in that the instruction set of the device adopts a Load/Store architecture, and the arithmetic unit does not operate on data in memory; and
the instruction set of the device preferably adopts a VLIW (very long instruction word) architecture, and preferably adopts fixed-length instructions.
8. The device for performing neural network operations and matrix/vector operations according to claim 1, characterized in that an operation instruction executed by the arithmetic unit includes at least one opcode and at least 3 operands; wherein the opcode is used to indicate the function of the operation instruction, and the arithmetic unit performs different operations by recognizing one or more opcodes; the operands are used to indicate the data information of the operation instruction, wherein the data information is an immediate value or a register number.
Preferably, when the operation instruction is a neural network operation instruction, the neural network operation instruction includes at least one opcode and 16 operands;
preferably, when the operation instruction is a matrix-matrix operation instruction, the matrix-matrix operation instruction includes at least one opcode and at least 4 operands;
preferably, when the operation instruction is a vector operation instruction, the vector operation instruction includes at least one opcode and at least 3 operands;
preferably, when the operation instruction is a matrix-vector operation instruction, the matrix-vector operation instruction includes at least one opcode and at least 6 operands.
9. A device for performing neural network operations and matrix/vector operations, characterized by comprising:
an instruction fetch module, configured to fetch the next instruction to be executed from the instruction sequence and pass the instruction to the decoding module;
a decoding module, configured to decode the instruction and pass the decoded instruction to the instruction queue;
an instruction queue, configured to buffer in order the instructions decoded by the decoding module and send them to the dependency processing unit;
a scalar register file, configured to provide scalar registers for computation;
a dependency processing unit, configured to judge whether a data dependency exists between the current instruction and the previous instruction, and if so, store the current instruction in the storage queue;
a storage queue, configured to cache a current instruction that has a data dependency on the previous instruction, and to issue the current instruction after the dependency between it and the previous instruction is eliminated;
a reorder buffer, configured to cache instructions while they execute and, after an instruction has finished executing, judge whether the instruction is the earliest uncommitted instruction in the reorder buffer; if so, the instruction is committed;
an arithmetic unit, configured to perform all neural network operations and matrix/vector operations;
a scratchpad memory, configured to temporarily store the neuron/matrix/vector data participating in the computation of the arithmetic unit, from which the arithmetic unit reads the data when needed; the scratchpad memory is preferably able to support data of different sizes;
an IO memory-access module, configured to directly access the scratchpad memory and responsible for reading data from or writing data to the scratchpad memory.
10. A method for executing a neural network operation and matrix/vector instruction, characterized by comprising the following steps:
step S1, the instruction fetch module fetches a neural network operation and matrix/vector instruction and sends the instruction to the decoding module;
step S2, the decoding module decodes the instruction and sends the instruction to the instruction queue;
step S3, within the decoding module, the instruction is sent to the instruction receiving module;
step S4, the instruction receiving module sends the instruction to the micro-instruction decoding module for micro-instruction decoding;
step S5, the micro-instruction decoding module obtains the neural network operation opcode and neural network operation operands of the instruction from the scalar register file, and at the same time decodes the instruction into the micro-instructions controlling each functional unit, which are sent to the micro-instruction emission queue;
step S6, after the required data are obtained, the instruction is sent to the dependency processing unit; the dependency processing unit analyzes whether the instruction has a data dependency on preceding instructions that have not finished executing; if so, the instruction must wait in the storage queue until it no longer has any data dependency on the unfinished preceding instructions;
step S7, the micro-instructions corresponding to the instruction are sent to the arithmetic unit;
step S8, the arithmetic unit fetches the needed data from the scratchpad memory according to the address and size of the required data, and then completes the neural network operation and/or matrix/vector operation corresponding to the instruction in the arithmetic unit.
CN201610281291.5A 2016-04-29 2016-04-29 A kind of apparatus and method for performing neural network computing and matrix/vector computing Pending CN107329936A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610281291.5A CN107329936A (en) 2016-04-29 2016-04-29 A kind of apparatus and method for performing neural network computing and matrix/vector computing
PCT/CN2016/082015 WO2017185418A1 (en) 2016-04-29 2016-05-13 Device and method for performing neural network computation and matrix/vector computation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610281291.5A CN107329936A (en) 2016-04-29 2016-04-29 A kind of apparatus and method for performing neural network computing and matrix/vector computing

Publications (1)

Publication Number Publication Date
CN107329936A true CN107329936A (en) 2017-11-07

Family

ID=60161583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610281291.5A Pending CN107329936A (en) 2016-04-29 2016-04-29 A kind of apparatus and method for performing neural network computing and matrix/vector computing

Country Status (2)

Country Link
CN (1) CN107329936A (en)
WO (1) WO2017185418A1 (en)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108189A (en) * 2017-12-15 2018-06-01 北京中科寒武纪科技有限公司 A kind of computational methods and Related product
CN108197705A (en) * 2017-12-29 2018-06-22 国民技术股份有限公司 Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium
CN108416431A (en) * 2018-01-19 2018-08-17 上海兆芯集成电路有限公司 Neural network microprocessor and macro instruction processing method
CN108520296A (en) * 2018-03-20 2018-09-11 福州瑞芯微电子股份有限公司 A kind of method and apparatus based on the distribution of deep learning chip dynamic cache
CN108701015A (en) * 2017-11-30 2018-10-23 深圳市大疆创新科技有限公司 For the arithmetic unit of neural network, chip, equipment and correlation technique
CN108764470A (en) * 2018-05-18 2018-11-06 中国科学院计算技术研究所 A kind of processing method of artificial neural network operation
CN109032669A (en) * 2018-02-05 2018-12-18 上海寒武纪信息科技有限公司 Processing with Neural Network device and its method for executing the instruction of vector minimum value
CN109615059A (en) * 2018-11-06 2019-04-12 海南大学 Edge filling and filter dilation operation method and system in a kind of convolutional neural networks
CN109871952A (en) * 2017-12-01 2019-06-11 阿比特电子科技有限公司 Electronic device, accelerator, the accelerated method of neural network and acceleration system
CN109919308A (en) * 2017-12-13 2019-06-21 腾讯科技(深圳)有限公司 A kind of neural network model dispositions method, prediction technique and relevant device
CN110147222A (en) * 2018-09-18 2019-08-20 北京中科寒武纪科技有限公司 Arithmetic unit and method
WO2019165940A1 (en) * 2018-02-27 2019-09-06 上海寒武纪信息科技有限公司 Integrated circuit chip apparatus, board card and related product
CN110276447A (en) * 2018-03-14 2019-09-24 上海寒武纪信息科技有限公司 A kind of computing device and method
CN110647973A (en) * 2018-06-27 2020-01-03 北京中科寒武纪科技有限公司 Operation method and related method and product
CN110673786A (en) * 2019-09-03 2020-01-10 浪潮电子信息产业股份有限公司 Data caching method and device
CN110941584A (en) * 2019-11-19 2020-03-31 中科寒武纪科技股份有限公司 Operation engine and data operation method
CN111027690A (en) * 2019-11-26 2020-04-17 陈子祺 Combined processing device, chip and method for executing deterministic inference
CN111079911A (en) * 2018-10-19 2020-04-28 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111078286A (en) * 2018-10-19 2020-04-28 上海寒武纪信息科技有限公司 Data communication method, computing system and storage medium
CN111126583A (en) * 2019-12-23 2020-05-08 中国电子科技集团公司第五十八研究所 Universal neural network accelerator
CN111258634A (en) * 2018-11-30 2020-06-09 上海寒武纪信息科技有限公司 Data selection device, data processing method, chip and electronic equipment
CN111260045A (en) * 2018-11-30 2020-06-09 上海寒武纪信息科技有限公司 Decoder and atomic instruction analysis method
CN111353591A (en) * 2018-12-20 2020-06-30 中科寒武纪科技股份有限公司 Computing device and related product
CN111860798A (en) * 2019-04-27 2020-10-30 中科寒武纪科技股份有限公司 Operation method, device and related product
CN111898752A (en) * 2020-08-03 2020-11-06 乐鑫信息科技(上海)股份有限公司 Apparatus and method for performing LSTM neural network operations
CN112348179A (en) * 2020-11-26 2021-02-09 湃方科技(天津)有限责任公司 Efficient convolutional neural network operation instruction set architecture, device and server
CN115826910A (en) * 2023-02-07 2023-03-21 成都申威科技有限责任公司 Vector fixed point ALU processing system
WO2023123453A1 (en) * 2021-12-31 2023-07-06 华为技术有限公司 Operation acceleration processing method, operation accelerator use method, and operation accelerator
CN117992396A (en) * 2024-03-29 2024-05-07 深存科技(无锡)有限公司 Stream tensor processor

Families Citing this family (27)

Publication number Priority date Publication date Assignee Title
CN109754061B (en) * 2017-11-07 2023-11-24 上海寒武纪信息科技有限公司 Execution method of convolution expansion instruction and related product
CN109754062B (en) * 2017-11-07 2024-05-14 上海寒武纪信息科技有限公司 Execution method of convolution expansion instruction and related product
CN108037908B (en) * 2017-12-15 2021-02-09 中科寒武纪科技股份有限公司 Calculation method and related product
CN108416422B (en) * 2017-12-29 2024-03-01 国民技术股份有限公司 FPGA-based convolutional neural network implementation method and device
JP6846534B2 (en) * 2018-02-13 2021-03-24 シャンハイ カンブリコン インフォメーション テクノロジー カンパニー リミテッドShanghai Cambricon Information Technology Co.,Ltd. Arithmetic logic unit and calculation method
US12073215B2 (en) 2018-02-13 2024-08-27 Shanghai Cambricon Information Technology Co., Ltd Computing device with a conversion unit to convert data values between various sizes of fixed-point and floating-point data
CN111767997B (en) * 2018-02-27 2023-08-29 上海寒武纪信息科技有限公司 Integrated circuit chip device and related products
CN111767998B (en) * 2018-02-27 2024-05-14 上海寒武纪信息科技有限公司 Integrated circuit chip device and related products
CN110472734B (en) * 2018-05-11 2024-03-29 上海寒武纪信息科技有限公司 Computing device and related product
CN110503179B (en) * 2018-05-18 2024-03-01 上海寒武纪信息科技有限公司 Calculation method and related product
CN108959180B (en) * 2018-06-15 2022-04-22 北京探境科技有限公司 Data processing method and system
CN111275197B (en) * 2018-12-05 2023-11-10 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium
CN111222632B (en) * 2018-11-27 2023-06-30 中科寒武纪科技股份有限公司 Computing device, computing method and related product
CN111047024B (en) * 2018-10-12 2023-05-23 上海寒武纪信息科技有限公司 Computing device and related product
EP4009185A1 (en) 2018-10-18 2022-06-08 Shanghai Cambricon Information Technology Co., Ltd Network-on-chip data processing method and device
CN111079908B (en) * 2018-10-18 2024-02-13 上海寒武纪信息科技有限公司 Network-on-chip data processing method, storage medium, computer device and apparatus
CN109542513B (en) * 2018-11-21 2023-04-21 山东浪潮科学研究院有限公司 Convolutional neural network instruction data storage system and method
CN111857828B (en) * 2019-04-25 2023-03-14 安徽寒武纪信息科技有限公司 Processor operation method and device and related product
WO2020220935A1 (en) 2019-04-27 2020-11-05 中科寒武纪科技股份有限公司 Operation apparatus
US11841822B2 (en) 2019-04-27 2023-12-12 Cambricon Technologies Corporation Limited Fractal calculating device and method, integrated circuit and board card
CN110780921B (en) * 2019-08-30 2023-09-26 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic device
CN111047036B (en) * 2019-12-09 2023-11-14 Oppo广东移动通信有限公司 Neural network processor, chip and electronic equipment
CN111325321B (en) * 2020-02-13 2023-08-29 中国科学院自动化研究所 Brain-like computing system based on multi-neural network fusion and execution method of instruction set
CN113361679B (en) * 2020-03-05 2023-10-17 华邦电子股份有限公司 Memory device and method of operating the same
CN116841614B (en) * 2023-05-29 2024-03-15 进迭时空(杭州)科技有限公司 Sequential vector scheduling method under disordered access mechanism
CN117807082B (en) * 2023-12-20 2024-09-27 中科驭数(北京)科技有限公司 Hash processing method, device, equipment and computer readable storage medium
CN118467136A (en) * 2024-05-30 2024-08-09 上海交通大学 Calculation and storage method and system suitable for large language model sparse reasoning

Citations (6)

Publication number Priority date Publication date Assignee Title
SU980097A1 (en) * 1981-06-05 1982-12-07 Предприятие П/Я М-5769 Device for control of scratchpad buffer storage of multiprocessor electronic computer
CN1105138A (en) * 1992-10-30 1995-07-12 国际商业机器公司 Register architecture for a super scalar computer
CN1133452A (en) * 1994-10-13 1996-10-16 北京多思科技工业园股份有限公司 Macroinstruction set symmetrical parallel system structure microprocessor
CN101504599A (en) * 2009-03-16 2009-08-12 西安电子科技大学 Special instruction set micro-processing system suitable for digital signal processing application
CN101739235A (en) * 2008-11-26 2010-06-16 中国科学院微电子研究所 Processor device for seamless mixing 32-bit DSP and general RISC CPU
CN103348318A (en) * 2011-02-07 2013-10-09 Arm有限公司 Controlling the execution of adjacent instructions that are dependent upon a same data condition

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US6961719B1 (en) * 2002-01-07 2005-11-01 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Hybrid neural network and support vector machine method for optimization
SI21200A (en) * 2002-03-27 2003-10-31 Jože Balič The CNC control unit for controlling processing centres with learning ability
CN1331092C (en) * 2004-05-17 2007-08-08 中国科学院半导体研究所 Special purpose neural net computer system for pattern recognition and application method

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
SU980097A1 (en) * 1981-06-05 1982-12-07 Предприятие П/Я М-5769 Device for control of scratchpad buffer storage of multiprocessor electronic computer
CN1105138A (en) * 1992-10-30 1995-07-12 国际商业机器公司 Register architecture for a super scalar computer
CN1133452A (en) * 1994-10-13 1996-10-16 北京多思科技工业园股份有限公司 Macroinstruction set symmetrical parallel system structure microprocessor
CN101739235A (en) * 2008-11-26 2010-06-16 中国科学院微电子研究所 Processor device for seamless mixing 32-bit DSP and general RISC CPU
CN101504599A (en) * 2009-03-16 2009-08-12 西安电子科技大学 Special instruction set micro-processing system suitable for digital signal processing application
CN103348318A (en) * 2011-02-07 2013-10-09 Arm有限公司 Controlling the execution of adjacent instructions that are dependent upon a same data condition

Non-Patent Citations (3)

Title
CHRISTOPHER H.CHOU等: "VEGAS: Soft Vector Processor with Scratchpad Memory", 《FPGA’11 PROCEEDINGS OF THE 19TH ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD PROGRAMMABLE GATE ARRAYS》 *
LEISHANGWEN: "自己动手写CPU之第五阶段(1)-流水线相关问题", 《HTTPS://BLOG.CSDN.NET/LEISHANGWEN/ARTICLE/DETAILS/38298787》 *
张治元,张耀辉: "《信息导论》", 30 September 2013, 西安电子科技大学出版社 *

Cited By (57)

Publication number Priority date Publication date Assignee Title
WO2019104695A1 (en) * 2017-11-30 2019-06-06 深圳市大疆创新科技有限公司 Arithmetic device for neural network, chip, equipment and related method
CN108701015A (en) * 2017-11-30 2018-10-23 深圳市大疆创新科技有限公司 For the arithmetic unit of neural network, chip, equipment and correlation technique
CN109871952A (en) * 2017-12-01 2019-06-11 阿比特电子科技有限公司 Electronic device, accelerator, the accelerated method of neural network and acceleration system
CN109919308A (en) * 2017-12-13 2019-06-21 腾讯科技(深圳)有限公司 A kind of neural network model dispositions method, prediction technique and relevant device
US12020142B2 (en) 2017-12-13 2024-06-25 Tencent Technology (Shenzhen) Company Limited Neural network model deployment method, prediction method and related device
CN109919308B (en) * 2017-12-13 2022-11-11 腾讯科技(深圳)有限公司 Neural network model deployment method, prediction method and related equipment
CN112230994A (en) * 2017-12-15 2021-01-15 安徽寒武纪信息科技有限公司 Calculation method and related product
CN108108189A (en) * 2017-12-15 2018-06-01 北京中科寒武纪科技有限公司 A kind of computational methods and Related product
US11263007B2 (en) 2017-12-29 2022-03-01 Nationz Technologies Inc. Convolutional neural network hardware acceleration device, convolutional calculation method, and storage medium
WO2019127731A1 (en) * 2017-12-29 2019-07-04 国民技术股份有限公司 Convolutional neural network hardware acceleration device, convolutional calculation method and storage medium
CN108197705A (en) * 2017-12-29 2018-06-22 国民技术股份有限公司 Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium
CN108416431B (en) * 2018-01-19 2021-06-01 上海兆芯集成电路有限公司 Neural network microprocessor and macroinstruction processing method
CN108416431A (en) * 2018-01-19 2018-08-17 上海兆芯集成电路有限公司 Neural network microprocessor and macro instruction processing method
CN109189474B (en) * 2018-02-05 2023-08-29 上海寒武纪信息科技有限公司 Neural network processing device and method for executing vector addition instruction
CN109062612B (en) * 2018-02-05 2023-06-27 上海寒武纪信息科技有限公司 Neural network processing device and method for executing plane rotation instruction
CN109189474A (en) * 2018-02-05 2019-01-11 上海寒武纪信息科技有限公司 Processing with Neural Network device and its method for executing vector adduction instruction
CN109165041A (en) * 2018-02-05 2019-01-08 上海寒武纪信息科技有限公司 Processing with Neural Network device and its method for executing vector norm instruction
CN109117186A (en) * 2018-02-05 2019-01-01 上海寒武纪信息科技有限公司 Processing with Neural Network device and its method for executing Outer Product of Vectors instruction
CN109032669A (en) * 2018-02-05 2018-12-18 上海寒武纪信息科技有限公司 Processing with Neural Network device and its method for executing the instruction of vector minimum value
CN109086076A (en) * 2018-02-05 2018-12-25 上海寒武纪信息科技有限公司 Processing with Neural Network device and its method for executing dot product instruction
CN109062612A (en) * 2018-02-05 2018-12-21 上海寒武纪信息科技有限公司 Processing with Neural Network device and its method for executing Plane Rotation instruction
CN109165041B (en) * 2018-02-05 2023-06-30 上海寒武纪信息科技有限公司 Neural network processing device and method for executing vector norm instruction
CN109101273A (en) * 2018-02-05 2018-12-28 上海寒武纪信息科技有限公司 Processing with Neural Network device and its method for executing vector maximization instruction
US11836497B2 (en) 2018-02-05 2023-12-05 Shanghai Cambricon Information Technology Co., Ltd Operation module and method thereof
CN109086076B (en) * 2018-02-05 2023-08-25 上海寒武纪信息科技有限公司 Neural network processing device and method for executing vector dot product instruction
CN109101273B (en) * 2018-02-05 2023-08-25 上海寒武纪信息科技有限公司 Neural network processing device and method for executing vector maximum value instruction
CN109032669B (en) * 2018-02-05 2023-08-29 上海寒武纪信息科技有限公司 Neural network processing device and method for executing vector minimum value instruction
WO2019165940A1 (en) * 2018-02-27 2019-09-06 上海寒武纪信息科技有限公司 Integrated circuit chip apparatus, board card and related product
CN110276447A (en) * 2018-03-14 2019-09-24 上海寒武纪信息科技有限公司 A kind of computing device and method
CN108520296B (en) * 2018-03-20 2020-05-15 福州瑞芯微电子股份有限公司 Deep learning chip-based dynamic cache allocation method and device
CN108520296A (en) * 2018-03-20 2018-09-11 福州瑞芯微电子股份有限公司 A kind of method and apparatus based on the distribution of deep learning chip dynamic cache
CN108764470A (en) * 2018-05-18 2018-11-06 中国科学院计算技术研究所 A kind of processing method of artificial neural network operation
CN110647973A (en) * 2018-06-27 2020-01-03 北京中科寒武纪科技有限公司 Operation method and related method and product
CN110147222A (en) * 2018-09-18 2019-08-20 北京中科寒武纪科技有限公司 Arithmetic unit and method
CN111078286A (en) * 2018-10-19 2020-04-28 上海寒武纪信息科技有限公司 Data communication method, computing system and storage medium
CN111079911A (en) * 2018-10-19 2020-04-28 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111078286B (en) * 2018-10-19 2023-09-01 上海寒武纪信息科技有限公司 Data communication method, computing system and storage medium
CN111079911B (en) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 Operation method, system and related product
CN109615059A (en) * 2018-11-06 2019-04-12 海南大学 Edge padding and filter dilation operation method and system in convolutional neural networks
CN111258634A (en) * 2018-11-30 2020-06-09 上海寒武纪信息科技有限公司 Data selection device, data processing method, chip and electronic equipment
CN111260045B (en) * 2018-11-30 2022-12-02 上海寒武纪信息科技有限公司 Decoder and atomic instruction analysis method
CN111260045A (en) * 2018-11-30 2020-06-09 上海寒武纪信息科技有限公司 Decoder and atomic instruction analysis method
CN111353591A (en) * 2018-12-20 2020-06-30 中科寒武纪科技股份有限公司 Computing device and related product
CN111860798A (en) * 2019-04-27 2020-10-30 中科寒武纪科技股份有限公司 Operation method, device and related product
CN110673786B (en) * 2019-09-03 2020-11-10 浪潮电子信息产业股份有限公司 Data caching method and device
CN110673786A (en) * 2019-09-03 2020-01-10 浪潮电子信息产业股份有限公司 Data caching method and device
CN110941584A (en) * 2019-11-19 2020-03-31 中科寒武纪科技股份有限公司 Operation engine and data operation method
CN111027690B (en) * 2019-11-26 2023-08-04 陈子祺 Combined processing device, chip and method for performing deterministic reasoning
CN111027690A (en) * 2019-11-26 2020-04-17 陈子祺 Combined processing device, chip and method for executing deterministic inference
CN111126583A (en) * 2019-12-23 2020-05-08 中国电子科技集团公司第五十八研究所 Universal neural network accelerator
CN111898752A (en) * 2020-08-03 2020-11-06 乐鑫信息科技(上海)股份有限公司 Apparatus and method for performing LSTM neural network operations
CN112348179A (en) * 2020-11-26 2021-02-09 湃方科技(天津)有限责任公司 Efficient convolutional neural network operation instruction set architecture, device and server
CN112348179B (en) * 2020-11-26 2023-04-07 湃方科技(天津)有限责任公司 Efficient convolutional neural network operation instruction set architecture construction method, device and server
WO2023123453A1 (en) * 2021-12-31 2023-07-06 华为技术有限公司 Operation acceleration processing method, operation accelerator use method, and operation accelerator
CN115826910A (en) * 2023-02-07 2023-03-21 成都申威科技有限责任公司 Vector fixed-point ALU processing system
CN117992396A (en) * 2024-03-29 2024-05-07 深存科技(无锡)有限公司 Stream tensor processor
CN117992396B (en) * 2024-03-29 2024-05-28 深存科技(无锡)有限公司 Stream tensor processor

Also Published As

Publication number Publication date
WO2017185418A1 (en) 2017-11-02

Similar Documents

Publication Publication Date Title
CN107329936A (en) Apparatus and method for performing neural network operations and matrix/vector operations
CN107688854A (en) Arithmetic unit, method and device supporting operation data of different bit widths
CN106991077A (en) Matrix computation device
Fang et al. swDNN: A library for accelerating deep learning applications on Sunway TaihuLight
CN108197705A (en) Convolutional neural network hardware accelerator, convolution calculation method and storage medium
CN109062606A (en) Machine learning processor and method for executing vector scaling instructions using the processor
CN106990940A (en) Vector calculation device
CN106991478A (en) Apparatus and method for performing artificial neural network reverse training
CN107578098A (en) Neural network processor based on systolic arrays
CN106970896A (en) Vectorized implementation method of two-dimensional matrix convolution for vector processors
CN106991476A (en) Apparatus and method for performing artificial neural network forward operation
CN107563497A (en) Computing device and method
CN108009126A (en) Computation method and related products
CN107305538A (en) Matrix arithmetic unit and method
CN103955446B (en) Variable-length FFT computation method based on a DSP chip
CN109542830A (en) Data processing system and data processing method
CN109359730A (en) Neural network processor for fixed-output-paradigm Winograd convolution
CN113010213B (en) Simplified instruction set storage and calculation integrated neural network coprocessor based on resistance change memristor
CN108446534A (en) Select the method, apparatus and computer readable storage medium of neural network hyper parameter
CN107193761A (en) Method and apparatus for queue priority arbitration
CN107305486A (en) Computing device for neural network maxout layers
CN107315567A (en) Apparatus and method for performing vector maximum/minimum operations
EP3933703B1 (en) Dynamic loading neural network inference at dram/on-bus sram/serial flash for power optimization
CN108037908A (en) Computation method and related products
CN107688466A (en) Arithmetic unit and operating method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant after: Zhongke Cambrian Technology Co., Ltd

Address before: 100190 room 644, scientific research complex, No. 6, South Road, Academy of Sciences, Haidian District, Beijing

Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.
