CN110163356A - A kind of computing device and method - Google Patents
A kind of computing device and method
- Publication number
- CN110163356A (application CN201910195600.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- input data
- mentioned
- input
- scaling position
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/57—Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
A computing device, comprising: a storage unit (10) for obtaining input data and a computation instruction; a controller unit (11) for extracting the computation instruction from the storage unit (10), decoding it into one or more operation instructions, and sending the operation instructions and the input data to an arithmetic unit (12); and the arithmetic unit (12), for performing operations on the input data according to the one or more operation instructions to obtain the result of the computation instruction. The computing device represents the data participating in machine learning computations as fixed-point data, which can improve the processing speed and processing efficiency of training operations.
Description
Technical field
This application relates to the technical field of information processing, and in particular to a computing device and method.
Background
With the continuous development of information technology and people's growing demands, the requirements on the timeliness of information are ever higher. Currently, terminals acquire and process information on general-purpose processors.

In practice, it has been found that this way of processing information, in which a general-purpose processor runs a software program, is limited by the operating speed of the general-purpose processor; especially when the general-purpose processor is heavily loaded, information processing efficiency is low and latency is high. For computation models used in information processing, such as training models, the amount of computation in a training operation is large, so a general-purpose processor takes a long time to complete the training operation and its efficiency is low.
Summary of the invention
The embodiments of the present application provide a computing device and method, which can improve the processing speed and efficiency of operations.
In a first aspect, an embodiment of the present application provides a computing device, comprising: a storage unit, a converting unit, an arithmetic unit and a controller unit, the storage unit including a cache and a register.

The controller unit is configured to determine the scaling position of first input data and the bit width of fixed-point data, the bit width of the fixed-point data being the bit width used when the first input data is converted into fixed-point data.

The arithmetic unit is configured to initialize the scaling position of the first input data, to adjust the scaling position of the first input data, and to store the adjusted scaling position of the first input data into the cache of the storage unit.

The controller unit is configured to obtain the first input data and a plurality of operation instructions from the register, to obtain the adjusted scaling position of the first input data from the cache, and to transmit the adjusted scaling position and the first input data to the converting unit.

The converting unit is configured to convert the first input data into second input data according to the adjusted scaling position of the first input data.

The arithmetic unit initializing the scaling position of the first input data comprises: initializing the scaling position of the first input data according to the relationship between different types of data in the first input data.
In a feasible embodiment, the arithmetic unit initializing the scaling position of the first input data according to the relationship between different types of data in the first input data comprises:

the arithmetic unit obtaining, from the scaling position s_a^(l) of a data type a^(l) of any layer (for example the l-th layer) of a multi-layer network model, the scaling position s_b^(l) of the l-th layer data type b^(l) according to the formula s_b^(l) = α_b · s_a^(l) + β_b.

Here a^(l) is the input neuron X^(l), the output neuron Y^(l), the weight W^(l), the input neuron derivative ∇_X^(l), the output neuron derivative ∇_Y^(l) or the weight derivative ∇_W^(l); b^(l) is likewise one of X^(l), Y^(l), W^(l), ∇_X^(l), ∇_Y^(l) or ∇_W^(l), with a^(l) and b^(l) being of different types; α_b and β_b are integer constants.
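As a concrete illustration, a minimal Python sketch of this initialization rule follows; the particular α_b and β_b values are hypothetical, since the text only requires them to be integer constants.

```python
# A minimal sketch of the initialization rule s_b = alpha_b * s_a + beta_b.
# The concrete alpha_b / beta_b values below are hypothetical illustrations.

def init_scaling_position(s_a: int, alpha_b: int, beta_b: int) -> int:
    """Derive the scaling position s_b of data type b at layer l from the
    scaling position s_a of data type a at the same layer."""
    return alpha_b * s_a + beta_b

# Example: derive the weight scaling position from the input-neuron one.
s_input_neuron = -5                                  # scaling position of X^(l)
s_weight = init_scaling_position(s_input_neuron, alpha_b=1, beta_b=-2)
print(s_weight)                                      # -7
```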
In a feasible embodiment, the arithmetic unit adjusting the scaling position of the first input data comprises one of the following (a sketch is given after this list):

adjusting the scaling position of the first input data upward in a single step according to the maximum absolute value in the first input data; or

adjusting the scaling position of the first input data upward stepwise according to the maximum absolute value in the first input data; or

adjusting the scaling position of the first input data upward in a single step according to the distribution of the first input data; or

adjusting the scaling position of the first input data upward stepwise according to the distribution of the first input data; or

adjusting the scaling position of the first input data downward according to the maximum absolute value in the first input data.
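The following Python sketch illustrates, under the representation of Fig. 2A below (representable magnitude (2^(bitnum-1) - 1) · 2^s), how the upward adjustments could work; the 99th-percentile cut-off in the distribution-based variant is an assumption, and the downward adjustment is symmetric.

```python
import math

def min_scaling_for(max_abs: float, bitnum: int = 16) -> int:
    """Smallest scaling position s with max_abs <= (2**(bitnum-1) - 1) * 2**s."""
    return math.ceil(math.log2(max_abs / (2 ** (bitnum - 1) - 1)))

def adjust_up_single_step(s: int, data, bitnum: int = 16) -> int:
    """Single-step: jump straight to the position required by max |x|."""
    return max(s, min_scaling_for(max(abs(x) for x in data), bitnum))

def adjust_up_stepwise(s: int, data, bitnum: int = 16) -> int:
    """Stepwise: raise s one position at a time until max |x| fits."""
    while (2 ** (bitnum - 1) - 1) * 2 ** s < max(abs(x) for x in data):
        s += 1
    return s

def adjust_up_by_distribution(s: int, data, bitnum: int = 16, q: float = 0.99) -> int:
    """Distribution-based: fit a high quantile instead of the maximum,
    letting rare outliers saturate (the quantile choice is an assumption)."""
    xs = sorted(abs(x) for x in data)
    return max(s, min_scaling_for(xs[int(q * (len(xs) - 1))], bitnum))
```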
In a feasible embodiment, the computing device is used for executing machine learning computations;

the controller unit is further configured to transmit the plurality of operation instructions to the arithmetic unit;

the converting unit is further configured to transmit the second input data to the arithmetic unit;

the arithmetic unit is further configured to perform operations on the second input data according to the plurality of operation instructions to obtain an operation result.
In a feasible embodiment, the machine learning computation comprises an artificial neural network operation; the first input data comprises input neuron data and weight data; the computation result is output neuron data.
In a feasible embodiment, the arithmetic unit includes one main processing circuit and a plurality of slave processing circuits;

the main processing circuit is configured to perform preamble processing on the second input data and to transmit data and the plurality of operation instructions to and from the plurality of slave processing circuits;

the plurality of slave processing circuits are configured to perform intermediate operations according to the second input data and the plurality of operation instructions transmitted from the main processing circuit to obtain a plurality of intermediate results, and to transfer the intermediate results to the main processing circuit;

the main processing circuit is configured to perform subsequent processing on the plurality of intermediate results to obtain the operation result.
In a feasible embodiment, the computing device further comprises a direct memory access (DMA) unit;

the cache is further configured to store the first input data, the cache including a scratchpad cache;

the register is further configured to store scalar data within the first input data;

the DMA unit is configured to read data from the storage unit or to store data into the storage unit.
In a feasible embodiment, when the first input data is fixed-point data, the arithmetic unit further includes:

a derivation unit, configured to derive, according to the scaling position of the first input data, the scaling position of one or more intermediate results, where the one or more intermediate results are obtained by operating on the first input data.
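For illustration, one plausible set of derivation rules is sketched below. The patent does not fix the rules at this point, so treating a product's scaling position as the sum of the operands' positions and aligning a sum to the smaller position is an assumption.

```python
def derive_mul_scaling(s_a: int, s_b: int) -> int:
    # A product of values scaled by 2**s_a and 2**s_b is scaled by 2**(s_a + s_b).
    return s_a + s_b

def derive_add_scaling(s_a: int, s_b: int) -> int:
    # A sum is first brought to a common scale; the smaller scaling position
    # preserves the precision of both operands (assumption).
    return min(s_a, s_b)
```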
In a feasible embodiment, the arithmetic unit further includes a data cache unit for caching the one or more intermediate results.
In a feasible embodiment, the arithmetic unit includes a tree module, the tree module comprising a root port and a plurality of branch ports; the root port of the tree module is connected to the main processing circuit, and the branch ports of the tree module are respectively connected to one of the plurality of slave processing circuits;

the tree module is configured to forward data and operation instructions between the main processing circuit and the plurality of slave processing circuits; the tree module is an n-ary tree structure, n being an integer greater than or equal to 2.
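A small Python sketch of the n-ary fan-out performed by such a tree module follows; it only models which slave circuits receive a forwarded datum, not the hardware ports themselves.

```python
def broadcast_via_tree(data, slaves, n=2):
    """Fan `data` out to the given slave indices through an n-ary tree:
    each internal node forwards to at most n children until every leaf
    (slave processing circuit) has received the datum."""
    if len(slaves) == 1:
        return {slaves[0]: data}                   # leaf: one slave circuit
    out, chunk = {}, max(1, -(-len(slaves) // n))  # ceil division into n groups
    for i in range(0, len(slaves), chunk):
        out.update(broadcast_via_tree(data, slaves[i:i + chunk], n))
    return out

print(broadcast_via_tree("opcode", list(range(4))))
# {0: 'opcode', 1: 'opcode', 2: 'opcode', 3: 'opcode'}
```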
In a feasible embodiment, the arithmetic unit further includes a branch processing circuit;

the main processing circuit is specifically configured to determine that the input neurons are broadcast data and the weights are distribution data, to partition one piece of distribution data into a plurality of data blocks, and to send at least one of the data blocks, the broadcast data and at least one of the plurality of operation instructions to the branch processing circuit;

the branch processing circuit is configured to forward the data blocks, the broadcast data and the operation instructions between the main processing circuit and the plurality of slave processing circuits;

the plurality of slave processing circuits are configured to perform operations on the received data blocks and broadcast data according to the operation instructions to obtain intermediate results, and to transfer the intermediate results to the branch processing circuit;

the main processing circuit is further configured to perform subsequent processing on the intermediate results sent by the branch processing circuit to obtain the result of the operation instructions, and to send the result of the computation instruction to the controller unit.
In a feasible embodiment, the plurality of slave processing circuits are distributed in an array; each slave processing circuit is connected to the adjacent slave processing circuits, and the main processing circuit is connected to K of the plurality of slave processing circuits, the K slave processing circuits being: the n slave processing circuits of the 1st row, the n slave processing circuits of the m-th row and the m slave processing circuits of the 1st column;

the K slave processing circuits are configured to forward data and instructions between the main processing circuit and the remaining slave processing circuits;

the main processing circuit is further configured to determine that the input neurons are broadcast data and the weights are distribution data, to partition one piece of distribution data into a plurality of data blocks, and to send at least one of the data blocks and at least one of the plurality of operation instructions to the K slave processing circuits;

the K slave processing circuits are configured to convert the data transmitted between the main processing circuit and the plurality of slave processing circuits;

the plurality of slave processing circuits are configured to perform operations on the received data blocks according to the operation instructions to obtain intermediate results, and to transfer the operation results to the K slave processing circuits;

the main processing circuit is configured to process the intermediate results sent by the K slave processing circuits to obtain the result of the computation instruction, and to send the result of the computation instruction to the controller unit.
In a feasible embodiment, the main processing circuit is specifically configured to combine and sort the intermediate results sent by the plurality of slave processing circuits to obtain the result of the computation instruction;

or the main processing circuit is specifically configured to combine and sort the intermediate results sent by the plurality of slave processing circuits and then perform activation processing to obtain the result of the computation instruction.
In a feasible embodiment, the main processing circuit includes one or any combination of an activation processing circuit and an addition processing circuit;

the activation processing circuit is configured to execute the activation operation on data in the main processing circuit;

the addition processing circuit is configured to execute addition or accumulation operations;

the slave processing circuit includes:

a multiplication processing circuit, configured to execute product operations on the received data blocks to obtain product results;

an accumulation processing circuit, configured to execute accumulation operations on the product results to obtain the intermediate results.
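As an illustration of this dataflow only (the circuits operate on fixed-point data in hardware; NumPy floats are used here merely to show the master/slave split), a sketch under those assumptions:

```python
import numpy as np

def master_slave_forward(x, W, num_slaves=4, act=np.tanh):
    """Sketch of the one-master/multiple-slaves flow: the input neurons x are
    broadcast, the weight matrix W is split row-wise into blocks, each slave
    computes a partial product (its intermediate result), and the master
    concatenates the partial results and applies the activation."""
    blocks = np.array_split(W, num_slaves, axis=0)   # distribute the weights
    partials = [blk @ x for blk in blocks]           # slave: multiply-accumulate
    return act(np.concatenate(partials))             # master: combine + activate

x = np.random.randn(8).astype(np.float32)            # input neurons
W = np.random.randn(16, 8).astype(np.float32)        # weights
y = master_slave_forward(x, W)                       # output neurons
```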
In a second aspect, an embodiment of the present application provides a calculation method, comprising: a controller unit determining the scaling position of first input data and the bit width of fixed-point data, the bit width of the fixed-point data being the bit width used when the first input data is converted into fixed-point data; an arithmetic unit initializing the scaling position of the first input data and adjusting the scaling position of the first input data; and a converting unit obtaining the adjusted scaling position of the first input data and converting the first input data into second input data according to the adjusted scaling position. The arithmetic unit initializing the scaling position of the first input data comprises: initializing the scaling position of the first input data according to the relationship between different types of data in the first input data.
In a feasible embodiment, the arithmetic unit initializing the scaling position of the first input data according to the relationship between different types of data in the first input data comprises:

obtaining, from the scaling position s_a^(l) of a data type a^(l) of any layer (for example the l-th layer) of a multi-layer network model, the scaling position s_b^(l) of the l-th layer data type b^(l) according to the formula s_b^(l) = α_b · s_a^(l) + β_b.

Here a^(l) is the input neuron X^(l), the output neuron Y^(l), the weight W^(l), the input neuron derivative ∇_X^(l), the output neuron derivative ∇_Y^(l) or the weight derivative ∇_W^(l); b^(l) is likewise one of these types, with a^(l) and b^(l) being of different types; α_b and β_b are integer constants.
In a feasible embodiment, the arithmetic unit adjusting the scaling position of the first input data comprises: adjusting the scaling position of the first input data upward in a single step according to the maximum absolute value in the first input data; or adjusting it upward stepwise according to the maximum absolute value in the first input data; or adjusting it upward in a single step according to the distribution of the first input data; or adjusting it upward stepwise according to the distribution of the first input data; or adjusting the scaling position of the first input data downward according to the maximum absolute value in the first input data.
In a feasible embodiment, the calculation method is a method for executing machine learning computations, and the method further comprises: the arithmetic unit performing operations on the second input data according to the plurality of operation instructions to obtain an operation result.
In a feasible embodiment, the machine learning computation comprises an artificial neural network operation; the first input data comprises input neurons and weights; the computation result is output neurons.
In a feasible embodiment, when the first input data is fixed-point data, the method further comprises:

the arithmetic unit deriving, according to the scaling position of the first input data, the scaling position of one or more intermediate results, where the one or more intermediate results are obtained by operating on the first input data.
In a third aspect, an embodiment of the invention provides a machine learning arithmetic device, which includes one or more computing devices according to the first aspect. The machine learning arithmetic device is configured to obtain data to be operated on and control information from other processing devices, to execute specified machine learning operations, and to pass the execution results to other processing devices through I/O interfaces.

When the machine learning arithmetic device includes multiple computing devices, the computing devices can be linked through a specific structure and transmit data to each other; specifically, they interconnect and transmit data through a PCIE bus to support larger-scale machine learning operations. The multiple computing devices may share the same control system or have their own control systems, may share memory or have their own memories, and may be interconnected in any interconnection topology.
In a fourth aspect, an embodiment of the invention provides a combined processing device, which includes the machine learning arithmetic device according to the third aspect, a universal interconnection interface and other processing devices. The machine learning arithmetic device interacts with the other processing devices to jointly complete operations specified by the user. The combined processing device may also include a storage device, which is connected to the machine learning arithmetic device and the other processing devices respectively, and is used for saving the data of the machine learning arithmetic device and the other processing devices.
In a fifth aspect, an embodiment of the invention provides a neural network chip, which includes the computing device according to the first aspect, the machine learning arithmetic device according to the third aspect, or the combined processing device according to the fourth aspect.
In a sixth aspect, an embodiment of the invention provides a neural network chip packaging structure, which includes the neural network chip according to the fifth aspect.

In a seventh aspect, an embodiment of the invention provides a board card, which includes a memory device, an interface device, a control device and the neural network chip according to the fifth aspect;

wherein the neural network chip is connected to the memory device, the control device and the interface device respectively;

the memory device is used for storing data;

the interface device is used for realizing data transmission between the chip and external equipment;

the control device is used for monitoring the state of the chip.
Further, the memory device includes multiple groups of storage units, each group of storage units being connected to the chip through a bus, the storage units being DDR SDRAM;

the chip includes a DDR controller for controlling data transmission to, and data storage in, each storage unit;

the interface device is a standard PCIE interface.
In an eighth aspect, an embodiment of the invention provides an electronic device, which includes the neural network chip according to the fifth aspect, the neural network chip packaging structure according to the sixth aspect, or the board card according to the seventh aspect.

In some embodiments, the electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, driving recorder, navigator, sensor, camera, server, cloud server, webcam, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical equipment.
In some embodiments, the vehicle includes an aircraft, a ship and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment includes a nuclear magnetic resonance apparatus, a B-mode ultrasound instrument and/or an electrocardiograph.
These and other aspects of the invention will be more clearly understood from the following description.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of the data structure of fixed-point data provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of the data structure of another fixed-point data provided by an embodiment of the present application;
Fig. 2A is a schematic diagram of the data structure of another fixed-point data provided by an embodiment of the present application;
Fig. 2B is a schematic diagram of the data structure of another fixed-point data provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a computing device provided by an embodiment of the present application;
Fig. 3A is a schematic structural diagram of a computing device provided by one embodiment of the present application;
Fig. 3B is a schematic structural diagram of a computing device provided by another embodiment of the present application;
Fig. 3C is a schematic structural diagram of a computing device provided by another embodiment of the present application;
Fig. 3D is a schematic structural diagram of a main processing circuit provided by an embodiment of the present application;
Fig. 3E is a schematic structural diagram of a computing device provided by another embodiment of the present application;
Fig. 3F is a schematic structural diagram of a tree module provided by an embodiment of the present application;
Fig. 3G is a schematic structural diagram of a computing device provided by another embodiment of the present application;
Fig. 3H is a schematic structural diagram of a computing device provided by another embodiment of the present application;
Fig. 4 is a flow chart of a single-layer artificial neural network forward operation provided by an embodiment of the present application;
Fig. 5 is a flow chart of a neural network forward operation and reverse training provided by an embodiment of the present application;
Fig. 6 is a structural diagram of a combined processing device provided by an embodiment of the present application;
Fig. 6A is a schematic structural diagram of a computing device provided by another embodiment of the present application;
Fig. 7 is a structural diagram of another combined processing device provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a board card provided by an embodiment of the present application;
Fig. 9 is a schematic flow chart of a calculation method provided by an embodiment of the present application;
Fig. 10 is a schematic flow chart of scaling position determination and adjustment of data provided by an embodiment of the present application;
Fig. 11 is a schematic structural diagram of a distributed system provided by an embodiment of the present application;
Fig. 12 is a schematic structural diagram of another distributed system provided by an embodiment of the present application.
Specific embodiment
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some embodiments of the present application, not all of them. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.

The terms "first", "second", "third", "fourth" and the like in the description, claims and drawings of the present application are used to distinguish different objects rather than to describe a particular order. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device containing a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product or device.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in conjunction with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the description do not necessarily all refer to the same embodiment, nor are they independent or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
An embodiment of the present application provides a data type, the data type including an adjustment factor, the adjustment factor being used to indicate the value range and precision of the data type.

The adjustment factor includes a first zoom factor and, optionally, a second zoom factor; the first zoom factor is used to indicate the precision of the data type, and the second zoom factor is used to adjust the value range of the data type.
Optionally, the first zoom factor may be 2^-m, 8^-m, 10^-m, 2, 3, 6, 9, 10, 2^m, 8^m, 10^m or another value.

Specifically, the first zoom factor may be a scaling position. For example, moving the decimal point of input data INA1 in binary representation right by m bits yields the input data INB1 = INA1 * 2^m, i.e. INB1 is amplified 2^m times relative to INA1; as another example, moving the decimal point of input data INA2 in decimal representation left by n digits yields the input data INB2 = INA2 / 10^n, i.e. INB2 is reduced 10^n times relative to INA2, where m and n are both integers.
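The two shifts just described amount to the following, illustrated with hypothetical values:

```python
# Moving the binary point right by m bits amplifies the value 2**m times;
# moving the decimal point left by n digits reduces it 10**n times.
INA1, m = 0b1011, 3              # 11 in binary
INB1 = INA1 * 2 ** m             # 88: INA1 amplified 2**m times

INA2, n = 123.0, 2
INB2 = INA2 / 10 ** n            # 1.23: INA2 reduced 10**n times
```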
Optionally, the second zoom factor may be 2, 8, 10, 16 or another value.

For example, suppose the value range of the data type corresponding to the input data is [8^-15, 8^16]. During computation, when an operation result is greater than the maximum value of the value range corresponding to the data type of the input data, the value range of the data type is multiplied by the second zoom factor of the data type (i.e. 8) to obtain the new value range [8^-14, 8^17]; when the operation result is less than the minimum value of the value range corresponding to the data type of the input data, the value range of the data type is divided by the second zoom factor (8) to obtain the new value range [8^-16, 8^15].
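A minimal sketch of this range-adjustment rule, assuming the base-8 example above:

```python
def rescale_range(lo_exp: int, hi_exp: int, result: float, base: int = 8):
    """The value range [base**lo_exp, base**hi_exp] is multiplied (or divided)
    by the second zoom factor `base` when a result exceeds (or falls below) it."""
    if result > base ** hi_exp:                 # overflow: scale the range up
        lo_exp, hi_exp = lo_exp + 1, hi_exp + 1
    elif result < base ** lo_exp:               # underflow: scale the range down
        lo_exp, hi_exp = lo_exp - 1, hi_exp - 1
    return lo_exp, hi_exp

print(rescale_range(-15, 16, 8.0 ** 17))        # (-14, 17)
print(rescale_range(-15, 16, 8.0 ** -16))       # (-16, 15)
```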
A zoom factor can be added to data of any format (for example floating-point numbers or discrete data) to adjust the size and precision of the data.

It should be noted that the scaling position mentioned below in this specification may be the first zoom factor described above, which is not described again here.
The structure of fixed-point data is described below. Referring to Fig. 1, Fig. 1 is a schematic diagram of the data structure of fixed-point data provided by an embodiment of the present application. As shown in Fig. 1, a signed fixed-point datum occupies X bits and may also be called an X-bit fixed-point datum. The X-bit fixed-point datum includes a sign bit occupying 1 bit, an integer part occupying M bits and a fractional part occupying N bits, with X - 1 = M + N. An unsigned fixed-point datum includes only the M-bit integer part and the N-bit fractional part, i.e. X = M + N.

Compared with the 32-bit floating-point representation, the short-bit fixed-point representation adopted by the present invention not only occupies fewer bits but also, for data of the same layer and same type in the network model, such as all the convolution kernels, input neurons or bias data of the first convolutional layer, additionally provides a flag bit recording the scaling position of the fixed-point data, this flag bit being the Point Location. The value of this flag bit can thus be adjusted according to the distribution of the input data, thereby adjusting both the precision and the representable range of the fixed-point data.
For example, the floating-point number 68.6875 is converted into a signed 16-bit fixed-point datum with scaling position 5. For a signed 16-bit fixed-point datum with scaling position 5, the integer part occupies 10 bits, the fractional part occupies 5 bits, and the sign bit occupies 1 bit. The converting unit converts the floating-point number 68.6875 into the signed 16-bit fixed-point datum 0000100010010110, as shown in Fig. 2.
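This conversion can be reproduced with the short sketch below; it covers non-negative values only, since the sign-bit convention for negative values is not spelled out here.

```python
def float_to_fixed16(value: float, frac_bits: int = 5) -> str:
    """Convert a non-negative float to a signed 16-bit fixed-point bit string
    with `frac_bits` fractional bits (the scaling position of the example)."""
    code = round(value * (1 << frac_bits))   # 68.6875 * 32 = 2198
    assert 0 <= code < 1 << 15               # must fit in the 15 magnitude bits
    return format(code, "016b")              # sign bit 0 + 10 integer + 5 fraction

print(float_to_fixed16(68.6875))             # 0000100010010110
```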
In a possible embodiment, the fixed-point data may also be represented in the manner shown in Fig. 2A. As shown in Fig. 2A, the fixed-point datum occupies bitnum bits, its scaling position is s, and its precision is 2^s. The first bit is the sign bit, which indicates whether the fixed-point datum is positive or negative: for example, a sign bit of 0 indicates a positive number, and a sign bit of 1 indicates a negative number. The range represented by the fixed-point datum is [neg, pos], where pos = (2^(bitnum-1) - 1) * 2^s and neg = -(2^(bitnum-1) - 1) * 2^s.

Here bitnum can be any positive integer, optionally 8, 16, 24, 32, 64 or another value; s can be any integer not less than s_min, and further, s_min is -64.
In one embodiment, multiple fixed-point representations may be used for data with large values, as shown in Fig. 2B: the large-valued datum is represented using a combination of three fixed-point data, i.e. the datum is composed of fixed-point data 1, fixed-point data 2 and fixed-point data 3. The bit width of fixed-point data 1 is bitnum1 with scaling position s1; the bit width of fixed-point data 2 is bitnum2 with scaling position s2; the bit width of fixed-point data 3 is bitnum3 with scaling position s3, where s2 + bitnum2 - 2 = s1 - 1 and s3 + bitnum3 - 2 = s2 - 1, so that each fixed-point datum covers the bit range immediately below the previous one. The range represented by the combination of three fixed-point data is [neg, pos], where pos = (2^(bitnum-1) - 1) * 2^s and neg = -(2^(bitnum-1) - 1) * 2^s.
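A sketch of this combined representation follows; the chaining relation between the scaling positions is as reconstructed above, and the example bit widths are hypothetical.

```python
def decompose(value: float, parts):
    """Represent one large value as a sum q1*2**s1 + q2*2**s2 + q3*2**s3 of
    chained fixed-point data; `parts` lists (bitnum, s) pairs satisfying
    s_{k+1} + bitnum_{k+1} - 2 == s_k - 1."""
    pieces, rest = [], value
    for bitnum, s in parts:
        q = int(rest / 2 ** s)                 # quantize at precision 2**s
        bound = 2 ** (bitnum - 1) - 1
        q = max(-bound, min(bound, q))         # clamp to the signed bit width
        pieces.append(q)
        rest -= q * 2 ** s
    return pieces

# bitnum = 8 throughout, so s2 = s1 - 7 and s3 = s2 - 7 satisfy the chain.
print(decompose(1234.5678, [(8, 4), (8, -3), (8, -10)]))   # [77, 20, 69]
```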
The computing device used in the present application is first introduced. Referring to Fig. 3, a computing device is provided, the computing device comprising: a controller unit 11, an arithmetic unit 12 and a converting unit 13, wherein the controller unit 11 is connected to the arithmetic unit 12, and the converting unit 13 is connected to both the controller unit 11 and the arithmetic unit 12.
In a possible embodiment, the controller unit 11 is configured to obtain the first input data and a computation instruction.

In one embodiment, the first input data is machine learning data. Further, the machine learning data includes input neuron data and weight data; output neuron data is the final output result or intermediate data.
In an optional scheme, the first input data and the computation instruction may specifically be obtained through a data input/output unit, which may specifically be one or more data I/O interfaces or I/O pins.

The computation instruction includes, but is not limited to, a forward operation instruction, a reverse training instruction, or another neural network operation instruction such as a convolution operation instruction; the specific embodiments of the present application do not limit the concrete form of the computation instruction.
The controller unit 11 is further configured to parse the computation instruction to obtain a data conversion instruction and/or one or more operation instructions, where the data conversion instruction includes an operation code and an operation domain: the operation code indicates the function of the data type conversion instruction, and the operation domain of the data type conversion instruction includes the scaling position, a flag bit indicating the data type of the first input data, and a conversion mode identifier of the data type.

When the operation domain of the data conversion instruction is the address of a memory space, the controller unit 11 obtains the scaling position, the flag bit indicating the data type of the first input data, and the conversion mode identifier of the data type from the memory space corresponding to the address.
The controller unit 11 transmits the operation code and operation domain of the data conversion instruction and the first input data to the converting unit 13, and transmits the plurality of operation instructions to the arithmetic unit 12;

the converting unit 13 is configured to convert the first input data into the second input data according to the operation code and operation domain of the data conversion instruction, the second input data being fixed-point data, and to transmit the second input data to the arithmetic unit 12;

the arithmetic unit 12 is configured to perform operations on the second input data according to the plurality of operation instructions to obtain the computation result of the computation instruction.
In a possible embodiment, the technical solution provided by the present application configures the arithmetic unit 12 as a one-master/multiple-slaves structure. For the computation instruction of a forward operation, the data can be split according to the computation instruction, so that the part with the larger amount of computation can be computed in parallel by the plurality of slave processing circuits 102, thereby improving the operation speed, saving operation time and in turn reducing power consumption. As shown in Fig. 3A, the arithmetic unit 12 includes one main processing circuit 101 and a plurality of slave processing circuits 102;

the main processing circuit 101 is configured to perform preamble processing on the second input data and to transmit data and the plurality of operation instructions to and from the plurality of slave processing circuits 102;

the plurality of slave processing circuits 102 are configured to perform intermediate operations according to the second input data and the plurality of operation instructions transmitted from the main processing circuit 101 to obtain a plurality of intermediate results, and to transfer the intermediate results to the main processing circuit 101;

the main processing circuit 101 is configured to perform subsequent processing on the plurality of intermediate results to obtain the computation result of the computation instruction.
In one embodiment, the machine learning operation includes a deep learning operation (i.e. an artificial neural network operation), and the machine learning data (i.e. the first input data) includes input neurons and weights (i.e. neural network model data). Output neurons are the computation result or intermediate results of the computation instruction. Deep learning operations are taken as an example below, but it should be understood that the operations are not limited to deep learning.
Optionally, the computing device may further include the storage unit 10 and a direct memory access (DMA) unit 50. The storage unit 10 may include a register and a cache, either of them or any combination; specifically, the cache is used for storing the computation instruction, and the register 201 is used for storing the first input data and scalars. The first input data includes input neurons, weights and output neurons.

The cache 202 is a scratchpad cache.

The DMA unit 50 is used for reading data from the storage unit 10 or storing data into it.
In a possible embodiment, the register 201 stores the operation instructions, the first input data, the scaling position, the flag bit indicating the data type of the first input data and the conversion mode identifier of the data type. The controller unit 11 directly obtains the operation instructions, the first input data, the scaling position, the flag bit indicating the data type of the first input data and the conversion mode identifier of the data type from the register 201; it transmits the first input data, the scaling position, the flag bit indicating the data type of the first input data and the conversion mode identifier of the data type to the converting unit 13, and transmits the operation instructions to the arithmetic unit 12;

the converting unit 13 converts the first input data into the second input data according to the scaling position, the flag bit indicating the data type of the first input data and the conversion mode identifier of the data type, and then transmits the second input data to the arithmetic unit 12;

the arithmetic unit 12 performs operations on the second input data according to the operation instructions to obtain the operation result.
Optionally, the controller unit 11 includes: an instruction cache unit 110, an instruction processing unit 111 and a storage queue unit 113;

the instruction cache unit 110 is configured to store computation instructions associated with the artificial neural network operation;

the instruction processing unit 111 is configured to parse the computation instruction to obtain the data conversion instruction and the plurality of operation instructions, and to parse the data conversion instruction to obtain the operation code and operation domain of the data conversion instruction;

the storage queue unit 113 is configured to store an instruction queue, the instruction queue including the plurality of operation instructions or computation instructions to be executed in the front-to-back order of the queue.
For example, in an optional technical solution, the main processing circuit 101 may also include a control unit, which may include a master instruction processing unit specifically used for decoding instructions into micro-instructions. Of course, in another optional scheme, the slave processing circuit 102 may also include another control unit, which includes a slave instruction processing unit specifically used for receiving and processing micro-instructions. A micro-instruction may be the next-level instruction of an instruction; it can be obtained by splitting or decoding the instruction, and can be further decoded into control signals for the components, units or processing circuits.
In an optional scheme, the structure of the computation instruction may be as shown in Table 1 below.

| Operation code | Register or immediate | Register/immediate | ... |

Table 1

The ellipsis in the table above indicates that multiple registers or immediates may be included.
In an alternative scheme, the computation instruction may include one or more operation domains and one operation code. The computation instruction may include a neural network operation instruction. Taking a neural network operation instruction as an example, as shown in Table 2, register number 0, register number 1, register number 2, register number 3 and register number 4 may be operation domains, and each of register number 0, register number 1, register number 2, register number 3 and register number 4 may be the number of one or more registers.

Table 2
The above registers may be off-chip memories or, in practical applications, on-chip memories, used for storing data. The data may specifically be n-dimensional data, n being an integer greater than or equal to 1; for example, when n = 1 it is 1-dimensional data, i.e. a vector; when n = 2 it is 2-dimensional data, i.e. a matrix; and when n = 3 or more it is a multidimensional tensor.
Optionally, the controller unit 11 may further include:

a dependency processing unit 112, configured to, when there are multiple operation instructions, determine whether a first operation instruction has a dependency on a zeroth operation instruction preceding it. If the first operation instruction has a dependency on the zeroth operation instruction, the first operation instruction is buffered in the instruction cache unit 110, and after the zeroth operation instruction finishes executing, the first operation instruction is extracted from the instruction cache unit 110 and transmitted to the arithmetic unit.

Determining whether the first operation instruction has a dependency on the zeroth operation instruction preceding it includes: extracting, according to the first operation instruction, a first storage address interval of the data (for example a matrix) required by the first operation instruction, and extracting, according to the zeroth operation instruction, a zeroth storage address interval of the matrix required by the zeroth operation instruction. If the first storage address interval overlaps the zeroth storage address interval, it is determined that the first operation instruction and the zeroth operation instruction have a dependency; if the first storage address interval does not overlap the zeroth storage address interval, it is determined that the first operation instruction and the zeroth operation instruction have no dependency.
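The overlap test itself is a one-line interval check; a small sketch with hypothetical addresses:

```python
def intervals_overlap(start0: int, len0: int, start1: int, len1: int) -> bool:
    """Two storage address intervals overlap iff neither ends before the other
    begins; an overlap means the first instruction depends on the zeroth one
    and must wait in the instruction cache unit."""
    return start0 < start1 + len1 and start1 < start0 + len0

print(intervals_overlap(0x1000, 0x100, 0x1100, 0x100))  # False: adjacent, no overlap
print(intervals_overlap(0x1000, 0x100, 0x10F0, 0x100))  # True: 16 bytes shared
```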
In another alternative embodiment, as shown in Fig. 3B, the arithmetic unit 12 includes one main processing circuit 101, a plurality of slave processing circuits 102 and a plurality of branch processing circuits 103.

The main processing circuit 101 is specifically configured to determine that the input neurons are broadcast data and the weights are distribution data, to partition one piece of distribution data into a plurality of data blocks, and to send at least one of the data blocks, the broadcast data and at least one of the plurality of operation instructions to the branch processing circuits 103;

the branch processing circuits 103 are configured to forward the data blocks, the broadcast data and the operation instructions between the main processing circuit 101 and the plurality of slave processing circuits 102;

the plurality of slave processing circuits 102 are configured to perform operations on the received data blocks and broadcast data according to the operation instructions to obtain intermediate results, and to transfer the intermediate results to the branch processing circuits 103;

the main processing circuit 101 is further configured to perform subsequent processing on the intermediate results sent by the branch processing circuits 103 to obtain the result of the operation instruction, and to send the result of the computation instruction to the controller unit 11.
In another alternative embodiment, the arithmetic unit 12 may include one main processing circuit 101 and a plurality of slave processing circuits 102, as shown in Fig. 3C. As shown in Fig. 3C, the plurality of slave processing circuits 102 are distributed in an array; each slave processing circuit 102 is connected to the adjacent slave processing circuits 102, and the main processing circuit 101 is connected to K of the plurality of slave processing circuits 102, the K slave processing circuits 102 being: the n slave processing circuits 102 of the 1st row, the n slave processing circuits 102 of the m-th row and the m slave processing circuits 102 of the 1st column. It should be noted that, as shown in Fig. 3C, the K slave processing circuits 102 include only the n slave processing circuits 102 of the 1st row, the n slave processing circuits 102 of the m-th row and the m slave processing circuits 102 of the 1st column; that is, the K slave processing circuits 102 are the slave processing circuits 102 directly connected to the main processing circuit 101 (see the sketch after this embodiment).

The K slave processing circuits 102 are configured to forward data and instructions between the main processing circuit 101 and the plurality of slave processing circuits 102;

the main processing circuit 101 is further configured to determine that the input neurons are broadcast data and the weights are distribution data, to partition one piece of distribution data into a plurality of data blocks, and to send at least one of the data blocks and at least one of the plurality of operation instructions to the K slave processing circuits 102;

the K slave processing circuits 102 are configured to convert the data transmitted between the main processing circuit 101 and the plurality of slave processing circuits 102;

the plurality of slave processing circuits 102 are configured to perform operations on the received data blocks according to the operation instructions to obtain intermediate results, and to transfer the operation results to the K slave processing circuits 102;

the main processing circuit 101 is configured to process the intermediate results sent by the K slave processing circuits 102 to obtain the result of the computation instruction, and to send the result of the computation instruction to the controller unit 11.
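Which circuits belong to the set K can be written down directly from the description; the following sketch enumerates them by (row, column) index.

```python
def directly_connected(m: int, n: int):
    """The K slave circuits wired directly to the main processing circuit in
    an m x n array: the whole 1st row, the whole m-th row and the whole 1st
    column, i.e. K = 2n + m - 2 distinct circuits (corner circuits counted once)."""
    return ({(0, c) for c in range(n)}          # 1st row
            | {(m - 1, c) for c in range(n)}    # m-th row
            | {(r, 0) for r in range(m)})       # 1st column

print(len(directly_connected(4, 5)))            # 2*5 + 4 - 2 = 12
```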
Optionally, as shown in Fig. 3D, the main processing circuit 101 in Fig. 3A to Fig. 3C may further include one or any combination of an activation processing circuit 1011 and an addition processing circuit 1012;

the activation processing circuit 1011 is configured to execute the activation operation on data in the main processing circuit 101;

the addition processing circuit 1012 is configured to execute addition or accumulation operations.

The slave processing circuit 102 includes: a multiplication processing circuit, configured to execute product operations on the received data blocks to obtain product results; a forwarding processing circuit (optional), configured to forward the received data blocks or product results; and an accumulation processing circuit, configured to execute accumulation operations on the product results to obtain the intermediate results.
In a feasible embodiment, the first input data is data whose data type is inconsistent with the operation type indicated by the operation instructions participating in the operation, and the second input data is data whose data type is consistent with the operation type indicated by the operation instructions participating in the operation. The converting unit 13 obtains the operation code and operation domain of the data conversion instruction; the operation code indicates the function of the data conversion instruction, and the operation domain includes the scaling position and the conversion mode identifier of the data type. The converting unit 13 converts the first input data into the second input data according to the scaling position and the conversion mode identifier of the data type.

Specifically, conversion mode identifiers of the data type correspond one-to-one with conversion modes of the data type. Referring to Table 3 below, Table 3 is a feasible correspondence table between conversion mode identifiers and conversion modes of the data type.
| Conversion mode identifier of the data type | Conversion mode of the data type |
| 00 | Fixed-point data converted to fixed-point data |
| 01 | Floating-point data converted to floating-point data |
| 10 | Fixed-point data converted to floating-point data |
| 11 | Floating-point data converted to fixed-point data |

Table 3
As shown in Table 3, when the conversion mode identifier of the data type is 00, the conversion mode is fixed-point data converted to fixed-point data; when it is 01, the conversion mode is floating-point data converted to floating-point data; when it is 10, the conversion mode is fixed-point data converted to floating-point data; and when it is 11, the conversion mode is floating-point data converted to fixed-point data.
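A hypothetical dispatch on the 2-bit identifier of Table 3 might look as follows; the fixed-point value is modeled as code * 2^s with scaling position s (the Fig. 2A convention), and the same-type modes are left as no-ops for brevity.

```python
def convert(values, mode: int, s: int):
    """Dispatch on the conversion mode identifier of Table 3 (sketch)."""
    if mode == 0b11:                          # float -> fixed
        return [round(v / 2 ** s) for v in values]
    if mode == 0b10:                          # fixed -> float
        return [float(c * 2 ** s) for c in values]
    return values                             # 00 / 01: same-type modes,
                                              # re-quantization omitted here

print(convert([68.6875], mode=0b11, s=-5))    # [2198]
print(convert([2198], mode=0b10, s=-5))       # [68.6875]
```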
Optionally, the correspondence between conversion mode identifiers of the data type and conversion modes of the data type may also be as shown in Table 4 below.
| Conversion mode identifier of the data type | Conversion mode of the data type |
| 0000 | 64-bit fixed-point data converted to 64-bit floating-point data |
| 0001 | 32-bit fixed-point data converted to 64-bit floating-point data |
| 0010 | 16-bit fixed-point data converted to 64-bit floating-point data |
| 0011 | 32-bit fixed-point data converted to 32-bit floating-point data |
| 0100 | 16-bit fixed-point data converted to 32-bit floating-point data |
| 0101 | 16-bit fixed-point data converted to 16-bit floating-point data |
| 0110 | 64-bit floating-point data converted to 64-bit fixed-point data |
| 0111 | 32-bit floating-point data converted to 64-bit fixed-point data |
| 1000 | 16-bit floating-point data converted to 64-bit fixed-point data |
| 1001 | 32-bit floating-point data converted to 32-bit fixed-point data |
| 1010 | 16-bit floating-point data converted to 32-bit fixed-point data |
| 1011 | 16-bit floating-point data converted to 16-bit fixed-point data |

Table 4
As shown in Table 4, when the data type conversion mode flag is 0000, the conversion mode is 64-bit fixed-point data to 64-bit floating-point data; when the flag is 0001, it is 32-bit fixed-point data to 64-bit floating-point data; when the flag is 0010, it is 16-bit fixed-point data to 64-bit floating-point data; when the flag is 0011, it is 32-bit fixed-point data to 32-bit floating-point data; when the flag is 0100, it is 16-bit fixed-point data to 32-bit floating-point data; when the flag is 0101, it is 16-bit fixed-point data to 16-bit floating-point data; when the flag is 0110, it is 64-bit floating-point data to 64-bit fixed-point data; when the flag is 0111, it is 32-bit floating-point data to 64-bit fixed-point data; when the flag is 1000, it is 16-bit floating-point data to 64-bit fixed-point data; when the flag is 1001, it is 32-bit floating-point data to 32-bit fixed-point data; when the flag is 1010, it is 16-bit floating-point data to 32-bit fixed-point data; and when the flag is 1011, it is 16-bit floating-point data to 16-bit fixed-point data.
In a feasible embodiment, the controller unit 11 obtains a computation instruction from the storage unit 10 and parses it to obtain one or more operation instructions, where an operation instruction may be a variable-format operation instruction or a fixed-point-format operation instruction.
The variable-format operation instruction includes an operation code and an operation field. The operation code indicates the function of the variable-format operation instruction; the operation field includes the start address of the first input data, the length of the first input data (optional), the start address of the output data, the decimal point position, a data type flag bit of the first input data (optional), and an operation type flag.
When the operation instruction is a variable-format operation instruction, the controller unit 11 parses it to obtain the start address of the first input data, the length of the first input data, the start address of the output data, the decimal point position, the data type flag bit of the first input data, and the operation type flag. It then reads the first input data from the storage unit 10 according to the start address and the length of the first input data, transmits the first input data, the decimal point position, the data type flag bit, and the operation type flag to the conversion unit 13, and transmits the start address of the output data to the arithmetic unit 12.
The conversion unit 13 converts the first input data into the second input data according to the data type flag bit, the decimal point position, and the operation type indicated by the operation type flag, and then transmits the second input data to the arithmetic unit 12.
The master processing circuit 101 and the slave processing circuits 102 of the arithmetic unit 12 operate on the second input data to obtain the result of the computation instruction, and the result of the computation instruction is stored in the storage unit 10 at the position corresponding to the start address of the output data.
The operation type flag indicates the type of the data participating in the operation when the arithmetic unit 12 performs the operation. The type includes fixed-point data, floating-point data, integer data, discrete data, and the like.
In a possible embodiment, the start address of the first input data, the length of the first input data, the start address of the output data, the decimal point position, the data type flag bit of the first input data, and the operation type flag are stored in the storage unit 10; the controller unit 11 reads them directly from the storage unit 10 and then proceeds as described above.
For example, the operation type flag is 0 or 1. When the flag is 1, the master processing circuit 101 and the slave processing circuits 102 of the arithmetic unit 12 perform floating-point operations, i.e. the data type participating in the operation is floating-point data; when the operation type flag is 0, the master processing circuit 101 and the slave processing circuits 102 of the arithmetic unit 12 perform fixed-point operations, i.e. the data type participating in the operation is fixed-point data.
The arithmetic unit 12 can thus determine the type of the input data and the type of operation to perform according to the data type flag bit and the operation type flag.
Specifically, see Table 5, a mapping table between the data type flag bit and the operation type flag.
Table 5
As shown in Table 5, when the operation type flag is 0 and the data type flag bit is 0, the first input data is fixed-point data, and the master processing circuit 101 and the slave processing circuits 102 of the arithmetic unit 12 perform fixed-point operations without data conversion. When the operation type flag is 0 and the data type flag bit is 1, the first input data is floating-point data, and the master processing circuit 101 and the slave processing circuits 102 perform floating-point operations without data conversion. When the operation type flag is 1 and the data type flag bit is 0, the first input data is fixed-point data; the conversion unit 13 first converts the first input data into the second input data according to the decimal point position, the second input data being floating-point data, and the master processing circuit 101 and the slave processing circuits 102 then operate on the second input data. When the operation type flag is 1 and the data type flag bit is 1, the first input data is floating-point data; the conversion unit 13 first converts the first input data into the second input data according to the decimal point position, the second input data being fixed-point data, and the master processing circuit 101 and the slave processing circuits 102 then operate on the second input data.
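The dispatch just described can be summarized as follows. This is an illustrative sketch of the Table 5 behaviour; fix_to_float and float_to_fix are hypothetical helpers standing in for conversion unit 13.

```python
def prepare_input(data, op_type_flag, data_type_flag, point_location,
                  fix_to_float, float_to_fix):
    """Table 5 behaviour: with operation type flag 0 the data pass through
    unchanged; with flag 1 the conversion unit always runs, in the
    direction given by the data type flag bit."""
    if op_type_flag == 0:
        return data                                   # operate directly
    if data_type_flag == 0:                           # fixed-point input
        return fix_to_float(data, point_location)     # convert to floating-point
    return float_to_fix(data, point_location)         # floating-point input -> fixed-point
```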
The fixed-point data includes 64-bit fixed-point data, 32-bit fixed-point data, and 16-bit fixed-point data; the floating-point data includes 64-bit floating-point data, 32-bit floating-point data, and 16-bit floating-point data. The mapping between the flag bits and the operation type flags may also be as shown in Table 6 below.
Table 6
As shown in Table 6, when the operation type flag is 0000 and the data type flag bit is 0, the first input data is 64-bit fixed-point data, and the master processing circuit 101 and the slave processing circuits 102 of the arithmetic unit 12 perform 64-bit fixed-point operations without data type conversion; when the operation type flag is 0000 and the data type flag bit is 1, the first input data is 64-bit floating-point data, and the arithmetic unit performs 64-bit floating-point operations without data type conversion. When the operation type flag is 0001 and the data type flag bit is 0, the first input data is 32-bit fixed-point data and the arithmetic unit performs 32-bit fixed-point operations without data type conversion; when the data type flag bit is 1, the first input data is 32-bit floating-point data and the arithmetic unit performs 32-bit floating-point operations without data type conversion. When the operation type flag is 0010 and the data type flag bit is 0, the first input data is 16-bit fixed-point data and the arithmetic unit performs 16-bit fixed-point operations without data type conversion; when the data type flag bit is 1, the first input data is 16-bit floating-point data and the arithmetic unit performs 16-bit floating-point operations without data type conversion.
When the operation type flag is 0011 and the data type flag bit is 0, the first input data is 64-bit fixed-point data; the conversion unit 13 first converts it, according to the decimal point position, into the second input data, which is 64-bit floating-point data, and the master processing circuit 101 and the slave processing circuits 102 of the arithmetic unit 12 then perform 64-bit floating-point operations on the second input data. When the operation type flag is 0011 and the data type flag bit is 1, the first input data is 64-bit floating-point data; the conversion unit 13 first converts it into the second input data, which is 64-bit fixed-point data, and the arithmetic unit then performs 64-bit fixed-point operations on the second input data.
When the operation type flag is 0100 and the data type flag bit is 0, the first input data is 32-bit fixed-point data; it is converted into 64-bit floating-point second input data, on which the arithmetic unit performs 64-bit floating-point operations. When the operation type flag is 0100 and the data type flag bit is 1, the first input data is 32-bit floating-point data; it is converted into 64-bit fixed-point second input data, on which the arithmetic unit performs 64-bit fixed-point operations.
When the operation type flag is 0101 and the data type flag bit is 0, the first input data is 16-bit fixed-point data; it is converted into 64-bit floating-point second input data, on which the arithmetic unit performs 64-bit floating-point operations. When the operation type flag is 0101 and the data type flag bit is 1, the first input data is 16-bit floating-point data; it is converted into 64-bit fixed-point second input data, on which the arithmetic unit performs 64-bit fixed-point operations.
When the operation type flag is 0110 and the data type flag bit is 0, the first input data is 32-bit fixed-point data; it is converted into 32-bit floating-point second input data, on which the arithmetic unit performs 32-bit floating-point operations. When the operation type flag is 0110 and the data type flag bit is 1, the first input data is 32-bit floating-point data; it is converted into 32-bit fixed-point second input data, on which the arithmetic unit performs 32-bit fixed-point operations.
When the operation type flag is 0111 and the data type flag bit is 0, the first input data is 16-bit fixed-point data; it is converted into 32-bit floating-point second input data, on which the arithmetic unit performs 32-bit floating-point operations. When the operation type flag is 0111 and the data type flag bit is 1, the first input data is 16-bit floating-point data; it is converted into 32-bit fixed-point second input data, on which the arithmetic unit performs 32-bit fixed-point operations.
When the operation type flag is 1000 and the data type flag bit is 0, the first input data is 16-bit fixed-point data; it is converted into 16-bit floating-point second input data, on which the arithmetic unit performs 16-bit floating-point operations. When the operation type flag is 1000 and the data type flag bit is 1, the first input data is 16-bit floating-point data; it is converted into 16-bit fixed-point second input data, on which the arithmetic unit performs 16-bit fixed-point operations.
When the operation type flag is 1001 and the data type flag bit is 0, the first input data is 64-bit fixed-point data; it is converted into 32-bit floating-point second input data, on which the arithmetic unit performs 32-bit floating-point operations. When the operation type flag is 1001 and the data type flag bit is 1, the first input data is 64-bit floating-point data; it is converted into 32-bit fixed-point second input data, on which the arithmetic unit performs 32-bit fixed-point operations.
When the operation type flag is 1010 and the data type flag bit is 0, the first input data is 64-bit fixed-point data; it is converted into 16-bit floating-point second input data, on which the arithmetic unit performs 16-bit floating-point operations. When the operation type flag is 1010 and the data type flag bit is 1, the first input data is 64-bit floating-point data; it is converted into 16-bit fixed-point second input data, on which the arithmetic unit performs 16-bit fixed-point operations.
When the operation type flag is 1011 and the data type flag bit is 0, the first input data is 32-bit fixed-point data; it is converted into 16-bit floating-point second input data, on which the arithmetic unit performs 16-bit floating-point operations. When the operation type flag is 1011 and the data type flag bit is 1, the first input data is 32-bit floating-point data; it is converted into 16-bit fixed-point second input data, on which the arithmetic unit performs 16-bit fixed-point operations. In each of the conversions above, the conversion unit 13 converts the first input data according to the decimal point position, and the master processing circuit 101 and the slave processing circuits 102 of the arithmetic unit 12 operate on the second input data.
In a feasible embodiment, the operation instruction is a fixed-point-format operation instruction, which includes an operation code and an operation field. The operation code indicates the function of the fixed-point-format operation instruction; the operation field includes the start address of the first input data, the length of the first input data (optional), the start address of the output data, and the decimal point position.
After obtaining the fixed-point-format operation instruction, the controller unit 11 parses it to obtain the start address of the first input data, the length of the first input data, the start address of the output data, and the decimal point position. The controller unit 11 then reads the first input data from the storage unit 10 according to the start address and the length of the first input data, transmits the first input data and the decimal point position to the conversion unit 13, and transmits the start address of the output data to the arithmetic unit 12. The conversion unit 13 converts the first input data into the second input data according to the decimal point position and transmits the second input data to the arithmetic unit 12; the master processing circuit 101 and the slave processing circuits 102 of the arithmetic unit 12 operate on the second input data to obtain the result of the computation instruction, and the result of the computation instruction is stored in the storage unit 10 at the position corresponding to the start address of the output data.
In a feasible embodiment, before the arithmetic unit 12 of the computing device performs the operation of the i-th layer of the multi-layer neural network model, the controller unit 11 of the computing device obtains a configuration instruction, which includes the decimal point position and the data type participating in the operation. The controller unit 11 parses the configuration instruction to obtain the decimal point position and the data type participating in the operation, or reads the decimal point position and the data type participating in the operation directly from the storage unit 10. After obtaining the input data, the controller unit 11 judges whether the data type of the input data is consistent with the data type participating in the operation. When the data type of the input data is inconsistent with the data type participating in the operation, the controller unit 11 transmits the input data, the decimal point position, and the data type participating in the operation to the conversion unit 13; the conversion unit performs data type conversion on the input data according to the decimal point position and the data type participating in the operation, so that the data type of the input data is consistent with the data type participating in the operation. The converted data are then transmitted to the arithmetic unit 12, whose master processing circuit 101 and slave processing circuits 102 operate on the converted input data. When the data type of the input data is consistent with the data type participating in the operation, the controller unit 11 transmits the input data directly to the arithmetic unit 12, whose master processing circuit 101 and slave processing circuits 102 operate on the input data directly, without data type conversion.
Further, when the input data is fixed-point data and the data type participating in the operation is also fixed-point data, the controller unit 11 judges whether the decimal point position of the input data is consistent with the decimal point position participating in the operation. If they are inconsistent, the controller unit 11 transmits the input data, the decimal point position of the input data, and the decimal point position participating in the operation to the conversion unit 13; the conversion unit converts the input data into fixed-point data whose decimal point position is consistent with the decimal point position participating in the operation, and the converted data are then transmitted to the arithmetic unit, whose master processing circuit 101 and slave processing circuits 102 operate on the converted data.
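A minimal sketch of this fixed-point realignment, under the assumption that a fixed-point value is stored as an integer mantissa with decimal point position s (value = mantissa × 2^s); shifting the mantissa by the difference of the two positions realigns the data.

```python
def realign_fixed_point(mantissa: int, s_data: int, s_target: int) -> int:
    """Convert a fixed-point value mantissa * 2**s_data so that it uses
    the decimal point position s_target expected by the operation."""
    shift = s_data - s_target
    if shift >= 0:
        return mantissa << shift      # target position is finer: no precision loss
    return mantissa >> (-shift)       # target position is coarser: low bits dropped

# Example: 5.0 stored as 80 * 2**-4, realigned to position -2, gives 20 * 2**-2.
assert realign_fixed_point(80, -4, -2) == 20
```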
In other words, the operation instruction described above may be replaced by this configuration instruction.
In another embodiment, the operation instruction is a matrix computation instruction, such as a matrix-multiply-matrix instruction, an accumulate instruction, or an activation instruction.
In an optional embodiment, as shown in Fig. 3E, the arithmetic unit includes a tree module 40. The tree module includes one root port 401 and multiple branch ports 404; the root port of the tree module is connected to the master processing circuit 101, and the branch ports of the tree module are respectively connected to the multiple slave processing circuits 102. The tree module has both transmitting and receiving functions: as shown in Fig. 3E, the tree module transmits; as shown in Fig. 6A, the tree module receives. The tree module is configured to forward data blocks, weights, and operation instructions between the master processing circuit 101 and the multiple slave processing circuits 102.
Optionally, the tree module is an optional component of the computing device and may include at least one layer of nodes. Each node is a line structure with a forwarding function, and the node itself may have no computing function. If the tree module has zero layers of nodes, the tree module is not needed.
Optionally, the tree module may have an n-ary tree structure, for example the binary tree structure shown in Fig. 3F, or a ternary tree structure, where n may be an integer greater than or equal to 2. The specific embodiments of this application do not limit the specific value of n. The number of layers may be 2, and the slave processing circuits 102 may be connected to nodes of layers other than the second-to-last layer, for example to nodes of the last layer as shown in Fig. 3F.
Optionally, the arithmetic unit may carry a separate cache which, as shown in Fig. 3G, may include a neuron cache unit 63 that caches the input neuron vector data and the output neuron value data of the slave processing circuits 102. As shown in Fig. 3H, the arithmetic unit may also include a weight cache unit 64 for caching the weight data needed by the slave processing circuits 102 during computation.
In an optional embodiment, taking the fully connected operation of a neural network as an example, the process may be y = f(wx + b), where x is the input neuron matrix, w is the weight matrix, b is the bias scalar, and f is the activation function, which may specifically be any one of the sigmoid, tanh, relu, and softmax functions. Assuming a binary tree structure with 8 slave processing circuits 102, the method may be implemented as follows:
the controller unit 11 obtains the input neuron matrix x, the weight matrix w, and the fully connected operation instruction from the storage unit 10, and transmits the input neuron matrix x, the weight matrix w, and the fully connected operation instruction to the master processing circuit 101;
the master processing circuit 101 splits the input neuron matrix x into 8 sub-matrices, distributes the 8 sub-matrices to the 8 slave processing circuits 102 through the tree module, and broadcasts the weight matrix w to the 8 slave processing circuits 102;
the slave processing circuits 102 perform the multiplication and accumulation of the 8 sub-matrices with the weight matrix w in parallel to obtain 8 intermediate results, and send the 8 intermediate results to the master processing circuit 101;
the master processing circuit 101 sorts the 8 intermediate results to obtain the result of wx, performs the bias-b operation on this result and then the activation operation to obtain the final result y, and sends the final result y to the controller unit 11, which outputs y or stores it in the storage unit 10.
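The following numpy sketch mirrors this fully connected flow under simplifying assumptions (a plain software model, not the hardware): x is split into 8 column-block sub-matrices, each "slave" multiplies w by its block, and the "master" reassembles the partial results, adds the bias, and applies the activation.

```python
import numpy as np

def fully_connected(w, x, b, f=np.tanh, n_slaves=8):
    """Software model of y = f(wx + b) with x split across 8 slaves."""
    sub_matrices = np.array_split(x, n_slaves, axis=1)  # master splits x into 8 sub-matrices
    partial = [w @ sub for sub in sub_matrices]         # slaves multiply in parallel
    wx = np.concatenate(partial, axis=1)                # master sorts/reassembles wx
    return f(wx + b)                                    # bias operation, then activation

x = np.random.randn(4, 16)   # input neuron matrix
w = np.random.randn(3, 4)    # weight matrix
b = 0.1                      # bias scalar
y = fully_connected(w, x, b)
```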
In one embodiment, the arithmetic unit 12 includes, but is not limited to: one or more multipliers of a first part; one or more adders of a second part (more specifically, the adders of the second part may form an adder tree); an activation function unit of a third part; and/or a vector processing unit of a fourth part. More specifically, the vector processing unit can handle vector operations and/or pooling operations. The first part multiplies input data 1 (in1) by input data 2 (in2) to obtain the output (out): out = in1 * in2. The second part adds the input data in1 through the adders to obtain the output data (out); more specifically, when the second part is an adder tree, it adds in1 step by step through the adder tree to obtain out, where in1 is a vector of length N, N greater than 1: out = in1[1] + in1[2] + ... + in1[N]; and/or it adds the result of accumulating in1 to the input data in2 to obtain the output data (out): out = in1[1] + in1[2] + ... + in1[N] + in2; or it adds in1 and in2 to obtain the output data (out): out = in1 + in2. The third part applies the activation function (active) to the input data (in) to obtain the activation output data (out): out = active(in), where the activation function active may be sigmoid, tanh, relu, softmax, etc.; besides the activation operation, the third part may implement other non-linear functions f, obtaining the output data (out) from the input data (in) as out = f(in). The vector processing unit applies the pooling operation to the input data (in) to obtain the output data (out) after pooling: out = pool(in), where pool is the pooling operation, which includes but is not limited to average pooling, max pooling, and median pooling, and the input data in is the data in the pooling kernel related to the output out.
The operations performed by the arithmetic unit include: the first part multiplies input data 1 by input data 2 to obtain the multiplied data; and/or the second part performs an addition operation (more specifically, an adder-tree operation that adds input data 1 step by step through the adder tree), or adds input data 1 to input data 2 to obtain the output data; and/or the third part performs the activation function operation, obtaining the output data from the input data through the activation function (active); and/or the fourth part performs the pooling operation, out = pool(in), where pool is the pooling operation, including but not limited to average pooling, max pooling, and median pooling, and the input data in is the data in the pooling kernel related to the output out. The operations of these several parts can be freely combined across one or more parts in different orders, thereby realizing operations of various different functions. Accordingly, the computing unit forms a two-stage, three-stage, or four-stage pipeline architecture.
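As an illustration of composing these parts into a pipeline, the sketch below chains the multiplier stage, the adder tree, and the activation stage; the staging is purely conceptual, assumes numpy, and is not the hardware implementation.

```python
import numpy as np

def multiplier_stage(in1, in2):
    return in1 * in2                       # first part: out = in1 * in2

def adder_tree_stage(v):
    """Second part: step-by-step (pairwise) reduction of a length-N vector."""
    v = np.asarray(v, dtype=float)
    while v.size > 1:
        if v.size % 2:                     # pad odd lengths with a neutral 0
            v = np.append(v, 0.0)
        v = v[0::2] + v[1::2]              # one tree level per iteration
    return v[0]

def activation_stage(x, active=np.tanh):
    return active(x)                       # third part: out = active(in)

# A three-stage pipeline: multiply, reduce, activate.
in1, in2 = np.random.randn(8), np.random.randn(8)
out = activation_stage(adder_tree_stage(multiplier_stage(in1, in2)))
```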
It should be noted that the first input data is long-bit-width non-fixed-point data, for example 32-bit floating-point data; by standard it may also be 64-bit or 16-bit floating-point numbers, etc., but only the 32-bit case is described here as a specific embodiment. The second input data is short-bit-width fixed-point data, also called lower-bit-width fixed-point data, i.e. fixed-point data represented with fewer bits relative to the first input data represented by the long-bit-width non-fixed-point data.
In a feasible embodiment, the first input data is non-fixed-point data and the second input data is fixed-point data, and the number of bits occupied by the first input data is greater than or equal to the number of bits occupied by the second input data. For example, the first input data is a 32-bit floating-point number and the second input data is 32-bit fixed-point data; or the first input data is a 32-bit floating-point number and the second input data is 16-bit fixed-point data.
Specifically, for the different layers of different network models, the first input data includes different types of data. The decimal point positions of these different types of data are not identical, i.e. the precision of the corresponding fixed-point data differs. For a fully connected layer, the first input data includes data such as input neurons, weights, and bias data; for a convolutional layer, the first input data includes data such as convolution kernels, input neurons, and bias data.
For a fully connected layer, for example, the decimal point positions include the decimal point position of the input neurons, the decimal point position of the weights, and the decimal point position of the bias data. The decimal point position of the input neurons, the decimal point position of the weights, and the decimal point position of the bias data may be all identical, partly identical, or different from one another.
In a feasible embodiment, the controller unit 11 is further configured to: before obtaining the first input data and the computation instruction, determine the decimal point position of the first input data and the bit width of the fixed-point data, the bit width of the fixed-point data being the bit width of the first input data after conversion to fixed-point data. The arithmetic unit 12 is further configured to initialize the decimal point position of the first input data and adjust the decimal point position of the first input data.
The bit width of the fixed-point data of the first input data is the number of bits occupied by the first input data when represented as fixed-point data; the decimal point position is the number of bits occupied by the fractional part of the first input data when represented as fixed-point data. The decimal point position is used to characterize the precision of the fixed-point data. See the related description of Fig. 2A for details.
Specifically, the first input data may be any type of data. A first input data a is converted into the second input data $\hat{a}$ according to the decimal point position s and the bit width of the fixed-point data, as follows:

$$\hat{a} = \begin{cases} \lfloor a / 2^{s} \rfloor \times 2^{s}, & neg \leq a \leq pos \\ pos, & a > pos \\ neg, & a < neg \end{cases}$$

That is, when the first input data a satisfies the condition neg ≤ a ≤ pos, the second input data $\hat{a}$ is $\lfloor a/2^{s} \rfloor \times 2^{s}$; when the first input data a is greater than pos, the second input data $\hat{a}$ is pos; when the first input data a is less than neg, the second input data $\hat{a}$ is neg.
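A minimal software sketch of this conversion, assuming the formula above: values inside [neg, pos] are quantized to the nearest lower multiple of 2^s, and values outside are saturated.

```python
import math

def to_fixed(a: float, s: int, bitnum: int) -> float:
    """Convert first input data a to the second input data:
    floor(a / 2**s) * 2**s inside [neg, pos], saturating outside."""
    pos = (2 ** (bitnum - 1) - 1) * 2.0 ** s
    neg = -(2 ** (bitnum - 1) - 1) * 2.0 ** s
    if a > pos:
        return pos
    if a < neg:
        return neg
    return math.floor(a / 2.0 ** s) * 2.0 ** s

# Example: 8-bit fixed-point data with decimal point position s = -4.
print(to_fixed(3.14159, s=-4, bitnum=8))   # 3.125
print(to_fixed(100.0,  s=-4, bitnum=8))    # saturates to pos = 7.9375
```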
In one embodiment, for convolutional layers and fully connected layers, the input neurons, weights, output neurons, input neuron derivatives, output neuron derivatives, and weight derivatives are all represented as fixed-point data.
Optionally, the bit width of the fixed-point data used by the input neurons may be 8, 16, 32, 64, or another value; further, it is 8. Optionally, the bit width of the fixed-point data used by the weights may be 8, 16, 32, 64, or another value; further, it is 8. Optionally, the bit width of the fixed-point data used by the input neuron derivatives may be 8, 16, 32, 64, or another value; further, it is 16. Optionally, the bit width of the fixed-point data used by the output neuron derivatives may be 8, 16, 32, 64, or another value; further, it is 24. Optionally, the bit width of the fixed-point data used by the weight derivatives may be 8, 16, 32, 64, or another value; further, it is 24.
In one embodiment, among the data participating in the operation of the multi-layer network model, data a with larger values may adopt multiple fixed-point representation methods; see the related description of Fig. 2B for details. Specifically, the first input data may be any type of data. A first input data a is converted into the second input data $\hat{a}$ according to the decimal point position and the bit width of the fixed-point data: when the first input data a satisfies the condition neg ≤ a ≤ pos, the second input data $\hat{a}$ is represented with multiple fixed-point numbers as described in connection with Fig. 2B; when the first input data a is greater than pos, the second input data $\hat{a}$ is pos; when the first input data a is less than neg, the second input data $\hat{a}$ is neg.
Further, the arithmetic unit 12 initializes the decimal point position of the first input data by one of the following: initializing the decimal point position of the first input data according to the maximum absolute value of the first input data; or initializing the decimal point position of the first input data according to the minimum absolute value of the first input data; or initializing the decimal point position of the first input data according to the relationship between different types of data in the first input data; or initializing the decimal point position of the first input data to an empirical constant.
Specifically, the decimal point position s needs to be initialized and dynamically adjusted according to the category of the data, the neural network layer the data belong to, and the training iteration round. The initialization of the decimal point position s of the first input data, i.e. the decimal point position s used when the first input data is first converted to fixed-point data, is introduced below. The arithmetic unit 12 initializes the decimal point position s of the first input data: according to the maximum absolute value of the first input data; according to the minimum absolute value of the first input data; according to the relationship between different types of data in the first input data; or to an empirical constant. Each of these initialization procedures is introduced separately below.
A) The arithmetic unit 12 initializes the decimal point position s of the first input data according to the maximum absolute value of the first input data:

Specifically, the arithmetic unit 12 initializes the decimal point position s of the first input data by performing the operation shown in the following formula:

$$s_a = \left\lceil \log_2 a_{max} \right\rceil - (bitnum - 1)$$

where $a_{max}$ is the maximum absolute value of the first input data, bitnum is the bit width of the fixed-point data into which the first input data is converted, and $s_a$ is the decimal point position of the first input data.
The data participating in the operation can be divided by data category and network layer into: the layer-l input neurons X(l), output neurons Y(l), weights W(l), input neuron derivatives, output neuron derivatives, and weight derivatives. When finding the maximum absolute value, the search may be performed per data category; per layer and per category; or per layer, per category, and per group. The maximum absolute value of the first input data is determined as follows:
a.1) The arithmetic unit 12 finds the maximum absolute value per data category.

Specifically, the first input data includes each element $a_i^{(l)}$ of a vector/matrix, where $a^{(l)}$ may be the input neurons X(l), the output neurons Y(l), the weights W(l), the input neuron derivatives, the output neuron derivatives, or the weight derivatives. In other words, the first input data includes input neurons, weights, output neurons, input neuron derivatives, weight derivatives, and output neuron derivatives, and the decimal point positions of the first input data include the decimal point position of the input neurons, the decimal point position of the weights, the decimal point position of the output neurons, the decimal point position of the input neuron derivatives, the decimal point position of the weight derivatives, and the decimal point position of the output neuron derivatives. The input neurons, weights, output neurons, input neuron derivatives, weight derivatives, and output neuron derivatives are represented in matrix or vector form. The arithmetic unit 12 traverses all elements of each layer's vectors/matrices of the multi-layer network model to obtain the maximum absolute value of each category of data, i.e. $a_{max} = \max_{l} \max_{i} |a_i^{(l)}|$, and determines the decimal point position $s_a$ to which each category of data a is converted as fixed-point data by the formula $s_a = \lceil \log_2 a_{max} \rceil - (bitnum - 1)$.
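A sketch of the per-category initialization just described, assuming the reconstructed formula s_a = ceil(log2(a_max)) − (bitnum − 1):

```python
import math

def init_point_location(values, bitnum: int) -> int:
    """Initialize the decimal point position s from the maximum absolute
    value of one category of data, so that (2**(bitnum-1)-1) * 2**s
    covers a_max."""
    a_max = max(abs(v) for v in values)
    return math.ceil(math.log2(a_max)) - (bitnum - 1)

# Example: the weights of one layer, converted to 8-bit fixed-point data.
weights = [0.02, -0.7, 0.31, -0.95]
s_w = init_point_location(weights, bitnum=8)   # ceil(log2(0.95)) - 7 = -7
```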
a.2) The arithmetic unit 12 finds the maximum absolute value per layer and per data category.

Specifically, each element of the first input data vector/matrix is $a_i^{(l)}$, where $a^{(l)}$ may be the input neurons X(l), the output neurons Y(l), the weights W(l), the input neuron derivatives, the output neuron derivatives, or the weight derivatives. In other words, each layer of the multi-layer network model includes input neurons, weights, output neurons, input neuron derivatives, weight derivatives, and output neuron derivatives, represented as matrices/vectors, and the decimal point positions of the first input data include the decimal point positions of each of these data categories. The arithmetic unit 12 traverses all elements of the vector/matrix of each kind of data in each layer of the multi-layer network model to obtain the maximum absolute value of each category of data in that layer, i.e. $a_{max}^{(l)} = \max_{i} |a_i^{(l)}|$, and determines the decimal point position of each category of data a in layer l by the formula $s_a^{(l)} = \lceil \log_2 a_{max}^{(l)} \rceil - (bitnum - 1)$.
a.3) The arithmetic unit 12 finds the maximum absolute value per layer, per data category, and per group.

Specifically, each element of the first input data vector/matrix is $a_i^{(l)}$, where $a^{(l)}$ may be the input neurons X(l), the output neurons Y(l), the weights W(l), the input neuron derivatives, the output neuron derivatives, or the weight derivatives. In other words, the data categories of each layer of the multi-layer network model include input neurons, weights, output neurons, input neuron derivatives, weight derivatives, and output neuron derivatives. The arithmetic unit 12 divides each type of data in each layer of the multi-layer network model into g groups, or groups them by any other grouping rule. It then traverses each element of each of the g groups of data corresponding to each type of data in each layer, obtains the element of maximum absolute value in that group, i.e. $a_{max}^{(l,g)} = \max_{i} |a_i^{(l,g)}|$, and determines the decimal point position of each of the g groups corresponding to each data category in each layer by the formula $s_a^{(l,g)} = \lceil \log_2 a_{max}^{(l,g)} \rceil - (bitnum - 1)$.

The arbitrary grouping rules include, but are not limited to, grouping by data range, grouping by data training batch, and the like.
B) The arithmetic unit 12 initializes the decimal point position s of the first input data according to the minimum absolute value of the first input data:

Specifically, the arithmetic unit 12 finds the minimum absolute value $a_{min}$ of the data to be quantized and determines the fixed-point precision s by the following formula:

$$s_a = \lfloor \log_2 a_{min} \rfloor$$

where $a_{min}$ is the minimum absolute value of the first input data. The process of obtaining $a_{min}$ follows steps a.1), a.2), and a.3) above.
C) The arithmetic unit 12 initializes the decimal point position s of the first input data according to the relationship between different types of data in the first input data:

Specifically, the decimal point position $s_{a}^{(l)}$ of data type $a^{(l)}$ of any layer (for example layer l) of the multi-layer network model is determined by the arithmetic unit 12 from the decimal point position $s_{b}^{(l)}$ of data type $b^{(l)}$ of layer l according to the formula

$$s_{a}^{(l)} = \alpha \, s_{b}^{(l)} + \beta$$

where $a^{(l)}$ and $b^{(l)}$ may each be the input neurons X(l), the output neurons Y(l), the weights W(l), the input neuron derivatives, the output neuron derivatives, or the weight derivatives, and α and β are integer constants.
D) The arithmetic unit 12 initializes the decimal point position s of the first input data to an empirical constant:

Specifically, the decimal point position $s_{a}^{(l)}$ of data type $a^{(l)}$ of any layer (for example layer l) of the multi-layer network model can be set manually as $s_{a}^{(l)} = c$, where c is an integer constant and $a^{(l)}$ may be the input neurons X(l), the output neurons Y(l), the weights W(l), the input neuron derivatives, the output neuron derivatives, or the weight derivatives.
Further, the initialization values of the decimal point position of the input neurons and of the decimal point position of the output neurons may be chosen in the range [-8, 8]; the initialization value of the decimal point position of the weights may be chosen in the range [-17, 8]; the initialization values of the decimal point position of the input neuron derivatives and of the decimal point position of the output neuron derivatives may be chosen in the range [-40, -20]; and the initialization value of the decimal point position of the weight derivatives may be chosen in the range [-48, -12].
The method by which the arithmetic unit 12 dynamically adjusts the decimal point position s is introduced in detail below. The methods by which the arithmetic unit 12 dynamically adjusts s include adjusting s upward (s becomes larger) and adjusting s downward (s becomes smaller), specifically: single-step upward adjustment according to the maximum absolute value of the first input data; stepwise upward adjustment according to the maximum absolute value of the first input data; single-step upward adjustment according to the distribution of the first input data; stepwise upward adjustment according to the distribution of the first input data; and downward adjustment according to the maximum absolute value of the first input data.
A) The arithmetic unit 12 adjusts upward in a single step according to the maximum absolute value of the data in the first input data:

Assume the decimal point position before adjustment is s_old, and the fixed-point data corresponding to s_old can represent the data range [neg, pos], where $pos = (2^{bitnum-1} - 1) \times 2^{s\_old}$ and $neg = -(2^{bitnum-1} - 1) \times 2^{s\_old}$. When the maximum absolute value $a_{max}$ of the data in the first input data satisfies $a_{max} \geq pos$, the adjusted decimal point position is $s\_new = \lceil \log_2 a_{max} \rceil - (bitnum - 1)$; otherwise the decimal point position is not adjusted, i.e. s_new = s_old.
B) The arithmetic unit 12 adjusts upward stepwise according to the maximum absolute value of the data in the first input data:

Assume the decimal point position before adjustment is s_old, with representable range [neg, pos] defined as above. When the maximum absolute value $a_{max}$ of the data in the first input data satisfies $a_{max} \geq pos$, the adjusted decimal point position is s_new = s_old + 1; otherwise the decimal point position is not adjusted, i.e. s_new = s_old.
C) The arithmetic unit 12 adjusts upward in a single step according to the distribution of the first input data:

Assume the decimal point position before adjustment is s_old, with representable range [neg, pos] defined as above. Statistics of the absolute values of the first input data are computed, such as the mean $a_{mean}$ and the standard deviation $a_{std}$ of the absolute values, and the maximum data range is set as $a_{max} = a_{mean} + n \, a_{std}$. When $a_{max} \geq pos$, the adjusted decimal point position is $s\_new = \lceil \log_2 a_{max} \rceil - (bitnum - 1)$; otherwise the decimal point position is not adjusted, i.e. s_new = s_old.

Further, n may be 2 or 3.
D) The arithmetic unit 12 adjusts upward stepwise according to the distribution of the first input data:

Assume the decimal point position before adjustment is s_old, with representable range [neg, pos] defined as above. Statistics of the absolute values of the first input data are computed, such as the mean $a_{mean}$ and the standard deviation $a_{std}$ of the absolute values, and the maximum data range is set as $a_{max} = a_{mean} + n \, a_{std}$, where n may be 3. When $a_{max} \geq pos$, the adjusted decimal point position is s_new = s_old + 1; otherwise the decimal point position is not adjusted, i.e. s_new = s_old.
E) The arithmetic unit 12 adjusts downward according to the maximum absolute value of the first input data:

Assume the decimal point position before adjustment is s_old, with representable range [neg, pos] defined as above. When the maximum absolute value $a_{max}$ of the first input data satisfies $a_{max} < 2^{s\_old + (bitnum - n)}$ and $s\_old \geq s_{min}$, then s_new = s_old − 1, where n is an integer constant and $s_{min}$ may be an integer or negative infinity.

Further, n is 3 and $s_{min}$ is −64.
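Methods C), D), and E) can be sketched together under the same assumptions: the statistics-based variants bound the data by a_mean + n*a_std instead of the raw maximum, and the downward rule shrinks s when the data occupy too little of the representable range.

```python
import math
import statistics

def adjust_by_distribution(values, s_old, bitnum, n=3, single_step=True):
    """Methods C)/D): upward adjustment driven by |data| statistics."""
    pos = (2 ** (bitnum - 1) - 1) * 2.0 ** s_old
    mags = [abs(v) for v in values]
    a_max = statistics.mean(mags) + n * statistics.pstdev(mags)
    if a_max >= pos:
        if single_step:                                      # method C)
            return math.ceil(math.log2(a_max)) - (bitnum - 1)
        return s_old + 1                                     # method D)
    return s_old

def adjust_down(values, s_old, bitnum, n=3, s_min=-64):
    """Method E): decrease s when a_max < 2**(s_old + bitnum - n)."""
    a_max = max(abs(v) for v in values)
    if a_max < 2.0 ** (s_old + bitnum - n) and s_old >= s_min:
        return s_old - 1
    return s_old
```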
Optionally, regarding the frequency of adjusting the decimal point position: the decimal point position of the first input data may never be adjusted; or it may be adjusted once every n first training cycles (i.e. iterations), n being a constant; or once every n second training cycles (i.e. epochs), n being a constant; or once every n first training cycles or n second training cycles, with n updated as n = αn after each adjustment, where α is greater than 1; or once every n first training cycles or second training cycles, with n gradually decreased as the number of training rounds increases.
Further, the decimal point positions of the input neurons, the weights, and the output neurons are adjusted once every 100 first training cycles, and the decimal point positions of the input neuron derivatives and the output neuron derivatives are adjusted once every 20 first training cycles.
It should be noted that the first training cycle is the time needed to train one batch of samples, and the second training cycle is the time needed to train all training samples once.
In a possible embodiment, after the controller unit 11 or the arithmetic unit 12 obtains the decimal point position of the first input data as described above, it stores the decimal point position of the first input data in the cache 202 of the storage unit 10.
When the computation instruction uses immediate addressing, the master processing circuit 101 converts the first input data into the second input data directly according to the decimal point position indicated by the operation field of the computation instruction. When the computation instruction uses direct or indirect addressing, the master processing circuit 101 obtains the decimal point position of the first input data from the storage space indicated by the operation field of the computation instruction, and then converts the first input data into the second input data according to that decimal point position.
The computing device further includes a rounding unit. During computation, the precision of an operation result obtained by addition, multiplication, and/or other operations on the second input data (the operation results include intermediate results and the result of the computation instruction) can exceed the precision range of the current fixed-point data, so the operation cache unit caches the intermediate results. After the operation, the rounding unit performs a rounding operation on the operation results that exceed the precision range of the fixed-point data to obtain rounded operation results, and the data conversion unit then converts the rounded operation results into data of the current fixed-point data type.
Specifically, the rounding unit performs a rounding operation on the intermediate results, the rounding operation being any one of random rounding, rounding to the nearest, rounding up, rounding down, and truncation rounding.
When the rounding unit performs the random rounding operation, it performs the following operation:

$$y = \begin{cases} \lfloor x \rfloor, & \text{w.p. } 1 - \dfrac{x - \lfloor x \rfloor}{\varepsilon} \\ \lfloor x \rfloor + \varepsilon, & \text{w.p. } \dfrac{x - \lfloor x \rfloor}{\varepsilon} \end{cases}$$

where y is the data obtained by randomly rounding the operation result x before rounding, i.e. the operation result after rounding; ε is the smallest positive number representable by the current fixed-point data representation format, i.e. $2^{-Point\ Location}$; $\lfloor x \rfloor$ is the number obtained by directly truncating x to fixed-point data (analogous to rounding a decimal downward); and w.p. denotes probability. The formula expresses that random rounding of the operation result x before rounding yields $\lfloor x \rfloor$ with probability $1 - \frac{x - \lfloor x \rfloor}{\varepsilon}$, and yields $\lfloor x \rfloor + \varepsilon$ with probability $\frac{x - \lfloor x \rfloor}{\varepsilon}$.
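A sketch of this random rounding rule, with ε = 2^(−Point Location) as above; the probability of rounding up equals the fraction of ε by which x exceeds the lower grid multiple.

```python
import math
import random

def random_round(x: float, point_location: int) -> float:
    """Randomly round x onto the grid of multiples of eps = 2**-point_location:
    round up with probability (x - floor(x)) / eps, down otherwise."""
    eps = 2.0 ** (-point_location)
    low = math.floor(x / eps) * eps        # largest multiple of eps <= x
    p_up = (x - low) / eps
    return low + eps if random.random() < p_up else low

# Averaged over many trials this rounding is unbiased: the mean of
# random_round(0.3, 2) tends to 0.3 even though the grid step is 0.25.
```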
When the rounding unit performs rounding to the nearest, it specifically performs the following operation:

$$y = \begin{cases} \lfloor x \rfloor & \text{if } \lfloor x \rfloor \le x \le \lfloor x \rfloor + \dfrac{\varepsilon}{2} \\[4pt] \lfloor x \rfloor + \varepsilon & \text{if } \lfloor x \rfloor + \dfrac{\varepsilon}{2} \le x \le \lfloor x \rfloor + \varepsilon \end{cases}$$

where y denotes the data obtained after rounding the pre-rounding operation result x to the nearest representable value, i.e. the rounded operation result; ε is the smallest positive number representable by the current fixed-point data format, i.e. $2^{-\text{Point Location}}$; and $\lfloor x \rfloor$ is the integral multiple of ε whose value is the largest number less than or equal to x. The formula states that when the pre-rounding operation result x satisfies $\lfloor x \rfloor \le x \le \lfloor x \rfloor + \frac{\varepsilon}{2}$, the rounded operation result is $\lfloor x \rfloor$; when x satisfies $\lfloor x \rfloor + \frac{\varepsilon}{2} \le x \le \lfloor x \rfloor + \varepsilon$, the rounded operation result is $\lfloor x \rfloor + \varepsilon$.
When the rounding unit performs rounding up, it specifically performs the following operation:

$$y = \lceil x \rceil$$

where y denotes the data obtained after rounding up the pre-rounding operation result x, i.e. the rounded operation result; $\lceil x \rceil$ is the integral multiple of ε whose value is the smallest number greater than or equal to x; and ε is the smallest positive number representable by the current fixed-point data format, i.e. $2^{-\text{Point Location}}$.
When the rounding unit performs rounding down, it specifically performs the following operation:

$$y = \lfloor x \rfloor$$

where y denotes the data obtained after rounding down the pre-rounding operation result x, i.e. the rounded operation result; $\lfloor x \rfloor$ is the integral multiple of ε whose value is the largest number less than or equal to x; and ε is the smallest positive number representable by the current fixed-point data format, i.e. $2^{-\text{Point Location}}$.
When the rounding unit performs truncation rounding, it specifically performs the following operation:

$$y = [x]$$

where y denotes the data obtained after truncation rounding of the pre-rounding operation result x, i.e. the rounded operation result, and [x] denotes the fixed-point data obtained by directly truncating the operation result x.
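The five rounding modes can be summarized in a short sketch. This is a minimal illustration, not the patent's implementation; the function round_fixed and its mode names are ours, and it assumes the precision ε equals 2^s for a scaling position s:

```python
import math
import random

def round_fixed(x, s, mode="random"):
    """Round x to the fixed-point grid with precision eps = 2**s.

    The five modes mirror the rounding operations described above;
    the name round_fixed and the mode strings are illustrative only.
    """
    eps = 2.0 ** s
    floor_x = math.floor(x / eps) * eps  # largest multiple of eps <= x
    if mode == "random":
        # round up with probability (x - floor_x)/eps, down otherwise
        p_up = (x - floor_x) / eps
        return floor_x + eps if random.random() < p_up else floor_x
    if mode == "nearest":    # round half up to the nearest multiple of eps
        return floor_x if x - floor_x <= eps / 2 else floor_x + eps
    if mode == "up":         # smallest multiple of eps >= x
        return math.ceil(x / eps) * eps
    if mode == "down":       # largest multiple of eps <= x
        return floor_x
    if mode == "truncate":   # drop low-order bits (rounds toward zero)
        return math.trunc(x / eps) * eps
    raise ValueError(f"unknown mode: {mode}")
```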
After the rounding unit obtains the rounded intermediate computation result, the arithmetic unit 12 converts the rounded intermediate computation result into data of the current fixed-point data type according to the scaling position of the first input data.
In a feasible embodiment, the arithmetic unit 12 does not truncate those of the one or more intermediate results whose data type is floating-point data.
The slave processing circuits 102 of the arithmetic unit 12 obtain intermediate results by performing operations according to the above method. Because operations such as multiplication and division can produce intermediate results that exceed the memory's storage range, such intermediate results would normally be truncated; but since the intermediate results generated during computation in this application do not need to be stored in memory, intermediate results that exceed the memory's storage range need not be truncated. This greatly reduces the precision loss of the intermediate results and improves the precision of the computed result.
In a feasible embodiment, the arithmetic unit 12 further includes a derivation unit. When the arithmetic unit 12 receives the scaling position of the input data participating in a fixed-point operation, the derivation unit derives, from that scaling position, the scaling position of one or more intermediate results produced during the fixed-point operation. When an intermediate result obtained by the operation subunit exceeds the range indicated by its corresponding scaling position, the derivation unit shifts the scaling position of that intermediate result left by M bits, so that the precision of the intermediate result lies within the precision range indicated by its scaling position, where M is an integer greater than 0.
For example, suppose the first input data includes input data I1 and input data I2, whose corresponding scaling positions are P1 and P2 respectively, with P1 > P2. When the operation type indicated by the operation instruction is addition or subtraction, i.e. the operation subunit computes I1+I2 or I1-I2, the derivation unit derives P1 as the scaling position of the intermediate result of the computation indicated by the operation instruction; when the operation type indicated by the operation instruction is multiplication, i.e. the operation subunit computes I1*I2, the derivation unit derives P1+P2 as the scaling position of the intermediate result of the computation indicated by the operation instruction.
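As a sketch of this derivation rule, assuming the convention of the example above in which a larger scaling position means more fractional bits (the function name below is ours, not the patent's):

```python
def derive_intermediate_point(op, p1, p2):
    """Scaling position of an intermediate result.

    Addition/subtraction keeps the finer of the two operand precisions
    (the larger scaling position); multiplication accumulates the
    fractional bits of both operands.
    """
    if op in ("add", "sub"):
        return max(p1, p2)   # P1 when P1 > P2, as in the example above
    if op == "mul":
        return p1 + p2
    raise ValueError(f"unsupported op: {op}")
```

For instance, with P1 = 8 and P2 = 4, this returns 8 for addition or subtraction and 12 for multiplication.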
In a feasible embodiment, the arithmetic unit 12 further includes:
a data cache unit for caching the one or more intermediate results.
In an alternative embodiment, the computing device further includes a data statistics unit. The data statistics unit performs statistics on the input data of the same type in each layer of the multi-layer network model, to obtain the scaling position of each type of input data in each layer.
The data statistics unit may also be part of an external device; in that case, before performing data conversion, the computing device obtains the scaling positions of the data participating in the operations from the external device.
Specifically, the data statistics unit includes:
an obtaining subunit for extracting the input data of the same type in each layer of the multi-layer network model;
a statistics subunit for computing the distribution proportions, over preset intervals, of the input data of the same type in each layer of the multi-layer network model;
an analysis subunit for obtaining, from those distribution proportions, the scaling position of the input data of the same type in each layer of the multi-layer network model.
The preset intervals may be $[-2^{X-1-i},\ 2^{X-1-i} - 2^{-i}]$ for $i = 0, 1, 2, \ldots, n$, where n is a preset positive integer and X is the number of bits occupied by the fixed-point data, so there are n+1 such intervals. The statistics subunit counts how the input data of the same type in each layer of the multi-layer network model are distributed over these n+1 intervals and obtains the first distribution proportion from that distribution information. Denoting the first distribution proportion by $p_0, p_1, p_2, \ldots, p_n$, $p_i$ is the proportion of the input data of the same type in each layer of the multi-layer network model that falls in the i-th interval. The analysis subunit presets an overflow rate EPL and takes the largest i from $\{0, 1, 2, \ldots, n\}$ such that $p_i \ge 1 - \text{EPL}$; that largest i is the scaling position of the input data of the same type in each layer of the multi-layer network model. In other words, the analysis subunit takes the scaling position of the input data of the same type in each layer of the multi-layer network model as $\max\{i \mid p_i \ge 1 - \text{EPL},\ i \in \{0, 1, 2, \ldots, n\}\}$; that is, among the $p_i$ that are greater than or equal to 1 - EPL, the largest subscript i is chosen as the scaling position.
It should be noted that $p_i$ is the ratio of the number of input data of the same type in each layer of the multi-layer network model whose values lie in the interval $[-2^{X-1-i},\ 2^{X-1-i} - 2^{-i}]$ to the total number of input data of the same type in each layer of the multi-layer network model. For example, if m2 of the m1 input data of the same type in each layer of the multi-layer network model have values in the interval $[-2^{X-1-i},\ 2^{X-1-i} - 2^{-i}]$, then $p_i = m2 / m1$.
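A minimal sketch of this statistics procedure follows; the function name and the use of NumPy are ours, and it assumes i counts the fractional bits of an X-bit fixed-point format:

```python
import numpy as np

def choose_point_position(data, X, epl, n):
    """Largest i in 0..n such that at least (1 - epl) of the data fits
    in the range representable by an X-bit fixed-point number with i
    fractional bits, i.e. [-2**(X-1-i), 2**(X-1-i) - 2**(-i)]."""
    data = np.asarray(data, dtype=np.float64)
    best = 0
    for i in range(n + 1):
        lo = -(2.0 ** (X - 1 - i))
        hi = 2.0 ** (X - 1 - i) - 2.0 ** (-i)
        p_i = np.mean((data >= lo) & (data <= hi))  # distribution proportion
        if p_i >= 1.0 - epl:
            best = i    # keep the largest i meeting the overflow rate
    return best
```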
In a feasible embodiment, to improve computational efficiency, the obtaining subunit extracts, randomly or by sampling, part of the input data of the same type in each layer of the multi-layer network model, obtains the scaling position of that partial data according to the method described above, and then performs data conversion on that type of input data according to the scaling position of the partial data (including converting floating-point data to fixed-point data, converting fixed-point data to fixed-point data with a different scaling position, etc.). This maintains precision while improving computation speed and efficiency.
Optionally, the data statistics unit may determine the bit width and scaling position of data of the same type, or of the same layer, according to the median of that data; or it may determine the bit width and scaling position of data of the same type, or of the same layer, according to the mean of that data.
Optionally, when an intermediate result obtained by the arithmetic unit from operating on data of the same type or data of the same layer exceeds the value range corresponding to the scaling position and bit width of that data, the arithmetic unit does not truncate the intermediate result; instead it caches the intermediate result in the data cache unit of the arithmetic unit for use in subsequent operations.
Specifically, the operation domain includes the scaling position of the input data and a conversion mode flag for the data type. The instruction processing unit parses the data conversion instruction to obtain the scaling position of the input data and the conversion mode flag of the data type. The processing unit further includes a data conversion unit, which converts the first input data into the second input data according to the scaling position of the input data and the conversion mode flag of the data type.
It should be noted that the network model includes multiple layers, such as fully connected layers, convolutional layers, pooling layers, and input layers. In the at least one input data, the input data belonging to the same layer have the same scaling position; that is, the input data of the same layer share the same scaling position.
The input data includes different types of data, for example input neurons, weights, and bias data. Input data of the same type have the same scaling position; that is, input data of the same type share the same scaling position.
For example, if the operation type indicated by an operation instruction is a fixed-point operation and the input data participating in the operation indicated by that operation instruction is floating-point data, the data conversion unit converts the input data from floating-point data to fixed-point data before the fixed-point operation is performed. As another example, if the operation type indicated by an operation instruction is a floating-point operation and the input data participating in the operation indicated by that operation instruction is fixed-point data, the data conversion unit converts the input data corresponding to that operation instruction from fixed-point data to floating-point data before the floating-point operation is performed.
For the macro-instructions involved in this application (such as computation instructions and data conversion instructions), the controller unit 11 can parse a macro-instruction to obtain its operation domain and operation code, and generate the micro-instruction corresponding to the macro-instruction from that operation domain and operation code; alternatively, the controller unit 11 can decode the macro-instruction to obtain the corresponding micro-instruction.
In a feasible embodiment, a system on chip (System On Chip, SOC) includes a main processor and a coprocessor, and the main processor includes the computing device described above. The coprocessor obtains the scaling positions of the input data of the same type in each layer of the multi-layer network model according to the method described above and transmits them to the computing device; alternatively, the computing device obtains the scaling positions of the input data of the same type in each layer of the multi-layer network model from the coprocessor when it needs to use them.
In a feasible embodiment, the first input data is non-fixed-point data, where non-fixed-point data includes long-bit floating-point data, short-bit floating-point data, integer data, discrete data, and the like.
The data types of the first input data may differ. For example, the input neurons, weights, and bias data may all be floating-point data; or some of the input neurons, weights, and bias data may be floating-point data and the rest integer data; or the input neurons, weights, and bias data may all be integer data. The computing device can convert non-fixed-point data to fixed-point data; that is, it can convert data types such as long-bit floating-point data, short-bit floating-point data, integer data, and discrete data to fixed-point data. The fixed-point data may be signed or unsigned fixed-point data.
In a feasible embodiment, the first input data and the second input data are both fixed-point data. The first input data and the second input data may both be signed fixed-point data, or both unsigned fixed-point data, or one may be unsigned and the other signed; in any case, the scaling position of the first input data differs from the scaling position of the second input data.
In a feasible embodiment, the first input data is fixed-point data and the second input data is non-fixed-point data; in other words, the computing device can convert fixed-point data to non-fixed-point data.
Fig. 4 is a flowchart of a single-layer neural network forward operation provided by an embodiment of the present invention. The flowchart describes the process of a single-layer neural network forward operation implemented using the computing device and instruction set of the present invention. For each layer, a weighted sum of the input neuron vector is first computed to obtain the layer's intermediate result vector. The intermediate result vector is biased and activated to obtain the output neuron vector, which serves as the input neuron vector of the next layer.
In a specific application scenario, the computing device may be a training device. Before training a neural network model, the training device obtains the training data participating in the training; the training data is non-fixed-point data, and the training device obtains the scaling position of the training data according to the method described above. The training device converts the training data into training data represented as fixed-point data according to that scaling position. The training device performs a forward neural network operation on the fixed-point training data and obtains a neural network operation result. The training device performs a random rounding operation on any neural network operation result that exceeds the data precision range representable by the scaling position of the training data, so that the rounded neural network operation result lies within the data precision range representable by the scaling position of the training data. In this way, the training device obtains the neural network operation result, i.e. the output neurons, of every layer of the multi-layer neural network. The training device computes the gradient of the output neurons from each layer's output neurons, performs a reverse operation according to that gradient to obtain the weight gradients, and updates the weights of the neural network model according to the weight gradients.
The training device repeats the above process to train the neural network model.
It should be pointed out that, before performing the forward operation and the reverse training, the computing device may perform data conversion on the data participating in the forward operation but not on the data participating in the reverse training; or perform data conversion on the data participating in the reverse training but not on the data participating in the forward operation; or perform data conversion on both the data participating in the forward operation and the data participating in the reverse training. For the specific data conversion process, refer to the descriptions of the related embodiments above, which are not repeated here.
The forward operation includes the multi-layer neural network operation described above, which includes operations such as convolution, and the convolution operation is implemented by a convolution operation instruction.
The convolution operation instruction is an instruction of the Cambricon instruction set. The Cambricon instruction set is characterized in that each instruction consists of an operation code and operands, and the instruction set includes four types of instructions: control instructions, data transfer instructions, computational instructions, and logical instructions.
Preferably, each instruction in the instruction set has a fixed length; for example, each instruction may be 64 bits long.
Further, control instructions are used to control the execution process. Control instructions include jump instructions and conditional branch instructions.
Further, data transfer instructions are used to complete data transfers between different storage media. Data transfer instructions include load instructions, store instructions, and move instructions. Load instructions load data from main memory into the cache, store instructions store data from the cache into main memory, and move instructions move data between caches, between a cache and a register, or between registers. Data transfer instructions support three different data organization forms: matrix, vector, and scalar.
Further, computational instructions are used to complete neural network arithmetic operations. Computational instructions include matrix operation instructions, vector operation instructions, and scalar operation instructions.
Further, matrix operation instructions complete the matrix operations in a neural network, including matrix multiply vector, vector multiply matrix, matrix multiply scalar, outer product, matrix add matrix, and matrix subtract matrix.
Further, vector operation instructions complete the vector operations in a neural network, including vector elementary arithmetics, vector transcendental functions, dot product, random vector generator, and maximum/minimum of a vector. Vector elementary arithmetics include vector add, subtract, multiply, and divide; vector transcendental functions are functions that do not satisfy any polynomial equation with polynomial coefficients, including but not limited to exponential functions, logarithmic functions, trigonometric functions, and inverse trigonometric functions.
Further, scalar operation instructions complete the scalar operations in a neural network, including scalar elementary arithmetics and scalar transcendental functions. Scalar elementary arithmetics include scalar add, subtract, multiply, and divide; scalar transcendental functions are functions that do not satisfy any polynomial equation with polynomial coefficients, including but not limited to exponential functions, logarithmic functions, trigonometric functions, and inverse trigonometric functions.
Further, logical instructions are used for the logical operations of a neural network. Logical operations include vector logical operation instructions and scalar logical operation instructions.
Further, vector logical operation instructions include vector compare, vector logical operations, and vector greater than merge. Vector compare includes but is not limited to greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to. Vector logical operations include AND, OR, and NOT.
Further, scalar logical operation instructions include scalar compare and scalar logical operations. Scalar compare includes but is not limited to greater than, less than, equal to, greater than or equal to, less than or equal to, and not equal to. Scalar logical operations include AND, OR, and NOT.
For a multi-layer neural network, the implementation process is as follows. In the forward operation, after the forward operation of the previous layer of the artificial neural network is completed, the operation instruction of the next layer takes the output neurons computed in the arithmetic unit as the input neurons of the next layer for its operation (or performs certain operations on those output neurons before using them as the next layer's input neurons) and, at the same time, replaces the weights with the next layer's weights. In the reverse operation, after the reverse operation of the previous layer of the artificial neural network is completed, the operation instruction of the next layer takes the input neuron gradients computed in the arithmetic unit as the output neuron gradients of the next layer for its operation (or performs certain operations on those input neuron gradients before using them as the next layer's output neuron gradients) and, at the same time, replaces the weights with the next layer's weights. As shown in Fig. 5, the dashed arrows indicate the reverse operation and the solid arrows indicate the forward operation.
In another embodiment, the computation instruction is a matrix-based computation instruction such as a matrix multiplication instruction, an accumulation instruction, or an activation instruction, and includes forward operation instructions and reverse training instructions.
The following uses a neural network operation instruction to illustrate the specific computation method of the computing device shown in Fig. 3A. For a neural network operation instruction, the formula that actually needs to be executed can be s = s(Σ w·x_i + b); that is, the weights w are multiplied by the input data x_i, the products are summed, the bias b is added, and the activation function s(h) is applied to obtain the final output result s.
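In floating point, that formula amounts to the following minimal sketch (tanh merely stands in for the unspecified activation function s):

```python
import numpy as np

def forward(w, x, b, act=np.tanh):
    """s = act(sum_i w_i * x_i + b): multiply, sum, add bias, activate."""
    h = np.dot(w, x) + b
    return act(h)
```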
The method by which the computing device shown in Fig. 3A executes a neural network forward operation instruction may specifically be as follows:
After the conversion unit 13 performs data type conversion on the first input data, the controller unit 11 extracts a neural network forward operation instruction from the instruction cache unit 110, together with the operation domain and at least one operation code corresponding to the neural network operation instruction. The controller unit 11 transmits the operation domain to the data access unit and sends the at least one operation code to the arithmetic unit 12.
The controller unit 11 extracts the weights w and bias b corresponding to the operation domain from the storage unit 10 (when b is 0, there is no need to extract the bias b) and transmits the weights w and bias b to the main processing circuit 101 of the arithmetic unit; the controller unit 11 also extracts the input data Xi from the storage unit 10 and sends the input data Xi to the main processing circuit 101.
The main processing circuit 101 splits the input data Xi into n data blocks.
The instruction processing unit 111 of the controller unit 11 determines a multiplication instruction, a bias instruction, and an accumulation instruction according to the at least one operation code and sends the multiplication instruction, bias instruction, and accumulation instruction to the main processing circuit 101. The main processing circuit 101 broadcasts the multiplication instruction and the weights w to the multiple slave processing circuits 102 and distributes the n data blocks to the multiple slave processing circuits 102 (for example, with n slave processing circuits 102, each slave processing circuit 102 receives one data block). The multiple slave processing circuits 102 each execute a multiplication operation on the weights w and the received data block according to the multiplication instruction to obtain an intermediate result and send the intermediate result to the main processing circuit 101. According to the accumulation instruction, the main processing circuit 101 executes an accumulation operation on the intermediate results sent by the multiple slave processing circuits 102 to obtain an accumulation result; according to the bias instruction, it adds the bias b to the accumulation result to obtain the final result and sends the final result to the controller unit 11.
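The split/broadcast/accumulate/bias flow can be mimicked sequentially as below. This is purely illustrative: the hardware performs the slave multiplications in parallel, and the patent broadcasts the full weights, whereas here the weights are sliced to match each data block of a dot product; the function name is ours.

```python
import numpy as np

def simulate_master_slave(x, w, b, n_slaves):
    """Sequential stand-in for the master/slave dataflow described above."""
    x_blocks = np.array_split(x, n_slaves)      # master splits input Xi
    w_blocks = np.array_split(w, n_slaves)      # matching weight slices
    partials = [np.dot(wb, xb)                  # each slave multiplies
                for wb, xb in zip(w_blocks, x_blocks)]
    return sum(partials) + b                    # master accumulates, adds bias
```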
In addition, the order of the addition and multiplication operations can be exchanged.
It should be noted that the method by which the computing device executes a neural network reverse training instruction is similar to the process by which the computing device executes a neural network forward operation instruction; for details, refer to the related description of reverse training above, which is not repeated here.
In the technical solution provided by this application, the neural network operation instruction implements the multiplication operations and bias operation of a neural network with a single instruction. The intermediate results of the neural network computation need not be stored or extracted, which reduces the storage and extraction of intermediate data; the solution therefore has the advantages of reducing the corresponding operation steps and improving the computational efficiency of the neural network.
This application also discloses a machine learning arithmetic device, which includes one or more of the computing devices mentioned in this application, and which obtains the data to be operated on and control information from other processing devices, executes the specified machine learning operations, and passes the execution results to peripheral devices through I/O interfaces. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, wifi interfaces, and servers. When more than one computing device is included, the computing devices can be linked to and transmit data to one another through a specific structure, for example interconnected via a PCIE bus, to support larger-scale machine learning operations. In this case, the computing devices may share the same control system or have independent control systems; they may share memory, or each accelerator may have its own memory. Moreover, their interconnection may be any interconnection topology.
The machine learning arithmetic device has high compatibility and can be connected to various types of servers through a PCIE interface.
This application also discloses a combined processing device, which includes the above machine learning arithmetic device, a general interconnect interface, and other processing devices. The machine learning arithmetic device interacts with the other processing devices to jointly complete operations specified by the user. Fig. 6 is a schematic diagram of the combined processing device.
The other processing devices include one or more types of general-purpose/special-purpose processors such as central processing units (CPUs), graphics processing units (GPUs), and machine learning processors; the number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning arithmetic device and external data and control, performing data transport and basic control of the machine learning arithmetic device such as starting and stopping; the other processing devices can also cooperate with the machine learning arithmetic device to jointly complete processing tasks.
The general interconnect interface is used to transmit data and control instructions between the machine learning arithmetic device and the other processing devices. The machine learning arithmetic device obtains the required input data from the other processing devices and writes it into on-chip storage of the machine learning arithmetic device; it can obtain control instructions from the other processing devices and write them into an on-chip control cache of the machine learning arithmetic device; and it can also read data from the memory module of the machine learning arithmetic device and transmit it to the other processing devices.
Optionally, as shown in Fig. 7, the structure may further include a storage device, connected to both the machine learning arithmetic device and the other processing devices. The storage device is used to store data held in the machine learning arithmetic device and the other processing devices, and is particularly suitable for data that is required for the operations but cannot be fully saved in the internal storage of the machine learning arithmetic device or the other processing devices.
The combined processing device can serve as the SOC on-chip system of devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the die area of the control portion, increasing processing speed, and reducing overall power consumption. In this case, the general interconnect interface of the combined processing device is connected to certain components of the device, such as cameras, displays, mice, keyboards, network cards, and wifi interfaces.
In a feasible embodiment, a distributed system is also claimed. The system includes n1 main processors and n2 coprocessors, where n1 is an integer greater than or equal to 0 and n2 is an integer greater than or equal to 1. The system can have various types of topologies, including but not limited to the topology shown in Fig. 3B, the topology shown in Fig. 3C, the topology shown in Fig. 11, and the topology shown in Fig. 12.
The main processor sends the input data, its scaling position, and the computation instruction to the multiple coprocessors; or the main processor sends the input data, its scaling position, and the computation instruction to some of the coprocessors, and those coprocessors in turn send the input data, its scaling position, and the computation instruction to other coprocessors. The coprocessors include the computing device described above, which performs operations on the input data according to the method described above and the computation instruction to obtain operation results.
The input data includes but is not limited to input neurons, weights, bias data, and so on.
A coprocessor sends its operation result directly to the main processor; or a coprocessor that has no connection to the main processor first sends its operation result to a coprocessor that is connected to the main processor, and that coprocessor then sends the received operation result to the main processor.
In some embodiments, a chip is also claimed, which includes the above machine learning arithmetic device or combined processing device.
In some embodiments, a chip packaging structure is claimed, which includes the above chip.
In some embodiments, a board card is claimed, which includes the above chip packaging structure.
In some embodiments, an electronic device is claimed, which includes the above board card. Referring to Fig. 8, Fig. 8 provides a board card which, in addition to the above chip 389, may include other supporting components, including but not limited to: a memory device 390, an interface device 391, and a control device 392.
The memory device 390 is connected to the chip in the chip packaging structure through a bus and is used to store data. The memory device may include multiple groups of storage units 393, each group of storage units connected to the chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).
DDR can double the speed of SDRAM without raising the clock frequency, since it allows data to be read on both the rising edge and the falling edge of the clock pulse; the speed of DDR is thus twice that of standard SDRAM. In one embodiment, the memory device may include 4 groups of storage units, and each group of storage units may include multiple DDR4 granules (chips). In one embodiment, the chip may internally include four 72-bit DDR4 controllers, in which 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 granules are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
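As a check on that figure, assuming 3200 mega-transfers per second and a 64-bit (8-byte) data payload per transfer:

$$3200\ \text{MT/s} \times 8\ \text{bytes/transfer} = 25600\ \text{MB/s}$$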
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel; DDR can transmit data twice within one clock cycle. A controller for controlling the DDR is provided in the chip, for controlling the data transmission and data storage of each storage unit.
The interface device is electrically connected to the chip in the chip packaging structure and is used to implement data transmission between the chip and an external device (for example a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface: the data to be processed is transferred from the server to the chip through the standard PCIE interface, implementing data transfer. Preferably, when a PCIE 3.0 X16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another interface; this application does not limit the specific form of such other interfaces, as long as the interface unit can implement the transfer function. In addition, the computation results of the chip are still sent back to the external device (for example a server) by the interface device.
The control device is electrically connected to the chip and is used to monitor the state of the chip. Specifically, the chip may be electrically connected to the control device through an SPI interface. The control device may include a micro controller unit (MCU). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits and can drive multiple loads; it may therefore be in different working states such as multi-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
Electronic devices include data processing devices, robots, computers, printers, scanners, tablet computers, intelligent terminals, mobile phones, driving recorders, navigators, sensors, webcams, servers, cloud servers, cameras, video cameras, projectors, watches, earphones, mobile storage, wearable devices, vehicles, household appliances, and/or medical devices.
Vehicles include aircraft, ships, and/or cars; household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; medical devices include nuclear magnetic resonance instruments, B-mode ultrasound instruments, and/or electrocardiographs.
Referring to Fig. 9, Fig. 9 shows a method for executing machine learning computations provided by an embodiment of the present invention. The method includes:
S901: the computing device obtains the first input data and a computation instruction.
The first input data includes input neurons and weights.
S902: the computing device parses the computation instruction to obtain a data conversion instruction and multiple operation instructions.
The data conversion instruction includes an operation domain and an operation code. The operation code indicates the function of the data type conversion instruction, and the operation domain of the data type conversion instruction includes the scaling position, a flag bit indicating the data type of the first input data, and the conversion mode of the data type.
S903: the computing device converts the first input data into second input data according to the data conversion instruction, the second input data being fixed-point data.
Converting the first input data into the second input data according to the data conversion instruction includes:
parsing the data conversion instruction to obtain the scaling position, the flag bit indicating the data type of the first input data, and the conversion mode of the data type;
determining the data type of the first input data according to the data type flag bit of the first input data; and
converting the first input data into the second input data according to the scaling position and the conversion mode of the data type, the data type of the second input data being different from the data type of the first input data.
When the first input data and the second input data are both fixed-point data, the scaling position of the first input data differs from the scaling position of the second input data.
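The fields carried by the data conversion instruction might be modeled as below; this layout is hypothetical, sketched only to make the parsing step concrete:

```python
from dataclasses import dataclass

@dataclass
class DataConvertInstr:
    """Hypothetical field layout of the data conversion instruction."""
    opcode: int          # function of the data type conversion instruction
    point_position: int  # scaling position of the first input data
    dtype_flag: int      # flag bit: data type of the first input data
    convert_mode: int    # conversion mode of the data type

def parse_convert(instr: DataConvertInstr):
    """Return the fields needed by the conversion in step S903 (sketch)."""
    return instr.point_position, instr.dtype_flag, instr.convert_mode
```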
In a feasible embodiment, when the first input data is fixed-point data, the method further includes:
deriving, from the scaling position of the first input data, the scaling position of one or more intermediate results, the one or more intermediate results being obtained by operating on the first input data.
S904: the computing device performs computations on the second input data according to the multiple operation instructions to obtain the result of the computation instruction.
The operation instructions include forward operation instructions and reverse training instructions; that is, while executing forward operation instructions and/or reverse training instructions (i.e. while performing forward operations and/or reverse training), the computing device can convert the data participating in the operations into fixed-point data according to the embodiment illustrated in Fig. 9 and perform fixed-point computations.
It should be noted that, for a detailed description of the above steps S901-S904, refer to the related descriptions of the embodiments illustrated in Figs. 1-8, which are not repeated here.
In a specific application scenario, the computing device converts the data participating in the operations into fixed-point data and adjusts the scaling position of the fixed-point data. Referring to Fig. 10, the specific process includes:
S1001: the computing device obtains the first input data.
The first input data is data participating in the operation of the m-th layer of a multi-layer network model and may be any type of data, for example fixed-point data, floating-point data, integer data, or discrete data, where m is an integer greater than 0.
The m-th layer of the multi-layer network model is a linear layer, including but not limited to a convolutional layer and a fully connected layer. The first input data includes input neurons, weights, output neurons, input neuron derivatives, weight derivatives, and output neuron derivatives.
S1002: the computing device determines the scaling position of the first input data and the bit width of the fixed-point data.
The bit width of the fixed-point data of the first input data is the number of bits occupied by the first input data when represented as fixed-point data, and the scaling position is the number of bits occupied by the fractional part of the first input data when represented as fixed-point data. The scaling position is used to characterize the precision of the fixed-point data. See the related description of Fig. 2A for details.
Specifically, the first input data can be any type of data. The first input data a is converted into the second input data $\hat a$ according to the scaling position s and the bit width of the fixed-point data, as follows:

$$\hat a = \begin{cases} \lfloor a / 2^{s} \rfloor \cdot 2^{s} & \text{if } neg \le a \le pos \\ pos & \text{if } a > pos \\ neg & \text{if } a < neg \end{cases}$$

That is, when the first input data a satisfies the condition neg ≤ a ≤ pos, the second input data $\hat a$ is $\lfloor a / 2^{s} \rfloor \cdot 2^{s}$; when the first input data a is greater than pos, the second input data $\hat a$ is pos; and when the first input data a is less than neg, the second input data $\hat a$ is neg.
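A minimal sketch of this conversion, assuming pos = (2^(bitnum-1) - 1)·2^s and neg = -(2^(bitnum-1) - 1)·2^s as used later in this section (the function name is ours):

```python
import math

def float_to_fixed(a, s, bitnum):
    """Quantize a to scaling position s and width bitnum, saturating
    values outside [neg, pos] to the range boundary."""
    pos = (2 ** (bitnum - 1) - 1) * 2.0 ** s
    neg = -pos
    if a > pos:
        return pos
    if a < neg:
        return neg
    return math.floor(a / 2.0 ** s) * 2.0 ** s  # round down to a multiple of 2**s
```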
In one embodiment, the input neurons, weights, output neurons, input neuron derivatives, output neuron derivatives, and weight derivatives of the convolutional layers and fully connected layers are all represented as fixed-point data.
Optionally, the bit width of the fixed-point data used for the input neurons may be 8, 16, 32, 64, or another value; further, it is 8.
Optionally, the bit width of the fixed-point data used for the weights may be 8, 16, 32, 64, or another value; further, it is 8.
Optionally, the bit width of the fixed-point data used for the input neuron derivatives may be 8, 16, 32, 64, or another value; further, it is 16.
Optionally, the bit width of the fixed-point data used for the output neuron derivatives may be 8, 16, 32, 64, or another value; further, it is 24.
Optionally, the bit width of the fixed-point data used for the weight derivatives may be 8, 16, 32, 64, or another value; further, it is 24.
In one embodiment, data a of larger magnitude among the data participating in the operations of the multi-layer network model can be represented using multiple fixed-point representations; see the related description of Fig. 2B for details.
Specifically, the first input data can be any type of data. The first input data a is converted into the second input data $\hat a$ according to the scaling position and the bit width of the fixed-point data: when the first input data a satisfies the condition neg ≤ a ≤ pos, the second input data $\hat a$ is represented by the combination of the multiple fixed-point representations; when the first input data a is greater than pos, the second input data $\hat a$ is pos; and when the first input data a is less than neg, the second input data $\hat a$ is neg.
S1003: the computing device initializes the scaling position of the first input data and adjusts the scaling position of the first input data.
The scaling position s needs to be initialized and dynamically adjusted according to data of different categories, data of different neural network layers, and data of different iteration rounds.
The initialization process of the scaling position s of the first input data is introduced below; that is, determining the scaling position s used for the fixed-point data when the first input data is converted for the first time.
The computing device initializes the scaling position s of the first input data in one of the following ways: initializing the scaling position s according to the maximum absolute value of the first input data; initializing the scaling position s according to the minimum absolute value of the first input data; initializing the scaling position s according to the relationship between different data types in the first input data; or initializing the scaling position s according to an empirical constant.
Specifically, these initialization processes are introduced separately below.
a) The computing device initializes the scaling position s of the first input data according to the maximum absolute value of the first input data.
The computing device initializes the scaling position $s_a$ of the first input data by the following formula:

$$s_a = \left\lceil \log_2 \left( \frac{a_{max}}{2^{bitnum-1} - 1} \right) \right\rceil$$

where $a_{max}$ is the maximum absolute value of the first input data, bitnum is the bit width of the fixed-point data into which the first input data is converted, and $s_a$ is the scaling position of the first input data.
The data categories and network layers participating in the operations can be divided into: the l-th layer's input neurons X^(l), output neurons Y^(l), weights W^(l), input neuron derivatives, output neuron derivatives, and weight derivatives. When finding the maximum absolute value, the search can be performed by data category; by layer and category; or by layer, category, and group. The methods for determining the maximum absolute value of the first input data include:
a.1) The computing device finds the maximum absolute value by data category.
Specifically, each element of the vector/matrix of the first input data is $a_i^{(l)}$, where a^(l) may be the input neurons X^(l), output neurons Y^(l), weights W^(l), input neuron derivatives, output neuron derivatives, or weight derivatives. In other words, the first input data includes input neurons, weights, output neurons, input neuron derivatives, weight derivatives, and output neuron derivatives, and the scaling positions of the first input data include the scaling position of the input neurons, the scaling position of the weights, the scaling position of the output neurons, the scaling position of the input neuron derivatives, the scaling position of the weight derivatives, and the scaling position of the output neuron derivatives; the input neurons, weights, output neurons, input neuron derivatives, weight derivatives, and output neuron derivatives are all represented in matrix or vector form. The computing device traverses all elements of the vector/matrix in every layer of the multi-layer network model to obtain the maximum absolute value of each category of data, i.e. $a_{max} = \max_{l,i} |a_i^{(l)}|$, and determines the scaling position $s_a$ to which each category of data a is converted by the formula $s_a = \lceil \log_2 (a_{max} / (2^{bitnum-1} - 1)) \rceil$.
a.2) The computing device finds the maximum absolute value by layer and category.
Specifically, each element of the vector/matrix of the first input data is $a_i^{(l)}$, where a^(l) may be the input neurons X^(l), output neurons Y^(l), weights W^(l), input neuron derivatives, output neuron derivatives, or weight derivatives. In other words, every layer of the multi-layer network model includes input neurons, weights, output neurons, input neuron derivatives, weight derivatives, and output neuron derivatives, and the scaling positions of the first input data include the scaling positions of each of these, all represented as matrices/vectors. The computing device traverses all elements of the vector/matrix of each kind of data in every layer of the multi-layer network model, obtains the maximum absolute value of each category of data in that layer, i.e. $a_{max}^{(l)} = \max_i |a_i^{(l)}|$, and determines the scaling position of each category of data a in the l-th layer by the formula $s_a^{(l)} = \lceil \log_2 (a_{max}^{(l)} / (2^{bitnum-1} - 1)) \rceil$.
a.3) The computing device finds the maximum absolute value by layer, category, and group.
Specifically, each element of the vector/matrix of the first input data is $a_i^{(l)}$, where a^(l) may be the input neurons X^(l), output neurons Y^(l), weights W^(l), input neuron derivatives, output neuron derivatives, or weight derivatives. In other words, the data categories of every layer of the multi-layer network model include input neurons, weights, output neurons, input neuron derivatives, weight derivatives, and output neuron derivatives. The computing device divides each type of data in every layer of the multi-layer network model into g groups, or groups it by any other grouping rule. It then traverses each element of each group of the g groups of data corresponding to each type of data in each layer of the multi-layer network model, obtains the element with the largest absolute value in that group, i.e. $a_{max}^{(l,g)} = \max_i |a_i^{(l,g)}|$, and determines the scaling position of each of the g groups corresponding to each data category in each layer by the formula $s_a^{(l,g)} = \lceil \log_2 (a_{max}^{(l,g)} / (2^{bitnum-1} - 1)) \rceil$.
The arbitrary grouping rules include but are not limited to grouping by data range, grouping by data training batch, and the like.
b) The computing device initializes the scaling position s of the first input data according to the minimum absolute value of the first input data.
Specifically, the computing device finds the minimum absolute value $a_{min}$ of the data to be quantized and determines the fixed-point precision s by the following formula:

$$s_a = \lfloor \log_2 (a_{min}) \rfloor$$

where $a_{min}$ is the minimum absolute value of the first input data. For the process of obtaining $a_{min}$, refer to steps a.1), a.2), and a.3) above.
c) The computing device initializes the scaling position s of the first input data according to the relationship between different data types in the first input data.
Specifically, the scaling position $s_a^{(l)}$ of the data type a^(l) of any layer (such as the l-th layer) of the multi-layer network model can be determined by the computing device from the scaling position $s_b^{(l)}$ of the data type b^(l) of the l-th layer and the formula $s_a^{(l)} = \alpha \cdot s_b^{(l)} + \beta$.
Here a^(l) and b^(l) may be the input neurons X^(l), output neurons Y^(l), weights W^(l), input neuron derivatives, output neuron derivatives, or weight derivatives, and α and β are integer constants.
d) The computing device initializes the scaling position s of the first input data according to an empirical constant.
Specifically, the scaling position $s_a^{(l)}$ of the data type a^(l) of any layer (such as the l-th layer) of the multi-layer network model can be set manually as $s_a^{(l)} = c$, where c is an integer constant and a^(l) may be the input neurons X^(l), output neurons Y^(l), weights W^(l), input neuron derivatives, output neuron derivatives, or weight derivatives.
Further, the initialization values of the scaling positions of the input neurons and the output neurons can be chosen in the range [-8, 8]; the initialization value of the scaling position of the weights can be chosen in the range [-17, 8]; the initialization values of the scaling positions of the input neuron derivatives and the output neuron derivatives can be chosen in the range [-40, -20]; and the initialization value of the scaling position of the weight derivatives can be chosen in the range [-48, -12].
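The two value-based initializations might look as follows; the formulas are reconstructions from the pos/neg range definition used in this section, and the function name is ours:

```python
import math

def init_point_position(data, bitnum, method="max"):
    """Initialize the scaling position s for one category of data (sketch).

    'max': smallest s whose range (2**(bitnum-1) - 1) * 2**s still covers
           the maximum absolute value of the data (assumes a_max > 0).
    'min': s whose precision 2**s does not lose the minimum non-zero
           absolute value of the data.
    """
    if method == "max":
        a_max = max(abs(v) for v in data)
        return math.ceil(math.log2(a_max / (2 ** (bitnum - 1) - 1)))
    if method == "min":
        a_min = min(abs(v) for v in data if v != 0)
        return math.floor(math.log2(a_min))
    raise ValueError(f"unknown method: {method}")
```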
The method by which the computing device dynamically adjusts the scaling position s of the data is introduced below.
The methods for dynamically adjusting the scaling position s include adjusting s upward (s becomes larger) and adjusting s downward (s becomes smaller). Specifically, they include: single-step upward adjustment according to the maximum absolute value of the first input data; stepwise upward adjustment according to the maximum absolute value of the first input data; single-step upward adjustment according to the distribution of the first input data; stepwise upward adjustment according to the distribution of the first input data; and downward adjustment according to the maximum absolute value of the first input data.
a) The computing device adjusts upward in a single step according to the maximum absolute value of the data in the first input data.
Suppose the scaling position before adjustment is s_old; the fixed-point data corresponding to scaling position s_old can represent data in the range [neg, pos], where $pos = (2^{bitnum-1} - 1) \cdot 2^{s\_old}$ and $neg = -(2^{bitnum-1} - 1) \cdot 2^{s\_old}$. When the maximum absolute value $a_{max}$ of the data in the first input data satisfies $a_{max} \ge pos$, the adjusted scaling position is $s\_new = \lceil \log_2 (a_{max} / (2^{bitnum-1} - 1)) \rceil$; otherwise the scaling position is not adjusted, i.e. s_new = s_old.
b) The computing device adjusts upward stepwise according to the maximum absolute value of the data in the first input data.
Suppose the scaling position before adjustment is s_old; the fixed-point data corresponding to scaling position s_old can represent data in the range [neg, pos], where $pos = (2^{bitnum-1} - 1) \cdot 2^{s\_old}$ and $neg = -(2^{bitnum-1} - 1) \cdot 2^{s\_old}$. When the maximum absolute value $a_{max}$ of the data in the first input data satisfies $a_{max} \ge pos$, the adjusted scaling position is s_new = s_old + 1; otherwise the scaling position is not adjusted, i.e. s_new = s_old.
C) The computing device adjusts the scaling position upward in a single step according to the distribution of the first input data:
Assume the scaling position before adjustment is s_old. The fixed-point data corresponding to scaling position s_old can represent the data range [neg, pos], where pos = (2^(bitnum-1) - 1) * 2^s_old and neg = -(2^(bitnum-1) - 1) * 2^s_old. Compute statistics of the absolute values of the first input data, such as the mean a_mean of the absolute values and the standard deviation a_std of the absolute values, and set the maximum data range to a_max = a_mean + n * a_std. When a_max >= pos, the adjusted scaling position is s_new = ceil(log2(a_max / (2^(bitnum-1) - 1))); otherwise the scaling position is not adjusted, i.e. s_new = s_old.
Further, the above n can take the value 2 or 3.
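A sketch of the distribution-driven single-step rule; statistics.pstdev stands in for the standard deviation, and the ceil-log2 form is the same reconstruction as in method A):

```python
import math
import statistics

def adjust_up_single_step_by_distribution(data, s_old: int, bitnum: int,
                                          n: int = 3) -> int:
    """Single-step upward adjustment driven by the distribution of |data|:
    the effective maximum is taken as mean(|data|) + n * std(|data|)."""
    abs_vals = [abs(x) for x in data]
    a_max = statistics.mean(abs_vals) + n * statistics.pstdev(abs_vals)
    pos = (2 ** (bitnum - 1) - 1) * 2.0 ** s_old
    if a_max >= pos:
        return math.ceil(math.log2(a_max / (2 ** (bitnum - 1) - 1)))
    return s_old
```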
D) The computing device adjusts the scaling position upward stepwise according to the distribution of the first input data:
Assume the scaling position before adjustment is s_old. The fixed-point data corresponding to scaling position s_old can represent the data range [neg, pos], where pos = (2^(bitnum-1) - 1) * 2^s_old and neg = -(2^(bitnum-1) - 1) * 2^s_old. Compute statistics of the absolute values of the first input data, such as the mean a_mean of the absolute values and the standard deviation a_std of the absolute values, and set the maximum data range to a_max = a_mean + n * a_std, where n can take the value 3. When a_max >= pos, the adjusted scaling position is s_new = s_old + 1; otherwise the scaling position is not adjusted, i.e. s_new = s_old.
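And its stepwise counterpart, under the same assumptions:

```python
import statistics

def adjust_up_stepwise_by_distribution(data, s_old: int, bitnum: int,
                                       n: int = 3) -> int:
    """Stepwise upward adjustment driven by the distribution of |data|."""
    abs_vals = [abs(x) for x in data]
    a_max = statistics.mean(abs_vals) + n * statistics.pstdev(abs_vals)
    pos = (2 ** (bitnum - 1) - 1) * 2.0 ** s_old
    return s_old + 1 if a_max >= pos else s_old
```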
E) The computing device adjusts the scaling position downward according to the maximum absolute value of the first input data:
Assume the scaling position before adjustment is s_old. The fixed-point data corresponding to scaling position s_old can represent the data range [neg, pos], where pos = (2^(bitnum-1) - 1) * 2^s_old and neg = -(2^(bitnum-1) - 1) * 2^s_old. When the maximum absolute value a_max of the first input data satisfies a_max < 2^(s_old + (bitnum - n)) and s_old >= s_min, the adjusted scaling position is s_new = s_old - 1, where n is an integer constant and s_min can be an integer or negative infinity.
Further, the above n is 3 and the above s_min is -64.
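A sketch of the downward rule with the stated defaults n = 3 and s_min = -64 (s_min may also be negative infinity, hence the float type):

```python
def adjust_down_by_max(a_max: float, s_old: int, bitnum: int,
                       n: int = 3, s_min: float = -64) -> int:
    """Downward adjustment: shrink the range by one position when the data
    occupies only a small fraction of it, subject to s_old >= s_min."""
    if a_max < 2.0 ** (s_old + bitnum - n) and s_old >= s_min:
        return s_old - 1
    return s_old
```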
Optionally, regarding the frequency of adjusting the scaling position: the scaling position of the first input data may never be adjusted; or it may be adjusted once every n first training cycles (i.e. iterations), with n constant; or once every n second training cycles (i.e. epochs), with n constant; or it may be adjusted once every n first training cycles or n second training cycles, with n then updated to n = α * n, where α is greater than 1; or it may be adjusted once every n first training cycles or second training cycles, with n gradually decreased as the number of training rounds increases.
Further, the scaling positions of the input neurons, of the weights, and of the output neurons are each adjusted once every 100 first training cycles; the scaling positions of the input neuron derivatives and of the output neuron derivatives are each adjusted once every 20 first training cycles.
It should be noted that the first training cycle is the time required to train one batch of samples, and the second training cycle is the time required to train all training samples once.
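A sketch of the concrete schedule just described (every 100 first training cycles for neurons and weights, every 20 for derivatives); the iteration counter and the data-type naming are assumptions for the example:

```python
def due_for_adjustment(iteration: int, data_type: str) -> bool:
    """Illustrative schedule: derivatives are re-scaled every 20 iterations
    (first training cycles), other data types every 100."""
    period = 20 if data_type.endswith("_grad") else 100
    return iteration > 0 and iteration % period == 0
```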
It should be noted that the scaling position of the above data can also be initialized and adjusted using the average value or the median of the absolute values of the data; for details, refer to the related description above of initializing and adjusting the scaling position using the maximum absolute value of the data, which is not repeated here.
It should be noted that, for the sake of simple description, the foregoing method embodiments are all expressed as a series of action combinations; however, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in this specification are alternative embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for the parts not described in detail in a certain embodiment, reference can be made to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device can be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division of the units is only a logical function division, and there may be other division manners in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing related hardware; the program can be stored in a computer-readable memory, which may include a flash disk, a ROM, a RAM, a magnetic disk, an optical disk, and the like.
The embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.
Claims (25)
1. A computing device, characterized by comprising: a storage unit, a conversion unit, an arithmetic unit, and a controller unit; the storage unit including a cache and a register;
the controller unit is configured to determine the scaling position of first input data and the bit width of fixed-point data, the bit width of the fixed-point data being the bit width of the fixed-point data into which the first input data is converted;
the arithmetic unit is configured to initialize the scaling position of the first input data, adjust the scaling position of the first input data, and store the adjusted scaling position of the first input data into the cache of the storage unit;
the controller unit is configured to obtain the first input data and a plurality of operation instructions from the register, obtain the adjusted scaling position of the first input data from the cache, and transmit the adjusted scaling position of the first input data and the first input data to the conversion unit;
the conversion unit is configured to convert the first input data into second input data according to the adjusted scaling position of the first input data;
wherein the arithmetic unit initializing the scaling position of the first input data comprises:
initializing the scaling position of the first input data according to the relationship between different types of data in the first input data.
2. The device according to claim 1, characterized in that the arithmetic unit initializing the scaling position of the first input data according to the relationship between different types of data in the first input data comprises:
the arithmetic unit obtains the scaling position s_b^(l) of the data type b^(l) of layer l according to the scaling position s_a^(l) of the data type a^(l) of any layer (for example, layer l) in the multi-layer network model, determined according to the formula s_b^(l) = α_b * s_a^(l) + β_b;
wherein a^(l) is the input neuron X^(l), the output neuron Y^(l), the weight W^(l), the input neuron derivative, the output neuron derivative, or the weight derivative; b^(l) is the input neuron X^(l), the output neuron Y^(l), the weight W^(l), the input neuron derivative, the output neuron derivative, or the weight derivative; a^(l) and b^(l) are different; and α_b and β_b are integer constants.
3. The device according to claim 1 or 2, characterized in that the arithmetic unit adjusting the scaling position of the first input data comprises:
adjusting the scaling position of the first input data upward in a single step according to the maximum absolute value of the data in the first input data; or
adjusting the scaling position of the first input data upward stepwise according to the maximum absolute value of the data in the first input data; or
adjusting the scaling position of the first input data upward in a single step according to the distribution of the first input data; or
adjusting the scaling position of the first input data upward stepwise according to the distribution of the first input data; or
adjusting the scaling position of the first input data downward according to the maximum absolute value of the first input data.
4. The device according to any one of claims 1-3, characterized in that the computing device is configured to perform machine learning calculations;
the controller unit is further configured to transmit the plurality of operation instructions to the arithmetic unit;
the conversion unit is further configured to transmit the second input data to the arithmetic unit;
the arithmetic unit is further configured to perform operations on the second input data according to the plurality of operation instructions to obtain an operation result.
5. The device according to claim 4, characterized in that the machine learning calculation includes an artificial neural network operation; the first input data includes input neuron data and weight data; and the operation result is output neuron data.
6. The device according to claim 4 or 5, characterized in that the arithmetic unit includes a master processing circuit and a plurality of slave processing circuits;
the master processing circuit is configured to perform preamble processing on the second input data and to transmit data and the plurality of operation instructions between the master processing circuit and the plurality of slave processing circuits;
the plurality of slave processing circuits are configured to perform intermediate operations according to the second input data and the plurality of operation instructions transmitted from the master processing circuit to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the master processing circuit;
the master processing circuit is configured to perform subsequent processing on the plurality of intermediate results to obtain the operation result.
7. The device according to claim 6, characterized in that the computing device further includes a direct memory access (DMA) unit;
the cache is further configured to store the first input data, wherein the cache includes a scratchpad cache;
the register is further configured to store scalar data in the first input data;
the DMA unit is configured to read data from the storage unit or write data to the storage unit.
8. The device according to any one of claims 4-7, characterized in that, when the first input data is fixed-point data, the arithmetic unit further includes:
a derivation unit configured to derive the scaling position of one or more intermediate results according to the scaling position of the first input data, wherein the one or more intermediate results are obtained by operating on the first input data.
9. The device according to claim 8, characterized in that the arithmetic unit further includes:
a data cache unit configured to cache the one or more intermediate results.
10. The device according to any one of claims 4-7, characterized in that the arithmetic unit includes a tree-shaped module, the tree-shaped module including one root port and multiple branch ports, the root port of the tree-shaped module being connected to the master processing circuit, and the multiple branch ports of the tree-shaped module being respectively connected to one of the plurality of slave processing circuits;
the tree-shaped module is configured to forward data and operation instructions between the master processing circuit and the plurality of slave processing circuits;
wherein the tree-shaped module has an n-ary tree structure, n being an integer greater than or equal to 2.
11. The device according to any one of claims 4-7, characterized in that the arithmetic unit further includes a branch processing circuit;
the master processing circuit is specifically configured to determine that the input neurons are broadcast data and the weights are distribution data, to divide one piece of distribution data into multiple data blocks, and to send at least one of the multiple data blocks, the broadcast data, and at least one of the plurality of operation instructions to the branch processing circuit;
the branch processing circuit is configured to forward the data blocks, the broadcast data, and the operation instructions between the master processing circuit and the plurality of slave processing circuits;
the plurality of slave processing circuits are configured to perform operations on the received data blocks and broadcast data according to the operation instructions to obtain intermediate results, and to transmit the intermediate results to the branch processing circuit;
the master processing circuit is further configured to perform subsequent processing on the intermediate results sent by the branch processing circuit to obtain the result of the operation instructions, and to send the result of the operation instructions to the controller unit.
12. The device according to any one of claims 4-7, characterized in that the plurality of slave processing circuits are distributed in an array; each slave processing circuit is connected to the adjacent slave processing circuits, and the master processing circuit is connected to K slave processing circuits of the plurality of slave processing circuits, the K slave processing circuits being: the n slave processing circuits of the first row, the n slave processing circuits of the m-th row, and the m slave processing circuits of the first column;
the K slave processing circuits are configured to forward data and instructions between the master processing circuit and the plurality of slave processing circuits;
the master processing circuit is further configured to determine that the input neurons are broadcast data and the weights are distribution data, to divide one piece of distribution data into multiple data blocks, and to send at least one of the multiple data blocks and at least one of the plurality of operation instructions to the K slave processing circuits;
the K slave processing circuits are configured to forward the data transmitted between the master processing circuit and the plurality of slave processing circuits;
the plurality of slave processing circuits are configured to perform operations on the received data blocks according to the operation instructions to obtain intermediate results, and to transmit the operation results to the K slave processing circuits;
the master processing circuit is configured to process the intermediate results sent by the K slave processing circuits to obtain the result of the operation instructions, and to send the result of the operation instructions to the controller unit.
13. The device according to any one of claims 10-12, characterized in that:
the master processing circuit is specifically configured to combine and sort the intermediate results sent by the plurality of slave processing circuits to obtain the result of the operation instructions;
or the master processing circuit is specifically configured to combine, sort, and apply activation processing to the intermediate results sent by the plurality of slave processing circuits to obtain the result of the operation instructions.
14. The device according to any one of claims 10-12, characterized in that the master processing circuit includes one of, or any combination of, an activation processing circuit and an addition processing circuit;
the activation processing circuit is configured to execute the activation operation on the data in the master processing circuit;
the addition processing circuit is configured to execute addition operations or accumulation operations;
the slave processing circuits include:
a multiplication processing circuit configured to perform product operations on the received data blocks to obtain product results;
an accumulation processing circuit configured to perform accumulation operations on the product results to obtain the intermediate results.
15. A machine learning arithmetic device, characterized in that the machine learning arithmetic device includes one or more computing devices according to any one of claims 4-14, configured to obtain data to be operated on and control information from other processing devices, execute specified machine learning operations, and pass execution results to other processing devices through I/O interfaces;
when the machine learning arithmetic device includes multiple computing devices, the multiple computing devices can be connected and transmit data through a specific structure;
wherein the multiple computing devices are interconnected and transmit data through a PCIE (peripheral component interconnect express) bus to support larger-scale machine learning operations; the multiple computing devices share the same control system or have their own control systems; the multiple computing devices share memory or have their own memories; and the interconnection mode of the multiple computing devices is any interconnection topology.
16. A combined processing device, characterized in that the combined processing device includes the machine learning arithmetic device according to claim 15, a universal interconnection interface, a storage device, and other processing devices;
the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operations specified by the user;
the storage device is connected to the machine learning arithmetic device and the other processing devices, respectively, and is configured to save the data of the machine learning arithmetic device and the other processing devices.
17. A neural network chip, characterized in that the neural network chip includes the machine learning arithmetic device according to claim 15 or the combined processing device according to claim 16.
18. An electronic device, characterized in that the electronic device includes the chip according to claim 17.
19. A board card, characterized in that the board card includes a memory device, an interface device, a control device, and the neural network chip according to claim 17;
wherein the neural network chip is respectively connected to the memory device, the control device, and the interface device;
the memory device is configured to store data;
the interface device is configured to implement data transmission between the chip and external equipment;
the control device is configured to monitor the state of the chip;
wherein the memory device includes multiple groups of storage units, each group of storage units being connected to the chip through a bus, the storage units being DDR SDRAM;
the chip includes a DDR controller configured to control the data transmission to and data storage of each storage unit;
the interface device is a standard PCIE interface.
20. A calculation method, characterized by comprising:
a controller unit determining the scaling position of first input data and the bit width of fixed-point data, the bit width of the fixed-point data being the bit width of the fixed-point data into which the first input data is converted;
an arithmetic unit initializing the scaling position of the first input data and adjusting the scaling position of the first input data;
a conversion unit obtaining the adjusted scaling position of the first input data and converting the first input data into second input data according to the adjusted scaling position;
wherein initializing the scaling position of the first input data comprises:
initializing the scaling position of the first input data according to the relationship between different types of data in the first input data.
21. The method according to claim 20, characterized in that the arithmetic unit initializing the scaling position of the first input data according to the relationship between different types of data in the first input data comprises:
obtaining the scaling position s_b^(l) of the data type b^(l) of layer l according to the scaling position s_a^(l) of the data type a^(l) of any layer (for example, layer l) in the multi-layer network model, determined according to the formula s_b^(l) = α_b * s_a^(l) + β_b;
wherein a^(l) is the input neuron X^(l), the output neuron Y^(l), the weight W^(l), the input neuron derivative, the output neuron derivative, or the weight derivative; b^(l) is the input neuron X^(l), the output neuron Y^(l), the weight W^(l), the input neuron derivative, the output neuron derivative, or the weight derivative; a^(l) and b^(l) are different; and α_b and β_b are integer constants.
22. The method according to claim 20 or 21, characterized in that the arithmetic unit adjusting the scaling position of the first input data comprises:
adjusting the scaling position of the first input data upward in a single step according to the maximum absolute value of the data in the first input data; or
adjusting the scaling position of the first input data upward stepwise according to the maximum absolute value of the data in the first input data; or
adjusting the scaling position of the first input data upward in a single step according to the distribution of the first input data; or
adjusting the scaling position of the first input data upward stepwise according to the distribution of the first input data; or
adjusting the scaling position of the first input data downward according to the maximum absolute value of the first input data.
23. The method according to any one of claims 20-22, characterized in that the method is used to perform machine learning calculations, the method further comprising:
the arithmetic unit performing operations on the second input data according to the plurality of operation instructions to obtain an operation result.
24. The method according to claim 23, characterized in that the machine learning calculation includes an artificial neural network operation; the first input data includes input neurons and weights; and the operation result is output neurons.
25. The method according to claim 24, characterized in that, when the first input data is fixed-point data, the method further comprises:
the arithmetic unit deriving the scaling position of one or more intermediate results according to the scaling position of the first input data, wherein the one or more intermediate results are obtained by operating on the first input data.
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2018101492872 | 2018-02-13 | ||
CN201810149287.2A CN110163350B (en) | 2018-02-13 | 2018-02-13 | Computing device and method |
CN201810207915.8A CN110276447B (en) | 2018-03-14 | 2018-03-14 | Computing device and method |
CN2018102079158 | 2018-03-14 | ||
CN201880002628.1A CN110383300B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201880002628.1A Division CN110383300B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110163356A true CN110163356A (en) | 2019-08-23 |
CN110163356B CN110163356B (en) | 2020-10-09 |
Family
ID=67638324
Family Applications (11)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910195600.0A Active CN110163356B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195535.1A Active CN110163353B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195899.XA Active CN110163363B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195898.5A Active CN110163362B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195819.0A Active CN110163360B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195627.XA Active CN110163357B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195820.3A Active CN110163361B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195599.1A Active CN110163355B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195598.7A Active CN110163354B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195818.6A Active CN110163359B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195816.7A Active CN110163358B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
Family Applications After (10)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910195535.1A Active CN110163353B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195899.XA Active CN110163363B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195898.5A Active CN110163362B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195819.0A Active CN110163360B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195627.XA Active CN110163357B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195820.3A Active CN110163361B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195599.1A Active CN110163355B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195598.7A Active CN110163354B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195818.6A Active CN110163359B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
CN201910195816.7A Active CN110163358B (en) | 2018-02-13 | 2018-09-03 | Computing device and method |
Country Status (1)
Country | Link |
---|---|
CN (11) | CN110163356B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102354722B1 (en) * | 2018-02-13 | 2022-01-21 | Shanghai Cambricon Information Technology Co., Ltd | Computing device and method |
US11740898B2 (en) | 2018-02-13 | 2023-08-29 | Shanghai Cambricon Information Technology Co., Ltd | Computing device and method |
CN110597756B (en) * | 2019-08-26 | 2023-07-25 | 光子算数(北京)科技有限责任公司 | Calculation circuit and data operation method |
CN112445524A (en) * | 2019-09-02 | 2021-03-05 | 中科寒武纪科技股份有限公司 | Data processing method, related device and computer readable medium |
CN112765537B (en) * | 2019-11-01 | 2024-08-23 | 中科寒武纪科技股份有限公司 | Data processing method, device, computer equipment and storage medium |
CN112257870B (en) * | 2019-11-08 | 2024-04-09 | 安徽寒武纪信息科技有限公司 | Machine learning instruction conversion method and device, board card, main board and electronic equipment |
CN110929862B (en) * | 2019-11-26 | 2023-08-01 | 陈子祺 | Fixed-point neural network model quantification device and method |
KR20210077352A (en) * | 2019-12-17 | 2021-06-25 | 에스케이하이닉스 주식회사 | Data Processing System and accelerating DEVICE therefor |
CN113190209A (en) * | 2020-01-14 | 2021-07-30 | 中科寒武纪科技股份有限公司 | Computing device and computing method |
CN113867792A (en) * | 2020-06-30 | 2021-12-31 | 上海寒武纪信息科技有限公司 | Computing device, integrated circuit chip, board card, electronic equipment and computing method |
CN111767024A (en) * | 2020-07-09 | 2020-10-13 | 北京猿力未来科技有限公司 | Simple operation-oriented answering method and device |
US20230297270A1 (en) * | 2020-09-27 | 2023-09-21 | Cambricon (Xi'an) Semiconductor Co., Ltd. | Data processing device, integrated circuit chip, device, and implementation method therefor |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101510149A (en) * | 2009-03-16 | 2009-08-19 | 炬力集成电路设计有限公司 | Method and apparatus for processing data |
CN101754039A (en) * | 2009-12-22 | 2010-06-23 | 中国科学技术大学 | Three-dimensional parameter decoding system for mobile devices |
CN102981854A (en) * | 2012-11-16 | 2013-03-20 | 天津市天祥世联网络科技有限公司 | Neural network optimization method based on floating number operation inline function library |
CN105426344A (en) * | 2015-11-09 | 2016-03-23 | 南京大学 | Matrix calculation method of distributed large-scale matrix multiplication based on Spark |
US20170061279A1 (en) * | 2015-01-14 | 2017-03-02 | Intel Corporation | Updating an artificial neural network using flexible fixed point representation |
CN106502626A (en) * | 2016-11-03 | 2017-03-15 | 北京百度网讯科技有限公司 | Data processing method and device |
CN107316078A (en) * | 2016-04-27 | 2017-11-03 | 北京中科寒武纪科技有限公司 | Apparatus and method for performing artificial neural network self study computing |
CN107330515A (en) * | 2016-04-29 | 2017-11-07 | 北京中科寒武纪科技有限公司 | A kind of apparatus and method for performing artificial neural network forward operation |
CN107340993A (en) * | 2016-04-28 | 2017-11-10 | 北京中科寒武纪科技有限公司 | A kind of apparatus and method for the neural network computing for supporting less digit floating number |
CN107451658A (en) * | 2017-07-24 | 2017-12-08 | 杭州菲数科技有限公司 | Floating-point operation fixed point method and system |
CN107608715A (en) * | 2017-07-20 | 2018-01-19 | 上海寒武纪信息科技有限公司 | For performing the device and method of artificial neural network forward operation |
Family Cites Families (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6650327B1 (en) * | 1998-06-16 | 2003-11-18 | Silicon Graphics, Inc. | Display system having floating point rasterization and floating point framebuffering |
US6834293B2 (en) * | 2001-06-15 | 2004-12-21 | Hitachi, Ltd. | Vector scaling system for G.728 annex G |
CN100410871C (en) * | 2003-07-23 | 2008-08-13 | 联发科技股份有限公司 | Digital signal processor applying skip type floating number operational method |
US7432925B2 (en) * | 2003-11-21 | 2008-10-07 | International Business Machines Corporation | Techniques for representing 3D scenes using fixed point data |
CN1658153B (en) * | 2004-02-18 | 2010-04-28 | 联发科技股份有限公司 | Compound dynamic preset number representation and algorithm, and its processor structure |
CN100340972C (en) * | 2005-06-07 | 2007-10-03 | 北京北方烽火科技有限公司 | Method for implementing logarithm computation by field programmable gate array in digital auto-gain control |
JP4976798B2 (en) * | 2006-09-28 | 2012-07-18 | 株式会社東芝 | Two-degree-of-freedom position control method, two-degree-of-freedom position control device, and medium storage device |
CN101231632A (en) * | 2007-11-20 | 2008-07-30 | 西安电子科技大学 | Method for processing floating-point FFT by FPGA |
CN101183873B (en) * | 2007-12-11 | 2011-09-28 | 广州中珩电子科技有限公司 | BP neural network based embedded system data compression/decompression method |
US9104479B2 (en) * | 2011-12-07 | 2015-08-11 | Arm Limited | Apparatus and method for rounding a floating-point value to an integral floating-point value |
CN103019647B (en) * | 2012-11-28 | 2015-06-24 | 中国人民解放军国防科学技术大学 | Floating-point accumulation/gradual decrease operational method with floating-point precision maintaining function |
CN103455983A (en) * | 2013-08-30 | 2013-12-18 | 深圳市川大智胜科技发展有限公司 | Image disturbance eliminating method in embedded type video system |
CN104572011B (en) * | 2014-12-22 | 2018-07-31 | 上海交通大学 | Universal matrix fixed-point multiplication device based on FPGA and its computational methods |
CN104679720A (en) * | 2015-03-17 | 2015-06-03 | 成都金本华科技股份有限公司 | Operation method for FFT |
CN104679719B (en) * | 2015-03-17 | 2017-11-10 | 成都金本华科技股份有限公司 | A kind of floating-point operation method based on FPGA |
CN105094744B (en) * | 2015-07-28 | 2018-01-16 | 成都腾悦科技有限公司 | A kind of variable floating data microprocessor |
US9977116B2 (en) * | 2015-10-05 | 2018-05-22 | Analog Devices, Inc. | Scaling fixed-point fast Fourier transforms in radar and sonar applications |
CN106484362B (en) * | 2015-10-08 | 2020-06-12 | 上海兆芯集成电路有限公司 | Device for specifying two-dimensional fixed-point arithmetic operation by user |
CN111353589B (en) * | 2016-01-20 | 2024-03-01 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing artificial neural network forward operations |
CN107545303B (en) * | 2016-01-20 | 2021-09-07 | 中科寒武纪科技股份有限公司 | Computing device and operation method for sparse artificial neural network |
CN109358900B (en) * | 2016-04-15 | 2020-07-03 | 中科寒武纪科技股份有限公司 | Artificial neural network forward operation device and method supporting discrete data representation |
CN111651201B (en) * | 2016-04-26 | 2023-06-13 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing vector merge operation |
CN107315566B (en) * | 2016-04-26 | 2020-11-03 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing vector circular shift operation |
CN107315563B (en) * | 2016-04-26 | 2020-08-07 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing vector compare operations |
CN109375951B (en) * | 2016-04-27 | 2020-10-09 | 中科寒武纪科技股份有限公司 | Device and method for executing forward operation of full-connection layer neural network |
CN111310904B (en) * | 2016-04-29 | 2024-03-08 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing convolutional neural network training |
CN106708780A (en) * | 2016-12-12 | 2017-05-24 | 中国航空工业集团公司西安航空计算技术研究所 | Low complexity branch processing circuit of uniform dyeing array towards SIMT framework |
CN106775599B (en) * | 2017-01-09 | 2019-03-01 | 南京工业大学 | Multi-computing-unit coarse-grained reconfigurable system and method for recurrent neural network |
CN107292334A (en) * | 2017-06-08 | 2017-10-24 | 北京深瞐科技有限公司 | Image-recognizing method and device |
2018
- 2018-09-03 CN CN201910195600.0A patent/CN110163356B/en active Active
- 2018-09-03 CN CN201910195535.1A patent/CN110163353B/en active Active
- 2018-09-03 CN CN201910195899.XA patent/CN110163363B/en active Active
- 2018-09-03 CN CN201910195898.5A patent/CN110163362B/en active Active
- 2018-09-03 CN CN201910195819.0A patent/CN110163360B/en active Active
- 2018-09-03 CN CN201910195627.XA patent/CN110163357B/en active Active
- 2018-09-03 CN CN201910195820.3A patent/CN110163361B/en active Active
- 2018-09-03 CN CN201910195599.1A patent/CN110163355B/en active Active
- 2018-09-03 CN CN201910195598.7A patent/CN110163354B/en active Active
- 2018-09-03 CN CN201910195818.6A patent/CN110163359B/en active Active
- 2018-09-03 CN CN201910195816.7A patent/CN110163358B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN110163354A (en) | 2019-08-23 |
CN110163362B (en) | 2020-12-11 |
CN110163353A (en) | 2019-08-23 |
CN110163362A (en) | 2019-08-23 |
CN110163355B (en) | 2020-10-09 |
CN110163353B (en) | 2021-05-11 |
CN110163354B (en) | 2020-10-09 |
CN110163363B (en) | 2021-05-11 |
CN110163361A (en) | 2019-08-23 |
CN110163359A (en) | 2019-08-23 |
CN110163357A (en) | 2019-08-23 |
CN110163357B (en) | 2021-06-25 |
CN110163355A (en) | 2019-08-23 |
CN110163363A (en) | 2019-08-23 |
CN110163360B (en) | 2021-06-25 |
CN110163361B (en) | 2021-06-25 |
CN110163358B (en) | 2021-01-05 |
CN110163360A (en) | 2019-08-23 |
CN110163359B (en) | 2020-12-11 |
CN110163358A (en) | 2019-08-23 |
CN110163356B (en) | 2020-10-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110163356A (en) | A kind of computing device and method | |
CN110383300A (en) | A kind of computing device and method | |
US11710041B2 (en) | Feature map and weight selection method and accelerating device | |
CN109189474A (en) | Processing with Neural Network device and its method for executing vector adduction instruction | |
CN110163350A (en) | A kind of computing device and method | |
CN110276447A (en) | A kind of computing device and method | |
CN108734281B (en) | Processing device, processing method, chip and electronic device | |
CN111047022B (en) | Computing device and related product | |
CN107203808A (en) | A kind of two-value Convole Unit and corresponding two-value convolutional neural networks processor | |
CN111178492B (en) | Computing device, related product and computing method for executing artificial neural network model | |
CN111047021B (en) | Computing device and related product | |
CN111291871B (en) | Computing device and related product | |
CN111382848A (en) | Computing device and related product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||