CN109784484A - Neural network acceleration method and apparatus, neural network acceleration chip, and storage medium - Google Patents
- Publication number: CN109784484A
- Application number: CN201910100514.7A
- Authority: CN (China)
- Legal status: Pending
Abstract
The invention discloses a neural network acceleration method and apparatus, a neural network acceleration chip, and a storage medium. The method comprises: for a neural network to be accelerated, performing the following steps until it is determined that acceleration of the neural network is complete: performing acceleration processing on the current layer to be accelerated using the parameters of the current layer, while scheduling the parameters of the next layer after the current layer; and, when the acceleration processing of the current layer is complete, determining the next layer as the current layer to be accelerated and performing acceleration processing on it. In the present invention, while the neural network acceleration chip performs acceleration processing on the current layer of the neural network, it can schedule the parameters of the next layer in parallel, which shortens the overall acceleration time of the neural network and improves its acceleration efficiency.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a neural network acceleration method and apparatus, a neural network acceleration chip, and a storage medium.
Background
As the precision of neural network algorithms, represented by deep learning, has improved, the overall artificial intelligence market has gradually expanded, and its huge potential has attracted numerous chip, algorithm, and application vendors. Because artificial intelligence requires a large amount of computation for model training and inference, and traditional computing chips, limited by the characteristics of the algorithms and of the computation itself, cannot meet this demand, chip manufacturers have produced dedicated chips for neural network algorithms, namely neural network accelerators.
At runtime, a neural network accelerator obtains the parameters of the network model layer by layer from an external processor; that is, the external processor configures the parameters of the network model to the neural network accelerator layer by layer over a bus. Each time the neural network accelerator finishes processing one layer of data, it obtains the parameters of the next layer of the network model from the external processor. As a result, in the interval between the completion of one layer's processing and the arrival of the next layer's parameters, i.e., the parameter scheduling interval, the neural network accelerator performs no layer processing, which makes the overall neural network acceleration time long and the efficiency low.
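Purely to illustrate the cost of this serial fetch-then-compute pattern, the following sketch compares it against the overlapped schedule proposed below; the per-layer compute time, fetch time, and layer count are hypothetical numbers chosen for the example.

```python
# Illustrative timing model with hypothetical numbers: serial fetch-then-compute
# versus overlapping the next layer's parameter fetch with the current layer's
# compute, as the method below proposes.
COMPUTE_MS = 5.0   # assumed per-layer compute time
FETCH_MS = 2.0     # assumed per-layer parameter-fetch time over the bus
NUM_LAYERS = 10

serial = NUM_LAYERS * (FETCH_MS + COMPUTE_MS)
# Overlapped: only the first fetch is exposed; every later fetch (2 ms)
# completes inside the 5 ms compute window of the previous layer.
overlapped = FETCH_MS + NUM_LAYERS * max(COMPUTE_MS, FETCH_MS)

print(f"serial:     {serial:.1f} ms")      # 70.0 ms
print(f"overlapped: {overlapped:.1f} ms")  # 52.0 ms
```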
Summary of the invention
The present invention provides a neural network acceleration method and apparatus, a neural network acceleration chip, and a storage medium, to solve the problem in the prior art that neural network acceleration takes a long time and has low efficiency.
The present invention provides a neural network acceleration method, applied to a neural network acceleration chip, the method comprising:
for a neural network to be accelerated, performing the following steps until it is determined that acceleration of the neural network is complete:
performing acceleration processing on the current layer using the parameters of the current layer to be accelerated, and scheduling the parameters of the next layer after the current layer;
when the acceleration processing of the current layer is complete, determining the next layer as the current layer to be accelerated and performing acceleration processing on it.
Further, if the current layer to be accelerated is the last layer, scheduling the parameters of the next layer after the current layer comprises:
scheduling the parameters of the first layer.
Further, scheduling the parameters of the next layer after the current layer comprises:
scheduling the parameters of the next layer after the current layer that are saved in an on-chip memory.
Further, scheduling the parameters of the next layer after the current layer that are saved in the on-chip memory comprises:
scheduling, through a REG file (register file), the parameters of the next layer after the current layer that are saved in the on-chip memory.
Further, before performing acceleration processing on the current layer using the parameters of the current layer to be accelerated and scheduling the parameters of the next layer after the current layer, the method further comprises:
extracting, from the neural network to be accelerated, the parameters each layer requires for acceleration processing, and saving them in the on-chip memory.
Further, the on-chip memory comprises a ROM.
The present invention provides a neural network acceleration apparatus, applied to a neural network acceleration chip, the apparatus comprising:
an acceleration scheduling module, configured to, for a neural network to be accelerated, perform the following steps until it is determined that acceleration of the neural network is complete: performing acceleration processing on the current layer using the parameters of the current layer to be accelerated, and scheduling the parameters of the next layer after the current layer; and
a determining module, configured to, when the acceleration processing of the current layer is complete, determine the next layer as the current layer to be accelerated for acceleration processing.
Further, the acceleration scheduling module is specifically configured to, if the current layer to be accelerated is the last layer, schedule the parameters of the first layer.
Further, the acceleration scheduling module is specifically configured to schedule the parameters of the next layer after the current layer that are saved in an on-chip memory.
Further, the acceleration scheduling module is specifically configured to schedule, through a REG file, the parameters of the next layer after the current layer that are saved in the on-chip memory.
Further, the apparatus further comprises:
an extraction and preservation module, configured to extract, from the neural network to be accelerated, the parameters each layer requires for acceleration processing, and save them in the on-chip memory.
Further, the on-chip memory comprises a ROM.
The present invention provides a neural network acceleration chip, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of any of the methods described above.
The present invention provides a computer-readable storage medium storing a computer program executable by a neural network acceleration chip; when the program runs on the neural network acceleration chip, it causes the neural network acceleration chip to perform the steps of any of the methods described above.
The present invention provides a neural network acceleration method and apparatus, a neural network acceleration chip, and a storage medium. The method comprises: for a neural network to be accelerated, performing the following steps until it is determined that acceleration of the neural network is complete: performing acceleration processing on the current layer using the parameters of the current layer to be accelerated, and scheduling the parameters of the next layer after the current layer; and, when the acceleration processing of the current layer is complete, determining the next layer as the current layer to be accelerated and performing acceleration processing on it. In the present invention, while the neural network acceleration chip performs acceleration processing on the current layer of the neural network, it can schedule the parameters of the next layer in parallel, which shortens the overall acceleration time of the neural network and improves its acceleration efficiency.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a neural network acceleration method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic structural diagram of a neural network acceleration chip provided by Embodiment 6 of the present invention;
Fig. 3 is a schematic diagram of a neural network acceleration apparatus provided by an embodiment of the present invention.
Detailed description of the embodiments
In order to shorten the overall acceleration time of a neural network and improve its acceleration efficiency, the embodiments of the present invention provide a neural network acceleration method and apparatus, a neural network acceleration chip, and a storage medium.
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Embodiment 1:
Fig. 1 is a schematic diagram of a neural network acceleration method provided by an embodiment of the present invention; the method includes the following steps:
S101: for a neural network to be accelerated, perform the following steps until it is determined that acceleration of the neural network is complete.
The neural network acceleration method provided by the embodiment of the present invention is applied to a neural network acceleration chip. The neural network acceleration chip may be a GPU (Graphics Processing Unit), an AI (Artificial Intelligence) chip, an FPGA (Field-Programmable Gate Array) chip, or any other chip capable of performing neural network acceleration. Specifically, the method may be applied to a computing unit in the neural network acceleration chip.
The neural network acceleration chip stores the algorithm for performing acceleration processing on a neural network; therefore, for a neural network to be accelerated, the chip can perform acceleration processing on it according to the following steps.
The neural network acceleration chip can determine whether the acceleration of the neural network is complete; the process of making this determination belongs to the prior art and is not repeated in the embodiments of the present invention.
The neural network referred to in the embodiments of the present invention includes deep learning neural network models.
S102: perform acceleration processing on the current layer using the parameters of the current layer to be accelerated, and schedule the parameters of the next layer after the current layer.
The neural network acceleration chip can determine the current layer to be accelerated and perform acceleration processing on it using the parameters of the current layer that have been scheduled.
The process of performing acceleration processing on a layer using its parameters can be realized with the prior art and is not repeated in the embodiments of the present invention.
The current layer may be the first layer, the last layer, or any other layer of the neural network; the term merely refers to the layer currently undergoing acceleration processing, without limiting it to a specific layer. A layer referred to in the embodiments of the present invention is usually a convolutional layer of the neural network.
If the current layer is the first layer, its parameters are scheduled before acceleration processing of the current layer begins; specifically, they may be scheduled immediately after system startup, i.e., as soon as the neural network to be accelerated has been determined.
After determining the current layer, the neural network acceleration chip can determine the next layer after the current layer. Specifically, the neural network acceleration chip stores information about each layer's next layer. For example, it may directly store a layer association table in which each layer's next layer is recorded, as sketched below; alternatively, each layer may be named by a serial number, and the chip determines each layer's next layer according to the order of the serial numbers.
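A minimal sketch of the layer association table option, with hypothetical layer names; the serial-number alternative would simply advance an index instead.

```python
# Hypothetical layer association table: each entry records a layer's next
# layer, with the last layer wrapping back to the first (see Embodiment 2).
layer_table = {"conv1": "conv2", "conv2": "conv3", "conv3": "conv1"}

def next_layer(current):
    return layer_table[current]

assert next_layer("conv3") == "conv1"
```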
While performing acceleration processing on the current layer using the parameters of the current layer, the neural network acceleration chip schedules the parameters of the next layer after the current layer in parallel.
The parameters each layer requires may be stored outside the neural network acceleration chip, i.e., in the external processor, or in the internal storage module of the neural network acceleration chip. Accordingly, the chip may schedule the next layer's parameters either from the external processor or from its own internal storage module.
S103: when the acceleration processing of the current layer is complete, determine the next layer as the current layer to be accelerated and perform acceleration processing on it.
The neural network acceleration chip can determine whether the acceleration processing of the current layer is complete; this process belongs to the prior art and is not repeated in the embodiments of the present invention.
When the neural network acceleration chip determines that the acceleration processing of the current layer is complete, it continues acceleration processing with the next layer as the current layer to be accelerated; thus, while the acceleration of the neural network is not complete, acceleration processing is performed on each layer in a loop.
In order to facilitate understanding, the neural network acceleration process is described below as a loop (a code sketch follows the list):
A: for the neural network to be accelerated, take the first layer as the current layer to be accelerated and schedule the parameters of the current layer.
B: perform acceleration processing on the current layer using the parameters of the current layer, and schedule the parameters of the next layer after the current layer.
C: judge whether the acceleration of the neural network is complete; if not, go to D; if so, go to E.
D: when the acceleration processing of the current layer is complete, determine the next layer as the current layer to be accelerated and the next layer's parameters as the parameters of the current layer to be accelerated, and return to B.
E: determine that the acceleration of the neural network is complete.
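The loop A to E can be sketched in Python as follows. This is only an illustration: fetch_params(), compute_layer(), the delays, and the pass-count completion check are hypothetical stand-ins, since the patent leaves the actual acceleration processing and the completion determination to the prior art.

```python
import threading
import time

# Hypothetical stand-ins for the hardware primitives: parameter scheduling
# and layer acceleration processing, with illustrative delays.
def fetch_params(layer):
    time.sleep(0.002)                      # models the bus/memory transfer
    return f"params[{layer}]"

def compute_layer(layer, params):
    time.sleep(0.005)                      # models the acceleration processing

def accelerate(num_layers, num_passes):
    cur = 0
    cur_params = fetch_params(cur)         # step A: schedule the first layer
    for _ in range(num_layers * num_passes):   # steps C/E: completion check
        nxt = (cur + 1) % num_layers       # the last layer wraps to the first
        box = {}
        fetcher = threading.Thread(
            target=lambda: box.update(p=fetch_params(nxt)))
        fetcher.start()                    # step B: fetch the next layer's
        compute_layer(cur, cur_params)     # parameters in parallel with the
        fetcher.join()                     # current layer's processing
        cur, cur_params = nxt, box["p"]    # step D: next layer becomes current

accelerate(num_layers=4, num_passes=2)
```

The thread models the parallel scheduling of step B; in the chip itself this would be a bus or memory transfer overlapping the computing unit's work.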
In the embodiment of the present invention, while performing acceleration processing on the current layer of the neural network, the neural network acceleration chip can schedule the parameters of the next layer in parallel, which shortens the overall acceleration time of the neural network and improves its acceleration efficiency.
Embodiment 2:
On the basis of the above embodiment, in the embodiment of the present invention, if the current layer to be accelerated is the last layer, scheduling the parameters of the next layer after the current layer comprises:
scheduling the parameters of the first layer.
Before the acceleration of the neural network is complete, acceleration processing must be performed on each layer of the neural network cyclically, layer by layer. Therefore, if the current layer is the last layer, the loop takes the first layer as the next layer and performs acceleration processing on it.
Scheduling the parameters of the next layer after the last layer is therefore specifically scheduling the parameters of the first layer.
Since, in the embodiment of the present invention, the parameters of the first layer are scheduled as the parameters of the layer after the last layer when the current layer is the last layer, cyclic layer-by-layer acceleration processing of the neural network before its acceleration is complete is ensured, which guarantees that the overall acceleration time of the neural network is shortened and its acceleration efficiency improved.
Embodiment 3:
On the basis of the above embodiments, in the embodiment of the present invention, scheduling the parameters of the next layer after the current layer comprises:
scheduling the parameters of the next layer after the current layer that are saved in an on-chip memory.
In order to further improve the acceleration efficiency of the neural network, the parameters of each layer are stored in advance in the internal storage module of the neural network acceleration chip, i.e., in an on-chip memory, rather than in the external processor, so that the next layer's parameters can be scheduled more quickly.
Specifically, the neural network acceleration chip may schedule the next layer's parameters directly from the on-chip memory, or indirectly through another file, for example a REG file.
The on-chip memory includes a read-only memory (ROM, Read-Only Memory); of course, it may also be another module with a storage function.
When the parameters of each layer are stored in the on-chip memory, each layer's parameters may be stored in a corresponding space of its own; preferably, the spaces corresponding to adjacent layers are contiguous, so that the next layer's parameters can be scheduled quickly, as sketched below.
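A minimal sketch of this contiguous per-layer layout; the layer parameter sizes (in bytes) are hypothetical.

```python
# Hypothetical per-layer parameter sizes in the on-chip memory image.
layer_sizes = [1024, 4096, 4096, 512]

# With adjacent layers in adjacent spaces, the next layer's parameters
# start exactly where the current layer's end:
# offset[i+1] = offset[i] + size[i].
offsets = [0]
for size in layer_sizes[:-1]:
    offsets.append(offsets[-1] + size)

def param_space(layer):
    """Start and end addresses of a layer's space in the on-chip memory."""
    return offsets[layer], offsets[layer] + layer_sizes[layer]

print([param_space(i) for i in range(len(layer_sizes))])
# [(0, 1024), (1024, 5120), (5120, 9216), (9216, 9728)]
```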
Since, in the embodiment of the present invention, the parameters of each layer are stored in advance in the on-chip memory of the neural network acceleration chip, the next layer's parameters can be scheduled more quickly, which further improves the acceleration efficiency of the neural network.
Embodiment 4:
On the basis of the above embodiments, in the embodiment of the present invention, scheduling the parameters of the next layer after the current layer that are saved in the on-chip memory comprises:
scheduling, through a REG file, the parameters of the next layer after the current layer that are saved in the on-chip memory.
In order to further improve the acceleration efficiency of the neural network, with the parameters of each layer saved in the on-chip memory, the parameters of the layer to be processed are placed in a REG file, so that the computing unit can schedule them directly from the REG file. This saves the time of first determining the next layer and then looking up and scheduling its parameters.
Specifically, as soon as the parameters of a certain layer are taken out of the REG file, the parameters of the next layer after that layer are immediately read from the on-chip memory and saved into the REG file.
Therefore, the REG file holds only one layer's parameters at a time.
The embodiment of the present invention is illustrated below with a specific example. Suppose a neural network has M convolutional layers. First, the parameters of the M convolutional layers are stored in spaces 1 to M of the ROM, respectively. After system startup, the parameters in space 1 are read out and stored in the REG file, and the computing unit is told that the parameters of layer 1 are ready. After the computing unit takes away the parameters of the first layer, the parameters in space 2 of the ROM are immediately read out and stored in the REG file, and the computing unit is told that the parameters of layer 2 are ready, and so on. When the computing unit has taken away the parameters of layer M, the process starts again from layer 1, until the training of the neural network is complete.
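The ROM-to-REG-file handoff in this example can be sketched as a producer-consumer pair; the layer count, the parameter contents, and the semaphore-based ready/taken signalling are hypothetical stand-ins for the chip's internal handshake.

```python
import threading

M = 3
rom = [f"layer{i}-params" for i in range(M)]   # ROM spaces 1..M

reg_file = None
ready = threading.Semaphore(0)   # "this layer's parameters are prepared"
taken = threading.Semaphore(1)   # "the computing unit has taken them away"

def loader(total):
    """Refill the REG file from ROM the moment it is emptied."""
    global reg_file
    for i in range(total):
        taken.acquire()              # wait until the REG file is free
        reg_file = rom[i % M]        # read space (i mod M) from the ROM
        ready.release()              # tell the computing unit it is ready

def computing_unit(total):
    global reg_file
    for _ in range(total):
        ready.acquire()
        params, reg_file = reg_file, None  # take the parameters away
        taken.release()                    # the loader refills immediately
        # ... acceleration processing with `params` would happen here ...

n = 2 * M  # e.g. cycle through every layer twice before completion
t1 = threading.Thread(target=loader, args=(n,))
t2 = threading.Thread(target=computing_unit, args=(n,))
t1.start(); t2.start(); t1.join(); t2.join()
```

Because taken starts at 1 and ready at 0, the loader refills the REG file the moment the computing unit takes a layer's parameters away, so the REG file holds exactly one layer's parameters at a time, as described above.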
Since, in the embodiment of the present invention, the neural network acceleration chip schedules through the REG file the parameters of the next layer after the current layer that are saved in the on-chip memory, the acceleration efficiency of the neural network is further improved.
Embodiment 5:
On the basis of the above embodiments, in the embodiment of the present invention, before performing acceleration processing on the current layer using the parameters of the current layer to be accelerated and scheduling the parameters of the next layer after the current layer, the method further comprises:
extracting, from the neural network to be accelerated, the parameters each layer requires for acceleration processing, and saving them in the on-chip memory.
The neural network acceleration chip can automatically extract the required parameters from the neural network to be accelerated, e.g. a deep learning neural network model, and save them in the on-chip memory, so as to improve the acceleration efficiency of the neural network.
To extract the parameters each layer requires for acceleration processing from the neural network to be accelerated, the neural network acceleration chip may extract them from the neural network according to keywords corresponding to the parameters; alternatively, the parameters of each layer may be configured and saved in the neural network in advance, and the neural network acceleration chip extracts them directly from the parameter save locations in the neural network.
After extracting from the neural network the parameters each layer requires for acceleration processing, the neural network acceleration chip stores the parameters of each layer of the neural network in its internal on-chip memory.
By automatically extracting the parameters the accelerator requires from the deep learning neural network model and storing the parameter file in the on-chip memory (ROM) of the accelerator, the configuration parameters are read layer by layer when the neural network accelerator runs. This realizes automatic configuration of the network model parameters of the neural network accelerator and facilitates the reading of layer parameters in the accelerator, thereby improving the neural network acceleration efficiency.
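A minimal sketch of the keyword-based extraction and packing described in this embodiment; the model format, the layer keywords, and the byte contents are hypothetical.

```python
# Hypothetical model: parameter names mapped to raw bytes.
model = {
    "conv1.weight": b"\x01" * 8,
    "conv1.bias":   b"\x02" * 2,
    "conv2.weight": b"\x03" * 8,
}

def extract_layer_params(model, layer_keyword):
    """Collect every parameter whose name matches the layer keyword."""
    return b"".join(v for k, v in sorted(model.items())
                    if k.startswith(layer_keyword))

# Pack the layers contiguously into one on-chip ROM image (see Embodiment 3).
rom_image = b"".join(extract_layer_params(model, kw)
                     for kw in ("conv1", "conv2"))
print(len(rom_image))  # 18 bytes in this toy example
```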
Embodiment 6:
On the basis of the above embodiments, the embodiment of the present invention further provides a neural network acceleration chip, as shown in Fig. 2, comprising: a processor 201, a communication interface 202, a memory 203, and a communication bus 204, wherein the processor 201, the communication interface 202, and the memory 203 communicate with one another through the communication bus 204.
The memory 203 stores a computer program which, when executed by the processor 201, causes the processor 201 to perform the following steps:
for a neural network to be accelerated, performing the following steps until it is determined that acceleration of the neural network is complete:
performing acceleration processing on the current layer using the parameters of the current layer to be accelerated, and scheduling the parameters of the next layer after the current layer;
when the acceleration processing of the current layer is complete, determining the next layer as the current layer to be accelerated and performing acceleration processing on it.
The communication bus mentioned for the above neural network acceleration chip may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 202 is used for communication between the above neural network acceleration chip and other devices.
The memory may include a random access memory (RAM) or a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit, a network processor (NP), or the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
In the embodiment of the present invention, when the processor executes the program stored in the memory, it realizes that, while acceleration processing is performed on the current layer of the neural network, the parameters of the next layer can be scheduled in parallel, which shortens the overall acceleration time of the neural network and improves its acceleration efficiency.
Embodiment 7:
On the basis of the above embodiments, the embodiment of the present invention further provides a computer-readable storage medium storing a computer program executable by a neural network acceleration chip; when the program runs on the neural network acceleration chip, it causes the neural network acceleration chip to perform the following steps:
for a neural network to be accelerated, performing the following steps until it is determined that acceleration of the neural network is complete:
performing acceleration processing on the current layer using the parameters of the current layer to be accelerated, and scheduling the parameters of the next layer after the current layer;
when the acceleration processing of the current layer is complete, determining the next layer as the current layer to be accelerated and performing acceleration processing on it.
The above computer-readable storage medium may be any usable medium or data storage device accessible to the processor in the neural network acceleration chip, including but not limited to magnetic memories such as floppy disks, hard disks, magnetic tapes, and magneto-optical disks (MO); optical memories such as CDs, DVDs, BDs, and HVDs; and semiconductor memories such as ROMs, EPROMs, EEPROMs, non-volatile memories (NAND FLASH), and solid-state drives (SSDs).
The computer-readable storage medium provided in the embodiment of the present invention stores a computer program which, when executed by a processor, realizes that, while acceleration processing is performed on the current layer of the neural network, the parameters of the next layer can be scheduled in parallel, which shortens the overall acceleration time of the neural network and improves its acceleration efficiency.
Fig. 3 is a schematic diagram of a neural network acceleration apparatus provided by an embodiment of the present invention, applied to a neural network acceleration chip, the apparatus comprising:
an acceleration scheduling module 301, configured to, for a neural network to be accelerated, perform the following steps until it is determined that acceleration of the neural network is complete: performing acceleration processing on the current layer using the parameters of the current layer to be accelerated, and scheduling the parameters of the next layer after the current layer;
a determining module 302, configured to, when the acceleration processing of the current layer is complete, determine the next layer as the current layer to be accelerated for acceleration processing.
The acceleration scheduling module 301 is specifically configured to, if the current layer to be accelerated is the last layer, schedule the parameters of the first layer.
The acceleration scheduling module 301 is specifically configured to schedule the parameters of the next layer after the current layer that are saved in an on-chip memory.
The acceleration scheduling module 301 is specifically configured to schedule, through a REG file, the parameters of the next layer after the current layer that are saved in the on-chip memory.
The apparatus further comprises:
an extraction and preservation module 303, configured to extract, from the neural network to be accelerated, the parameters each layer requires for acceleration processing, and save them in the on-chip memory.
The on-chip memory comprises a ROM.
In the embodiment of the present invention, while performing acceleration processing on the current layer of the neural network, the neural network acceleration chip can schedule the parameters of the next layer in parallel, which shortens the overall acceleration time of the neural network and improves its acceleration efficiency.
As for the system/apparatus embodiments, since they are substantially similar to the method embodiments, their description is relatively simple; for relevant parts, reference may be made to the description of the method embodiments.
It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any actual relationship or order between these entities or operations.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memories, CD-ROMs, optical memories, and the like) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction apparatus that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present application.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.
Claims (14)
1. A neural network acceleration method, characterized in that it is applied to a neural network acceleration chip, the method comprising:
for a neural network to be accelerated, performing the following steps until it is determined that acceleration of the neural network is complete:
performing acceleration processing on the current layer using the parameters of the current layer to be accelerated, and scheduling the parameters of the next layer after the current layer;
when the acceleration processing of the current layer is complete, determining the next layer as the current layer to be accelerated and performing acceleration processing on it.
2. The method of claim 1, characterized in that, if the current layer to be accelerated is the last layer, scheduling the parameters of the next layer after the current layer comprises:
scheduling the parameters of the first layer.
3. The method of claim 1, characterized in that scheduling the parameters of the next layer after the current layer comprises:
scheduling the parameters of the next layer after the current layer that are saved in an on-chip memory.
4. The method of claim 3, characterized in that scheduling the parameters of the next layer after the current layer that are saved in the on-chip memory comprises:
scheduling, through a REG file, the parameters of the next layer after the current layer that are saved in the on-chip memory.
5. The method of claim 3 or 4, characterized in that, before performing acceleration processing on the current layer using the parameters of the current layer to be accelerated and scheduling the parameters of the next layer after the current layer, the method further comprises:
extracting, from the neural network to be accelerated, the parameters each layer requires for acceleration processing, and saving them in the on-chip memory.
6. The method of claim 3 or 4, characterized in that the on-chip memory comprises a read-only memory (ROM).
7. A neural network acceleration apparatus, characterized in that it is applied to a neural network acceleration chip, the apparatus comprising:
an acceleration scheduling module, configured to, for a neural network to be accelerated, perform the following steps until it is determined that acceleration of the neural network is complete: performing acceleration processing on the current layer using the parameters of the current layer to be accelerated, and scheduling the parameters of the next layer after the current layer; and
a determining module, configured to, when the acceleration processing of the current layer is complete, determine the next layer as the current layer to be accelerated for acceleration processing.
8. The apparatus of claim 7, characterized in that the acceleration scheduling module is specifically configured to, if the current layer to be accelerated is the last layer, schedule the parameters of the first layer.
9. The apparatus of claim 7, characterized in that the acceleration scheduling module is specifically configured to schedule the parameters of the next layer after the current layer that are saved in an on-chip memory.
10. The apparatus of claim 9, characterized in that the acceleration scheduling module is specifically configured to schedule, through a REG file, the parameters of the next layer after the current layer that are saved in the on-chip memory.
11. The apparatus of claim 9 or 10, characterized in that the apparatus further comprises:
an extraction and preservation module, configured to extract, from the neural network to be accelerated, the parameters each layer requires for acceleration processing, and save them in the on-chip memory.
12. The apparatus of claim 9 or 10, characterized in that the on-chip memory comprises a read-only memory (ROM).
13. A neural network acceleration chip, characterized by comprising: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 6.
14. A computer-readable storage medium, characterized in that it stores a computer program executable by a neural network acceleration chip; when the program runs on the neural network acceleration chip, it causes the neural network acceleration chip to perform the steps of the method of any one of claims 1 to 6.
Priority Applications (1)
- CN201910100514.7A, filed 2019-01-31: Neural network acceleration method and apparatus, neural network acceleration chip, and storage medium
Publications (1)
- CN109784484A, published 2019-05-21