Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is further described below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any creative effort, shall fall within the protection scope of the present application.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As shown in Fig. 1, one embodiment of the present application provides a method 001 of implementing a deep neural network on a field programmable gate array, comprising:
S10, analyzing the resource demand and the saturation throughput of each network layer of the deep neural network;
S20, enumerating all partitioning schemes for partitioning all the network layers among a plurality of field programmable gate arrays according to the resource demand and the saturation throughput;
S30, calculating the effect parameter data of all the partitioning schemes, and selecting an optimal scheme from among them according to that data;
and S40, implementing the optimal scheme on a board.
In some embodiments, analyzing the resource demand and the saturation throughput of each network layer of the deep neural network comprises:
reading the parameter data of the deep neural network and calculating the resource demand of each network layer;
and acquiring the saturation throughput of each network layer according to the resource demand, as illustrated in the sketch below.
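The following is a minimal, hypothetical sketch of this analysis step. The cost model (resources per layer type) and the field names are assumptions made for illustration, not quantities prescribed by the present application.

```python
from dataclasses import dataclass

@dataclass
class LayerProfile:
    layer_id: int                 # identifier distinguishing the layer
    resource_demand: int          # e.g., multipliers/storage needed (model-specific)
    saturation_throughput: float  # throughput once the resource demand is met

def analyze_layers(layers):
    """Estimate resource demand and saturation throughput per network layer.

    'layers' is a list of dicts read from the network's parameter data.
    The cost model below is a placeholder assumption for illustration.
    """
    profiles = []
    for i, layer in enumerate(layers, start=1):
        if layer["type"] == "conv":
            # crude model: kernels * kernel area = multipliers required
            demand = layer["num_kernels"] * layer["kernel_size"] ** 2
        elif layer["type"] == "fc":
            demand = layer["num_outputs"]
        else:  # input, pooling, excitation, output layers: nominal cost
            demand = 1
        # hypothetical estimate: throughput falls as the layer's work grows
        throughput = 1000.0 / max(demand, 1)
        profiles.append(LayerProfile(i, demand, throughput))
    return profiles
```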
In some embodiments, implementing the optimal scheme on a board comprises: writing the parameter data of the network layers into the correspondingly assigned field programmable gate arrays according to the optimal scheme.
In some embodiments, the network layer includes a convolutional layer, and the parameter data of the network layer includes the layer type of the convolutional layer, the number of convolution kernels, the size of the convolution kernels, and the stride of the convolution kernels.
In some embodiments, the network layer includes a convolutional layer and further includes one or more of an input layer, a pooling layer, an excitation layer, a fully-connected layer, and an output layer. The parameter data of the network layer comprises the parameter data of the convolutional layer and further comprises one or more of the following: the parameter data of the input layer, the pooling layer, the excitation layer, the fully-connected layer, and the output layer. The parameter data of the input layer comprises a layer type; the parameter data of the convolutional layer comprises a layer type, the number of convolution kernels, the size of the convolution kernels, and the stride of the convolution kernels; the parameter data of the pooling layer comprises a layer type, a pooling function, the pooling size, and the pooling stride; the parameter data of the excitation layer comprises a layer type and an excitation function; the parameter data of the fully-connected layer comprises a layer type and the number of outputs; the parameter data of the output layer comprises a layer type. A data-structure sketch follows.
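As a concrete illustration only, the parameter data enumerated above might be organized as in the following sketch; the field names are assumptions and not mandated by the present application.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LayerParams:
    layer_type: str                    # "input", "conv", "pool", "act", "fc", "output"
    num_kernels: Optional[int] = None  # convolutional layer: number of kernels
    kernel_size: Optional[int] = None  # convolutional layer: kernel size
    stride: Optional[int] = None       # convolutional/pooling layer: stride
    pool_fn: Optional[str] = None      # pooling layer: e.g., "max" or "avg"
    pool_size: Optional[int] = None    # pooling layer: pooling size
    activation: Optional[str] = None   # excitation layer: e.g., "relu"
    num_outputs: Optional[int] = None  # fully-connected layer: number of outputs

# example: a small network description as read from the parameter data
network = [
    LayerParams("input"),
    LayerParams("conv", num_kernels=16, kernel_size=3, stride=1),
    LayerParams("pool", pool_fn="max", pool_size=2, stride=2),
    LayerParams("act", activation="relu"),
    LayerParams("fc", num_outputs=10),
    LayerParams("output"),
]
```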
In some embodiments, the optimal scheme is a partitioning scheme that satisfies the following conditions: the total throughput of the deep neural network reaches a preset threshold, and the number of field programmable gate arrays used is the smallest.
In some embodiments, the optimal scheme is a partitioning scheme that satisfies the following conditions: the number of field programmable gate arrays used is within a preset threshold, and the total throughput of the deep neural network is the largest.
As shown in Fig. 2, the present embodiment further provides an apparatus for implementing a deep neural network on a field programmable gate array, comprising:
an analysis module 100, configured to analyze a resource demand and a saturation throughput of each network layer of the deep neural network;
an enumeration module 200, configured to enumerate all partitioning schemes for partitioning all the network layers among a plurality of field programmable gate arrays according to the resource demand and the saturation throughput;
a selection module 300, configured to calculate the effect parameter data of all the partitioning schemes and to select an optimal scheme from among them according to that data;
and a board implementation module 400, configured to implement the optimal scheme on a board.
The present embodiments also provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method.
The present embodiment also provides a data processing method that processes data using a deep neural network implemented by the above method.
As shown in Fig. 3, one embodiment of the present application provides a method 002 of implementing a deep neural network on a field programmable gate array, comprising:
A1. Preprocessing step: analyzing the resource demand and the saturation throughput of each network layer of the deep neural network;
A2. Partitioning step: planning an optimal scheme for partitioning the deep neural network among a plurality of field programmable gate arrays (FPGAs) according to the results (resource demand and saturation throughput) obtained in step A1;
A3. Implementation step: implementing on a board the optimal scheme obtained in step A2.
In certain embodiments, step A1 comprises:
B1, reading the parameter data of the deep neural network;
and B2, calculating the resource demand and the saturation throughput of each network layer according to the parameter data of the deep neural network.
In certain embodiments, step A1 comprises:
C1, reading the parameter data of the deep neural network and calculating the resource demand of each network layer of the deep neural network;
and C2, acquiring the saturation throughput of each network layer according to the resource demand, where the saturation throughput is the throughput of the network layer when its resource demand is met.
In certain embodiments, step A2 comprises:
D1, selecting a plurality of FPGAs according to the resource demand and the saturation throughput, and enumerating all schemes for partitioning all the network layers among the FPGAs;
and D2, calculating the effect parameter data of all the partitioning schemes, and selecting the optimal scheme from among them according to that data. A sketch of the enumeration in D1 appears below.
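Under the constraint stated later in this description (only consecutive network layers may share an FPGA), enumerating the partitioning schemes amounts to enumerating the ways of cutting the layer sequence into contiguous groups. The following is a minimal illustrative sketch, not the implementation prescribed by the application.

```python
def enumerate_partitions(num_layers, max_fpgas):
    """Yield each scheme as a list of (start, end) layer ranges, one per FPGA.

    Only consecutive layers may share an FPGA, so a scheme is simply a
    choice of cut points in the layer sequence 1..num_layers.
    """
    def helper(start, groups):
        if start > num_layers:
            yield list(groups)
            return
        if len(groups) == max_fpgas:
            return  # no FPGA left for the remaining layers
        for end in range(start, num_layers + 1):
            groups.append((start, end))
            yield from helper(end + 1, groups)
            groups.pop()
    yield from helper(1, [])

# example: all ways to place 4 layers on at most 2 FPGAs
for scheme in enumerate_partitions(4, 2):
    print(scheme)  # e.g., [(1, 1), (2, 4)], [(1, 4)], ...
```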
In certain embodiments, step A2 comprises:
E1. Dynamic programming state definition step: the usage of each FPGA is recorded using state compression, and subscripts denote the number of network layers placed and the number of the FPGA used last, to facilitate the state transitions;
E2. Dynamic programming state transition step: enumerating all schemes for partitioning the network layers among the FPGAs, and calculating the feasible maximum throughput from the saturation throughputs obtained in the preprocessing step, thereby completing the state transitions;
E3. Optimal-solution backtracking step: obtaining the optimal partitioning scheme from the final optimal result and the FPGA numbers recorded at each step.
In certain embodiments, step A3 comprises: writing the parameter data of the network layers into the FPGAs according to the optimal scheme, thereby realizing the deep neural network.
As shown in Fig. 4, this embodiment further provides an apparatus for implementing a deep neural network on a field programmable gate array, including:
a preprocessing module for implementing step A1;
a partitioning module for implementing step A2;
and a board implementation module for implementing step A3.
As shown in Fig. 5, one embodiment of the present application provides a method 003 of implementing a deep neural network on a field programmable gate array, comprising:
S1, reading the parameter data of the deep neural network, and calculating the resource demand of each network layer of the deep neural network.
In some embodiments, each network layer of the deep neural network has an identifier that distinguishes it from the other layers.
In some embodiments, the network layer comprises an input layer, a convolutional layer, a pooling layer, an excitation layer, a fully-connected layer, and an output layer; there may be multiple convolutional, pooling, excitation, and fully-connected layers, with the specific number set, when the deep neural network is designed, according to the requirements of the practical application.
In some embodiments, the computing units of each network layer are extracted, and the amount of resources required to implement these computing units on an FPGA, such as the amount of storage space needed, is calculated.
In some embodiments, each network layer is provided with an identifier (which may be a number) that distinguishes it from the other layers.
S2, acquiring the saturation throughput of each network layer according to the resource demand; the saturation throughput is the throughput of the network layer when its resource demand is met.
In some embodiments, the saturation throughput of a network layer may be obtained by testing: the network layer is written into an FPGA that meets the layer's resource demand, and the throughput is measured and recorded as the saturation throughput.
In some embodiments, the saturation throughput of a network layer can instead be obtained by calculation; predicting it by calculation saves time, but carries a certain error relative to the saturation throughput measured in an actual test. A hypothetical estimation sketch follows.
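The application does not specify the calculation, so the following is a rough, assumed estimate for illustration only; the parameters (operations per input, parallel units, clock rate) are hypothetical.

```python
def estimate_saturation_throughput(ops_per_input, parallel_units, clock_hz):
    """Hypothetical saturation-throughput estimate, in inputs per second.

    Assumes the layer's compute units run fully pipelined once its
    resource demand is met; a real measurement will differ somewhat.
    """
    ops_per_second = parallel_units * clock_hz
    return ops_per_second / ops_per_input

# example: a layer needing 1.5M operations per input, 64 units at 200 MHz
print(estimate_saturation_throughput(1.5e6, 64, 200e6))  # ~8533 inputs/s
```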
S3, selecting a plurality of FPGAs according to the resource demand and the saturation throughput, and enumerating all schemes for partitioning all the network layers among the plurality of FPGAs.
When partitioning the network layers among the FPGAs, it is ensured that each network layer reaches its saturation throughput.
In some embodiments, each FPGA is provided with an identifier (which may be a number) that distinguishes it from the others, and the network layers and the FPGA they are assigned to are marked together by their respective identifiers so as to record the matching. For example, assuming there are 50 network layers in total, they are numbered sequentially as N1, N2, N3, ..., N50; assuming that 10 FPGAs are finally selected, the 10 FPGAs are numbered A1, A2, A3, ..., A10. If a partitioning scheme places the first to fourth network layers on the FPGA numbered A2, then N1, N2, N3, and N4 are marked together with A2 to facilitate recording the matching, as in the sketch below.
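As a small illustrative sketch, such a matching record could be kept as a mapping from FPGA identifiers to assigned layer identifiers; the structure is an assumption, not one prescribed by the application.

```python
# matching record for the example above: FPGA id -> assigned layer numbers
assignment = {
    "A2": ["N1", "N2", "N3", "N4"],
    # ... further FPGAs with their consecutive layer ranges
}

def fpga_of_layer(assignment, layer_id):
    """Look up which FPGA a given network layer was assigned to."""
    for fpga, layers in assignment.items():
        if layer_id in layers:
            return fpga
    return None

print(fpga_of_layer(assignment, "N3"))  # -> "A2"
```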
In some embodiments, E(i, j, x) denotes the throughput of the x-th FPGA when the i-th to j-th network layers are all placed on it, where 1 ≤ i ≤ j ≤ n and n is the total number of network layers, e.g., n = 50. Whether the i-th to j-th network layers can be placed on the x-th FPGA is judged from known quantities such as the resource amount of the x-th FPGA and the resource demand a(i) of each layer:
when the resource amount of the x-th FPGA is less than the sum of the resource demands of the i-th to j-th network layers, the layers cannot be placed on it;
when the resource amount of the x-th FPGA is greater than or equal to the sum of the resource demands of the i-th to j-th network layers, the layers can be placed provided E(i, j−1, x) is itself feasible, in which case E(i, j, x) = min{E(i, j−1, x), b(j)}, where b(j) is the saturation throughput of the j-th layer; otherwise they cannot be placed. A sketch of this table follows.
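A minimal sketch of building this table is below. It assumes dicts a (resource demand per layer), b (saturation throughput per layer), and C (resource capacity per FPGA), keyed from 1; the names mirror the quantities above, but the details are illustrative assumptions.

```python
def build_E(a, b, C):
    """Build E[(i, j, x)]: throughput when layers i..j share FPGA x.

    Only feasible entries are stored. a[i] is the resource demand of
    layer i, b[i] its saturation throughput, C[x] the capacity of FPGA x.
    """
    n, m = len(a), len(C)
    E = {}
    for x in range(1, m + 1):
        for i in range(1, n + 1):
            used = 0
            for j in range(i, n + 1):
                used += a[j]
                if used > C[x]:
                    break  # layers i..j no longer fit on FPGA x
                prev = b[i] if j == i else E[(i, j - 1, x)]
                E[(i, j, x)] = min(prev, b[j])  # FPGA throughput = min over its layers
    return E

# example: two layers, one FPGA with capacity 10
E = build_E({1: 5, 2: 3}, {1: 100.0, 2: 80.0}, {1: 10})
print(E[(1, 2, 1)])  # 80.0: the slower layer bounds the FPGA's throughput
```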
S4, calculating the effect parameter data of all the partitioning schemes, and selecting the optimal scheme from among them according to that data.
The effect parameter data of the partitioning scheme includes the total throughput of the deep neural network and the number of FPGAs used.
In some embodiments, the conditions satisfied by the optimal scheme include: the total throughput of the deep neural network reaches a preset threshold, and the number of FPGAs used is the smallest. For example, if the preset threshold is T1, then from all the partitioning schemes the scheme whose total throughput reaches T1 and that uses the fewest FPGAs is selected; if two schemes use the same number of FPGAs, the one with the larger total throughput is taken as the optimal scheme.
In some embodiments, the conditions satisfied by the optimal scheme include: the number of FPGAs used is within a preset threshold, and the total throughput of the deep neural network is the largest. For example, if the preset threshold is 10 (i.e., the number of FPGAs used should not exceed 10), then from all the partitioning schemes the scheme that uses at most 10 FPGAs and has the maximum total throughput is selected; if two schemes have the same total throughput, the one using fewer FPGAs is taken as the optimal scheme.
Each network layer reaches its saturation throughput when its resource demand is met; if several network layers are placed on the same FPGA, the throughput of that FPGA is the minimum of those layers' throughputs. Only consecutive network layers can be placed on the same FPGA.
In some embodiments, the total throughput of the deep neural network is denoted F(l, s, x), where l indicates that the placed network layers are the 1st to l-th layers, and s is a binary number (a bitmask) marking which FPGAs are in use.
For example, assuming the total number of FPGAs is 12, numbered 1 to 12, a 12-bit binary number s represents their usage, the FPGA numbers corresponding one-to-one to the digits of s from left to right; for example, s = 001101011001 indicates that the 3rd, 4th, 6th, 8th, 9th, and 12th FPGAs are in use.
x is the number of the FPGA on which the l-th layer is placed, where x and s satisfy the constraint (s & 2^x) > 0. The transition is F(l, s, x) = max{ min(E(k+1, l, x), B(y, x), F(k, s − 2^x, y)) }, taken over k < l and y with ((s − 2^x) & 2^y) > 0, and the calculation can be completed with loops. An auxiliary array is defined, and during the calculation the values of k and y that attain the maximum of F(l, s, x) are recorded in it for later backtracking.
The optimal scheme is then selected according to the obtained values of F(l, s, x) under the different partitioning schemes and the number of FPGAs used, as in the sketch below.
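The following sketch implements a bitmask dynamic program of this shape under stated assumptions: the bit for FPGA x is taken as bit x−1 counting from the least-significant end, B is supplied by the caller as a dict B[(y, x)] of inter-FPGA transfer throughputs, and E is a table such as build_E above produces. It illustrates the recurrence; it is not the application's definitive implementation.

```python
from itertools import combinations

def partition_dp(n, m, E, B):
    """Maximize F(l, s, x) for layers 1..n over FPGAs 1..m.

    E[(i, j, x)]: throughput of layers i..j on FPGA x (absent = infeasible).
    B[(y, x)]: assumed transfer throughput from FPGA y to FPGA x.
    Returns (best throughput, [(i, j, x), ...] placement plan).
    """
    F, back = {}, {}
    # base case: layers 1..l all on a single FPGA x (s has one bit set)
    for x in range(1, m + 1):
        for l in range(1, n + 1):
            if (1, l, x) in E:
                F[(l, 1 << (x - 1), x)] = E[(1, l, x)]
                back[(l, 1 << (x - 1), x)] = None
    # transitions, in order of increasing number of FPGAs used in s
    for bits in range(2, m + 1):
        for combo in combinations(range(1, m + 1), bits):
            s = sum(1 << (x - 1) for x in combo)
            for x in combo:
                for l in range(2, n + 1):
                    best = None
                    for k in range(1, l):            # k < l
                        if (k + 1, l, x) not in E:
                            continue
                        for y in combo:
                            if y == x:
                                continue             # y ranges over s minus FPGA x
                            prev = F.get((k, s - (1 << (x - 1)), y))
                            if prev is None:
                                continue
                            val = min(E[(k + 1, l, x)], B[(y, x)], prev)
                            if best is None or val > best[0]:
                                best = (val, k, y)
                    if best:
                        F[(l, s, x)] = best[0]
                        back[(l, s, x)] = (best[1], best[2])  # record k, y
    # among complete states (all n layers placed), take the maximum and backtrack
    done = [(v, l, s, x) for (l, s, x), v in F.items() if l == n]
    if not done:
        return None, []
    v, l, s, x = max(done)
    plan = []
    while True:
        prev = back[(l, s, x)]
        if prev is None:
            plan.append((1, l, x))
            break
        k, y = prev
        plan.append((k + 1, l, x))
        l, s, x = k, s - (1 << (x - 1)), y
    return v, list(reversed(plan))
```

In this sketch the number of FPGAs a scheme uses is the number of set bits in s, so selecting under either criterion described above amounts to filtering the complete states by the preset threshold before taking the maximum.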
S5, writing the parameter data of the network layers into the FPGAs according to the optimal scheme, thereby realizing the deep neural network.
The parameter data of the network layers comprises: the parameter data of the input layer, the convolutional layer, the pooling layer, the excitation layer, the fully-connected layer, and the output layer;
the parameter data of the input layer comprises a layer type;
the parameter data of the convolutional layer comprises a layer type, the number of convolution kernels, the size of the convolution kernels, and the stride of the convolution kernels;
the parameter data of the pooling layer comprises a layer type, a pooling function, the pooling size, and the pooling stride;
the parameter data of the excitation layer comprises a layer type and an excitation function;
the parameter data of the fully-connected layer comprises a layer type and the number of outputs;
the parameter data of the output layer comprises a layer type.
In some embodiments, a dynamic programming algorithm is applied to the performance parameters of each FPGA and the resource demand and saturation throughput of each network layer, and the optimal matching of network layers to FPGAs is obtained while meeting each layer's resource demand; this satisfies the throughput requirement of the network layers while avoiding waste of FPGA resources.
Fig. 6 is a flowchart illustrating a method for implementing a deep neural network on a field programmable gate array according to another embodiment of the present application.
The present embodiment also provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above method when executing the program.
The present embodiment also provides a non-transitory computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method described above.
The present embodiment also provides a data processing method that processes data, such as image data, using a deep neural network implemented by the above method of implementing a deep neural network on a field programmable gate array. The deep neural network implemented on the field programmable gate arrays is made to communicate with a computer (the FPGAs communicate with the host computer), and the data to be processed is input through the computer into the deep neural network for processing to obtain the processing result. In some embodiments, the communication between the FPGAs and the host is accomplished using a first-in-first-out (FIFO) queue. This data processing method offers high data processing speed and good processing results. A host-side sketch follows.
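The application does not fix a host API, so the following is a minimal host-side sketch that merely simulates the FIFO exchange pattern with Python's standard queue module; the board is replaced by a stand-in thread, and every name here is an assumption made for illustration.

```python
import queue
import threading

to_fpga = queue.Queue()    # host -> FPGA FIFO (simulated)
from_fpga = queue.Queue()  # FPGA -> host FIFO (simulated)

def fpga_stub():
    """Stand-in for the board: pops inputs and pushes 'results' in order."""
    while True:
        item = to_fpga.get()
        if item is None:
            break  # end-of-stream marker
        from_fpga.put(("result-for", item))  # placeholder for the inference

threading.Thread(target=fpga_stub, daemon=True).start()

samples = ["img0", "img1", "img2"]  # data to be processed
for sample in samples:
    to_fpga.put(sample)
to_fpga.put(None)

for _ in samples:
    print(from_fpga.get())  # results come back in first-in-first-out order
```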
According to the method of implementing a deep neural network on a field programmable gate array described above, an optimal scheme for partitioning the network layers of the deep neural network among a plurality of FPGAs is planned. This greatly improves the resource utilization of the FPGAs and the computing capacity of the deep neural network, saves FPGA resources while achieving a larger total throughput, and well meets the performance requirements for implementing the deep neural network.
It should be noted that:
the term "module" is not intended to be limited to a particular physical form. Depending on the particular application, a module may be implemented as hardware, firmware, software, and/or combinations thereof. Furthermore, different modules may share common components or even be implemented by the same component. There may or may not be clear boundaries between the various modules.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments and not other features, combinations of features of different embodiments are meant to be within the scope of the application and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the apparatus according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may take the form of one or more signals. Such a signal may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etcetera does not indicate any ordering; these words may be interpreted as names.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and their order of execution is not necessarily sequential; they may be performed in turns or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
The above-mentioned embodiments merely express several implementations of the present application, and their description is specific and detailed, but they should not therefore be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.