WO2020087254A1 - Optimization method for convolutional neural network, and related product - Google Patents
Optimization method for convolutional neural network, and related product
- Publication number
- WO2020087254A1 (PCT/CN2018/112569)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- layer
- replacement
- loss value
- replaced
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- The invention relates to the technical field of communications and artificial intelligence, and in particular to an optimization method for convolutional neural networks and related products.
- In recent years, deep convolutional neural networks, as a class of machine learning models, have achieved excellent results in computer vision and other fields, even exceeding average human performance on some tasks such as image classification and recognition and the game of Go.
- Convolutional neural networks generally include multiple convolutional layers, interspersed with pooling layers, linear rectification layers, and the like.
- The top of the network generally has one or more fully connected layers, topped by a loss function layer used for training.
- Transfer learning is a method of developing and training machine learning models; its purpose is to transfer a model M trained in domain A to domain B at lower cost, through methods such as retraining. Transfer learning is widely used with deep convolutional neural networks, but training such networks takes a long time and is costly.
- The embodiments of the present invention provide an optimization method for a convolutional neural network and related products.
- A trained model can be applied to the target domain after only simple retraining, which has the advantage of reducing costs.
- In a first aspect, an embodiment of the present invention provides an optimization method for a convolutional neural network.
- The method includes the following steps: obtaining a pre-trained model M; retraining the pre-trained model M on a data set D of a specified domain to obtain an initial model M0, and performing a layer replacement operation on the initial model M0; repeatedly performing the layer replacement operation to obtain multiple third intermediate models M3 and multiple loss values; and selecting the third intermediate model M3 with the smallest loss value as the output model.
- The layer replacement operation includes: determining, based on a bipartite graph maximum matching algorithm, that a standard convolutional layer e in the initial model M0 is suitable to be replaced with an efficient convolutional layer, and determining the effect gain of a first intermediate model M1 obtained by replacing the standard convolutional layer e with the efficient convolutional layer; renormalizing the parameters of the first intermediate model M1 to obtain a second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain a third intermediate model M3; and calculating the loss value of the third intermediate model M3.
- The determination, based on the bipartite graph maximum matching algorithm, that the standard convolutional layer e in the initial model M0 is suitable to be replaced with an efficient convolutional layer specifically includes:
- finding, in the initial model M0, a group convolutional layer containing Ng groups such that the change in the importance of intra-layer connections is minimal.
- Optionally, the loss value includes the loss Lw, where Lw is the loss value defined by the loss formula described below.
- In a second aspect, an optimization device for a convolutional neural network is provided. The device includes:
- an obtaining unit, used to obtain the pre-trained model M;
- a training unit, used to retrain the pre-trained model M on the data set D of the specified domain to obtain the initial model M0;
- a replacement unit, used to perform the layer replacement operation on the initial model M0;
- a selection unit, used to control the replacement unit to repeatedly perform the layer replacement operation to obtain multiple third intermediate models M3 and multiple loss values, and to select the third intermediate model M3 with the smallest loss value as the output model.
- Optionally, the replacement unit is specifically used to find, in the initial model M0, a group convolutional layer containing Ng groups such that the change in the importance of intra-layer connections is minimal.
- Optionally, the loss value includes the loss Lw, where Lw is the loss value defined by the loss formula described below.
- In a third aspect, a computer-readable storage medium is provided, which stores a program for electronic data exchange, wherein the program causes a terminal to perform the method provided in the first aspect.
- The technical solution of the present application proposes a brand-new scheme for optimizing a convolutional neural network by replacing convolutional layers.
- In the prior art, it is difficult to select which convolutional layers should be replaced, and the replaced model is difficult to train.
- Optimization schemes that are not based on layer replacement often require a large amount of GPU computing resources, and the training time is usually very long.
- With this scheme, an optimized convolutional neural network model can be obtained within a few hours using only one NVIDIA Titan Xp GPU; it therefore saves time, improves efficiency, and reduces costs.
- FIG. 1 is a schematic flowchart of an optimization method for a convolutional neural network provided by this application.
- FIG. 2 is a schematic diagram of initializing parameters in a replacement layer provided by this application.
- FIG. 3 is a schematic structural diagram of an optimization device for a convolutional neural network provided by this application.
- The present application proposes an optimization method for convolutional neural networks based on transfer learning and convolutional layer replacement.
- The goal of this optimization method is to reduce the resource occupation and computation cost of a convolutional neural network model on a specific domain D (also called the target domain) while losing as little task performance as possible.
- The input accepted by this method is a pre-trained deep convolutional neural network model and a data set of the target domain; the output is an optimized, layer-replaced convolutional neural network model trained on the target domain data set.
- This optimized convolutional neural network model can be used in the target domain D.
- The input to this method is a pre-trained model, that is, a model pre-trained on a large data set that can solve general problems.
- The optimization method includes the following steps:
- Step S101: Obtain a pre-trained model M.
- Step S102: Retrain the pre-trained model M on the data set D of the specified domain to obtain an initial model M0, and perform a layer replacement operation on the initial model M0; the layer replacement operation includes the following steps S103-S106.
- Step S103: Based on the bipartite graph maximum matching algorithm, determine that a standard convolutional layer e in the initial model M0 is suitable to be replaced with an efficient convolutional layer, and determine the effect gain of the first intermediate model M1 obtained by replacing the standard convolutional layer e with the efficient convolutional layer.
- The technical solution of the present application uses a method based on the maximal bipartite matching algorithm to determine which standard convolutional layers are suitable to be replaced with efficient convolutional layers, and the effect gain that the replacement can bring.
- The problem to be solved by the bipartite graph maximum matching algorithm can be formally described as formula (1).
- In the formula, Ng refers to the number of groups of the group convolution in the replacement target, L is the number of layers in the entire network, and Cl is the number of input channels at layer l.
- The formula describes an optimization problem.
- The goal of the optimization is to find a layer replacement target, that is, a group convolutional layer containing Ng groups (when Ng equals the number of channels, the group convolution becomes a depthwise separable convolutional layer), such that the change in the importance of intra-layer connections is minimal; the importance measure, formula (2), is the L2 norm of all weights in each connection.
- Step S104: Renormalize the parameters of the first intermediate model M1 to obtain a second intermediate model M2.
- Step S105: Initialize and retrain the second intermediate model M2 to obtain a third intermediate model M3.
- Step S106: Calculate the loss value of the third intermediate model M3.
- The loss value may be calculated as follows:
- The loss value Lw is obtained by summing the loss values of all L layers.
- The loss value of each layer is the weighted average of two terms: the L2 norm of all weights (first term) and the L2 norm of the remaining weights after layer replacement (second term).
- An indicator of whether the k-th connection between the c-th input channel and the f-th output channel at layer l should be deleted takes the value 0 or 1.
- λ and λg are the weights of the weighted average.
- Step S107: Repeat the layer replacement operation to obtain multiple third intermediate models M3 and multiple loss values; select the third intermediate model M3 with the smallest loss value as the output model.
- The connections given by the bipartite graph maximum matching are used to initialize the model after layer replacement; the parameters used for initialization are the original parameters from the pre-trained model. Before initialization, the method additionally renormalizes the parameters to ensure that the initialized model can be trained to convergence as quickly as possible.
- This application also rearranges the channel order of the output by adding a pointwise convolution layer after the replaced convolutional layer. Refer to FIG. 2, which is a schematic diagram of initializing the parameters in a replacement layer.
- The technical solution of the present application proposes a brand-new scheme for optimizing a convolutional neural network by replacing convolutional layers.
- Optimization schemes that are not based on layer replacement often require a large amount of GPU computing resources, and the training time is usually very long.
- With this scheme, an optimized convolutional neural network model can be obtained within a few hours using only one NVIDIA Titan Xp GPU.
- FIG. 3 provides an optimization device for a convolutional neural network.
- The device includes:
- an obtaining unit 301, used to obtain the pre-trained model M;
- a training unit 302, configured to retrain the pre-trained model M on the data set D of the specified domain to obtain the initial model M0;
- a replacement unit 303, configured to perform the layer replacement operation on the initial model M0, where the layer replacement operation includes: determining, based on the bipartite graph maximum matching algorithm, that the standard convolutional layer e in the initial model M0 is suitable to be replaced with an efficient convolutional layer, and determining the effect gain of the first intermediate model M1 obtained by replacing the standard convolutional layer e with the efficient convolutional layer; renormalizing the parameters of the first intermediate model M1 to obtain the second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain the third intermediate model M3; and calculating the loss value of the third intermediate model M3;
- a selection unit 304, configured to control the replacement unit to repeatedly perform the layer replacement operation to obtain multiple third intermediate models M3 and multiple loss values, and to select the third intermediate model M3 with the smallest loss value as the output model.
- An embodiment of the present invention also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any of the convolutional neural network optimization methods described in the above method embodiments.
- An embodiment of the present invention also provides a computer program product; the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any of the convolutional neural network optimization methods described in the above method embodiments.
- In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways.
- The device embodiments described above are merely illustrative.
- The division into units is only a division by logical function; in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms.
- The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
- The above integrated unit may be implemented in the form of hardware or in the form of a software program module.
- If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory.
- The technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a memory and includes several instructions to enable a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
- The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disk.
- The program may be stored in a computer-readable memory, and the memory may include a flash disk, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, and so on.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
An optimization method for a convolutional neural network, and a related product. The method comprises: obtaining a pre-trained model M; retraining the pre-trained model M on a data set D of a specified domain to obtain an initial model M0, and performing a layer replacement operation on the initial model M0, wherein the layer replacement operation comprises: determining, on the basis of a bipartite graph maximum matching algorithm, that a standard convolutional layer e in the initial model M0 is suitable to be replaced with a high-efficiency convolutional layer, and determining the effect gain of a first intermediate model M1 obtained by replacing the standard convolutional layer e with the high-efficiency convolutional layer; renormalizing parameters of the first intermediate model M1 to obtain a second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain a third intermediate model M3; and calculating a loss value of the third intermediate model M3; repeatedly performing the layer replacement operation to obtain multiple third intermediate models M3 and multiple loss values; and selecting the third intermediate model M3 having the smallest loss value as an output model. The method is low in cost.
Description
The invention relates to the technical field of communications and artificial intelligence, and in particular to an optimization method for convolutional neural networks and related products.
In recent years, deep convolutional neural networks, as a class of machine learning models, have achieved excellent results in computer vision and other fields, even exceeding average human performance on some tasks such as image classification and recognition and the game of Go. Convolutional neural networks generally include multiple convolutional layers, interspersed with pooling layers, linear rectification layers, and the like; the top of the network generally has one or more fully connected layers, topped by a loss function layer used for training.
Transfer learning is a method of developing and training machine learning models; its purpose is to transfer a model M trained in domain A to domain B at lower cost, through methods such as retraining. Transfer learning is widely used with deep convolutional neural networks, but training such networks takes a long time and is costly.
Summary of the Invention
The embodiments of the present invention provide an optimization method for a convolutional neural network and related products. A trained model can be applied to the target domain after only simple retraining, which has the advantage of reducing costs.
In a first aspect, an embodiment of the present invention provides an optimization method for a convolutional neural network. The method includes the following steps:
obtaining a pre-trained model M;
retraining the pre-trained model M on a data set D of a specified domain to obtain an initial model M0, and performing a layer replacement operation on the initial model M0;
the layer replacement operation includes: determining, based on a bipartite graph maximum matching algorithm, that a standard convolutional layer e in the initial model M0 is suitable to be replaced with an efficient convolutional layer, and determining the effect gain of a first intermediate model M1 obtained by replacing the standard convolutional layer e with the efficient convolutional layer; renormalizing the parameters of the first intermediate model M1 to obtain a second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain a third intermediate model M3; and calculating the loss value of the third intermediate model M3;
repeatedly performing the layer replacement operation to obtain multiple third intermediate models M3 and multiple loss values; and selecting the third intermediate model M3 with the smallest loss value as the output model.
Optionally, the determination, based on the bipartite graph maximum matching algorithm, that the standard convolutional layer e in the initial model M0 is suitable to be replaced with an efficient convolutional layer specifically includes:
finding, in the initial model M0, a group convolutional layer containing Ng groups such that the change in the importance of intra-layer connections is minimal;
the importance is the L2 norm of all weights in each connection.
Optionally, the loss value includes the loss Lw, where Lw is the loss value defined by the loss formula described below.
In a second aspect, an optimization device for a convolutional neural network is provided. The device includes:
an obtaining unit, used to obtain the pre-trained model M;
a training unit, used to retrain the pre-trained model M on the data set D of the specified domain to obtain the initial model M0;
a replacement unit, used to perform the layer replacement operation on the initial model M0, where the layer replacement operation includes: determining, based on the bipartite graph maximum matching algorithm, that the standard convolutional layer e in the initial model M0 is suitable to be replaced with an efficient convolutional layer, and determining the effect gain of the first intermediate model M1 obtained by replacing the standard convolutional layer e with the efficient convolutional layer; renormalizing the parameters of the first intermediate model M1 to obtain the second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain the third intermediate model M3; and calculating the loss value of the third intermediate model M3;
a selection unit, used to control the replacement unit to repeatedly perform the layer replacement operation to obtain multiple third intermediate models M3 and multiple loss values, and to select the third intermediate model M3 with the smallest loss value as the output model.
Optionally, the replacement unit is specifically used to find, in the initial model M0, a group convolutional layer containing Ng groups such that the change in the importance of intra-layer connections is minimal;
the importance is the L2 norm of all weights in each connection.
Optionally, the loss value includes the loss Lw, where Lw is the loss value defined by the loss formula described below.
In a third aspect, a computer-readable storage medium is provided, which stores a program for electronic data exchange, wherein the program causes a terminal to perform the method provided in the first aspect.
Implementing the embodiments of the present invention has the following beneficial effects:
It can be seen that the technical solution of the present application proposes a brand-new scheme for optimizing a convolutional neural network by replacing convolutional layers. In the prior art, it is difficult to select which convolutional layers should be replaced, and the replaced model is difficult to train. Optimization schemes that are not based on layer replacement often require a large amount of GPU computing resources, and the training time is usually very long. With this scheme, an optimized convolutional neural network model can be obtained within a few hours using only one NVIDIA Titan Xp GPU; it therefore saves time, improves efficiency, and reduces costs.
In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative effort.
FIG. 1 is a schematic flowchart of an optimization method for a convolutional neural network provided by this application.
FIG. 2 is a schematic diagram of initializing parameters in a replacement layer provided by this application.
FIG. 3 is a schematic structural diagram of an optimization device for a convolutional neural network provided by this application.
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and so on in the description, the claims, and the drawings of the present invention are used to distinguish different objects, not to describe a specific order. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference herein to an "embodiment" means that a specific feature, result, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The present application proposes an optimization method for convolutional neural networks based on transfer learning and convolutional layer replacement. The goal of this optimization method is to reduce the resource occupation and computation cost of a convolutional neural network model on a specific domain D (also called the target domain) while losing as little task performance as possible. The input accepted by this method is a pre-trained deep convolutional neural network model and a data set of the target domain; the output is an optimized, layer-replaced convolutional neural network model trained on the target domain data set, and this optimized convolutional neural network model can be used in the target domain D.
As shown in FIG. 1, the input to this method is a pre-trained model, that is, a model pre-trained on a large data set that can solve general problems. As shown in FIG. 1, the optimization method includes the following steps:
Step S101: Obtain a pre-trained model M.
Step S102: Retrain the pre-trained model M on the data set D of the specified domain to obtain an initial model M0, and perform a layer replacement operation on the initial model M0; the layer replacement operation includes the following steps S103-S106.
Step S103: Based on the bipartite graph maximum matching algorithm, determine that a standard convolutional layer e in the initial model M0 is suitable to be replaced with an efficient convolutional layer, and determine the effect gain of the first intermediate model M1 obtained by replacing the standard convolutional layer e with the efficient convolutional layer.
There are two core problems to be solved in this application: how to select the standard convolutional layers to be replaced and the replacement targets, and how to train the layer-replaced model on the target data set.
Because a deep convolutional neural network model often contains dozens of convolutional layers and there are many possible replacement choices, an enumeration algorithm would run into a severe combinatorial explosion problem; enumeration greatly increases the overhead and is therefore inefficient.
The technical solution of the present application uses a method based on the maximal bipartite matching algorithm to determine which standard convolutional layers are suitable to be replaced with efficient convolutional layers, and the effect gain that the replacement can bring. The problem to be solved by the bipartite graph maximum matching algorithm can be formally described as formula (1).
In the formula, Ng refers to the number of groups of the group convolution in the replacement target, L is the number of layers in the entire network, and Cl is the number of input channels at layer l; the remaining symbols denote, respectively, the importance of the connection between the c-th input channel and the f-th output channel at layer l, and whether the connection between the c-th input channel and the f-th output channel at layer l should be deleted.
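In the published text, formula (1) itself is rendered as an image. As a non-authoritative reconstruction from the surrounding definitions (the symbols I^l_{c,f} for connection importance, γ^l_{c,f} ∈ {0,1} for the deletion indicator, and F_l for the number of output channels at layer l are assumed names, not the patent's own notation), the optimization problem can be sketched as:

```latex
% Hypothetical reconstruction of formula (1): choose the deletion pattern
% (equivalently, a bipartite matching between input and output channels)
% that minimizes the total importance of the deleted intra-layer connections,
% subject to the retained connections forming a group convolution with N_g groups.
\begin{equation}
\min_{\gamma}\; \sum_{l=1}^{L} \sum_{c=1}^{C_l} \sum_{f=1}^{F_l}
  \gamma^{l}_{c,f}\, I^{l}_{c,f}
\qquad \text{s.t.}\;\;
\bigl\{(c,f) : \gamma^{l}_{c,f} = 0\bigr\}
\text{ forms a group convolution with } N_g \text{ groups}
\tag{1}
\end{equation}
```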
The formula describes an optimization problem. The goal of the optimization is to find a layer replacement target, that is, a group convolutional layer containing Ng groups (when Ng equals the number of channels, the group convolution becomes a depthwise separable convolutional layer), such that the change in the importance of intra-layer connections is minimal; the importance measure, formula (2), is the L2 norm of all weights in each connection.
In formula (2), the meaning of the deletion indicator is unchanged; the remaining symbols refer to the connection weight between the c-th input channel and the f-th output channel at layer l, and to the k-th element of that weight, respectively.
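Formula (2) likewise appears only as an image in the published text. Given the statement that the importance is the L2 norm of all weights in each connection, a plausible reconstruction (with w^l_{c,f,k} as an assumed symbol for the k-th element of the connection weight W^l_{c,f}) is:

```latex
% Hypothetical reconstruction of formula (2): the importance of the connection
% between input channel c and output channel f at layer l is the L2 norm of
% that connection's kernel weights (k indexes the kernel elements).
\begin{equation}
I^{l}_{c,f} = \bigl\| W^{l}_{c,f} \bigr\|_{2}
            = \sqrt{\sum_{k} \bigl( w^{l}_{c,f,k} \bigr)^{2}}
\tag{2}
\end{equation}
```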
Step S104: Renormalize the parameters of the first intermediate model M1 to obtain a second intermediate model M2.
Step S105: Initialize and retrain the second intermediate model M2 to obtain a third intermediate model M3.
Step S106: Calculate the loss value of the third intermediate model M3.
The loss value may be calculated as follows: the loss value Lw is obtained by summing the loss values of all L layers. The loss value of each layer is the weighted average of two terms: the L2 norm of all weights (first term) and the L2 norm of the remaining weights after layer replacement (second term). An indicator of whether the k-th connection between the c-th input channel and the f-th output channel at layer l should be deleted takes the value 0 or 1. λ and λg are the weights of the weighted average.
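The loss formula also appears only as an image in the published text. A sketch consistent with the verbal description (summing over the L layers a weighted combination of the L2 norm of all weights and the L2 norm of the weights remaining after replacement; the element-wise deletion indicator γ^l_{c,f,k} and the weights λ and λg are assumed notation) is:

```latex
% Hypothetical reconstruction of the loss value L_w: per layer, a weighted
% combination of the L2 norm of all weights (first term) and the L2 norm of
% the weights remaining after layer replacement (second term); gamma = 1
% marks a deleted connection element.
\begin{equation}
L_w = \sum_{l=1}^{L} \Biggl[
  \lambda \,\bigl\| W^{l} \bigr\|_{2}
  + \lambda_g \sqrt{\sum_{c,f,k} \bigl(1-\gamma^{l}_{c,f,k}\bigr)\bigl(w^{l}_{c,f,k}\bigr)^{2}}
\Biggr]
\end{equation}
```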
Step S107: Repeat the layer replacement operation to obtain multiple third intermediate models M3 and multiple loss values; select the third intermediate model M3 with the smallest loss value as the output model.
The connections given by the bipartite graph maximum matching are used to initialize the model after layer replacement; the parameters used for initialization are the original parameters from the pre-trained model. Before initialization, the method additionally renormalizes the parameters to ensure that the initialized model can be trained to convergence as quickly as possible. This application also rearranges the channel order of the output by adding a pointwise convolution layer after the replaced convolutional layer. Refer to FIG. 2, which is a schematic diagram of initializing the parameters in a replacement layer.
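As a concrete illustration of this step, the following PyTorch snippet (our own sketch, not code from the patent; the function name, the assumption that channels are already permuted into contiguous groups, and the divisibility of channel counts by the group number are all ours) builds a grouped convolution initialized from a standard convolution's original weights and appends a pointwise (1x1) convolution that restores the output channel order:

```python
import torch
import torch.nn as nn

def replace_with_grouped_conv(conv: nn.Conv2d, n_groups: int,
                              out_perm: torch.Tensor) -> nn.Sequential:
    """Hypothetical sketch: replace a standard convolution with a grouped
    convolution initialized from the original weights, followed by a
    pointwise convolution that rearranges the output channel order.
    Assumes in/out channel counts are divisible by n_groups and that the
    bipartite matching has already permuted channels into contiguous groups."""
    c_in, c_out = conv.in_channels, conv.out_channels
    grouped = nn.Conv2d(c_in, c_out, conv.kernel_size, stride=conv.stride,
                        padding=conv.padding, groups=n_groups, bias=False)
    gi, go = c_in // n_groups, c_out // n_groups
    with torch.no_grad():
        # Copy each group's block of the original weight tensor; connections
        # outside these blocks are the ones the matching marked for deletion.
        for g in range(n_groups):
            grouped.weight[g * go:(g + 1) * go] = \
                conv.weight[g * go:(g + 1) * go, g * gi:(g + 1) * gi]
        # Pointwise convolution acting as a fixed permutation matrix that
        # maps output channel src back to position dst for downstream layers.
        pointwise = nn.Conv2d(c_out, c_out, kernel_size=1, bias=False)
        pointwise.weight.zero_()
        for dst, src in enumerate(out_perm.tolist()):
            pointwise.weight[dst, src, 0, 0] = 1.0
    return nn.Sequential(grouped, pointwise)
```

For example, replace_with_grouped_conv(nn.Conv2d(64, 128, 3, padding=1), n_groups=4, out_perm=torch.randperm(128)) produces a drop-in module with the same input and output shape as the original layer.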
The technical solution of the present application proposes a brand-new scheme for optimizing a convolutional neural network by replacing convolutional layers. In the prior art, it is difficult to select which convolutional layers should be replaced, and the replaced model is difficult to train. Optimization schemes that are not based on layer replacement often require a large amount of GPU computing resources, and the training time is usually very long. With this scheme, an optimized convolutional neural network model can be obtained within a few hours using only one NVIDIA Titan Xp GPU.
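Viewed end to end, steps S101-S107 form a simple search loop. The following Python sketch is our own pseudocode-style summary of that control flow; every helper (retrain, replace_layer, renormalize, initialize, compute_loss) is a hypothetical placeholder for the corresponding step, not an API defined by the patent:

```python
def optimize(pretrained_model, dataset_d, candidate_replacements):
    """Hypothetical sketch of steps S101-S107: try each candidate layer
    replacement, renormalize, retrain, and keep the lowest-loss model."""
    m0 = retrain(pretrained_model, dataset_d)          # S101-S102
    best_model, best_loss = None, float("inf")
    for layer_e, n_groups in candidate_replacements:   # repetition (S107)
        m1 = replace_layer(m0, layer_e, n_groups)      # S103: bipartite matching
        m2 = renormalize(m1)                           # S104
        m3 = retrain(initialize(m2), dataset_d)        # S105
        loss = compute_loss(m3)                        # S106
        if loss < best_loss:
            best_model, best_loss = m3, loss
    return best_model                                  # smallest loss wins
```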
Referring to FIG. 3, FIG. 3 provides an optimization device for a convolutional neural network. The device includes:
an obtaining unit 301, used to obtain the pre-trained model M;
a training unit 302, configured to retrain the pre-trained model M on the data set D of the specified domain to obtain the initial model M0;
a replacement unit 303, configured to perform the layer replacement operation on the initial model M0, where the layer replacement operation includes: determining, based on the bipartite graph maximum matching algorithm, that the standard convolutional layer e in the initial model M0 is suitable to be replaced with an efficient convolutional layer, and determining the effect gain of the first intermediate model M1 obtained by replacing the standard convolutional layer e with the efficient convolutional layer; renormalizing the parameters of the first intermediate model M1 to obtain the second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain the third intermediate model M3; and calculating the loss value of the third intermediate model M3;
a selection unit 304, configured to control the replacement unit to repeatedly perform the layer replacement operation to obtain multiple third intermediate models M3 and multiple loss values, and to select the third intermediate model M3 with the smallest loss value as the output model.
An embodiment of the present invention also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any of the convolutional neural network optimization methods described in the above method embodiments.
An embodiment of the present invention also provides a computer program product; the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any of the convolutional neural network optimization methods described in the above method embodiments.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that the present invention is not limited by the described sequence of actions, because according to the present invention certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis. For a part not detailed in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on such an understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions to enable a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disk.
A person of ordinary skill in the art may understand that all or part of the steps of the various methods in the above embodiments may be completed by a program instructing relevant hardware; the program may be stored in a computer-readable memory, and the memory may include a flash disk, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, and so on.
The embodiments of the present invention have been described in detail above, and specific examples are used herein to explain the principles and implementations of the present invention. The descriptions of the above embodiments are only used to help understand the method and core idea of the present invention. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementations and application scope according to the ideas of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (8)
- An optimization method for a convolutional neural network, characterized in that the method includes the following steps: obtaining a pre-trained model M; retraining the pre-trained model M on a data set D of a specified domain to obtain an initial model M0, and performing a layer replacement operation on the initial model M0; the layer replacement operation includes: determining, based on a bipartite graph maximum matching algorithm, that a standard convolutional layer e in the initial model M0 is suitable to be replaced with an efficient convolutional layer, and determining the effect gain of a first intermediate model M1 obtained by replacing the standard convolutional layer e with the efficient convolutional layer; renormalizing the parameters of the first intermediate model M1 to obtain a second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain a third intermediate model M3; and calculating the loss value of the third intermediate model M3; repeatedly performing the layer replacement operation to obtain multiple third intermediate models M3 and multiple loss values; and selecting the third intermediate model M3 with the smallest loss value as the output model.
- The method according to claim 1, characterized in that the determination, based on the bipartite graph maximum matching algorithm, that the standard convolutional layer e in the initial model M0 is suitable to be replaced with an efficient convolutional layer specifically includes: finding, in the initial model M0, a group convolutional layer containing Ng groups such that the change in the importance of intra-layer connections is minimal; the importance is the L2 norm of all weights in each connection.
- An optimization device for a convolutional neural network, characterized in that the device includes: an obtaining unit, used to obtain the pre-trained model M; a training unit, used to retrain the pre-trained model M on the data set D of the specified domain to obtain the initial model M0; a replacement unit, used to perform the layer replacement operation on the initial model M0, where the layer replacement operation includes: determining, based on the bipartite graph maximum matching algorithm, that the standard convolutional layer e in the initial model M0 is suitable to be replaced with an efficient convolutional layer, and determining the effect gain of the first intermediate model M1 obtained by replacing the standard convolutional layer e with the efficient convolutional layer; renormalizing the parameters of the first intermediate model M1 to obtain the second intermediate model M2; initializing and retraining the second intermediate model M2 to obtain the third intermediate model M3; and calculating the loss value of the third intermediate model M3; and a selection unit, used to control the replacement unit to repeatedly perform the layer replacement operation to obtain multiple third intermediate models M3 and multiple loss values, and to select the third intermediate model M3 with the smallest loss value as the output model.
- The device according to claim 4, characterized in that the replacement unit is specifically used to find, in the initial model M0, a group convolutional layer containing Ng groups such that the change in the importance of intra-layer connections is minimal; the importance is the L2 norm of all weights in each connection.
- A computer-readable storage medium storing a program for electronic data exchange, wherein the program causes a terminal to perform the method provided in any one of claims 1-3.
- A computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method provided in any one of claims 1-3.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/112569 WO2020087254A1 (en) | 2018-10-30 | 2018-10-30 | Optimization method for convolutional neural network, and related product |
CN201880083507.4A CN111602145A (en) | 2018-10-30 | 2018-10-30 | Optimization method of convolutional neural network and related product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/112569 WO2020087254A1 (en) | 2018-10-30 | 2018-10-30 | Optimization method for convolutional neural network, and related product |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020087254A1 true WO2020087254A1 (en) | 2020-05-07 |
Family
ID=70463304
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/112569 WO2020087254A1 (en) | 2018-10-30 | 2018-10-30 | Optimization method for convolutional neural network, and related product |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111602145A (en) |
WO (1) | WO2020087254A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113128670B (en) * | 2021-04-09 | 2024-03-19 | 南京大学 | Neural network model optimization method and device |
CN114648671A (en) * | 2022-02-15 | 2022-06-21 | 成都臻识科技发展有限公司 | Detection model generation method and device based on deep learning |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150006444A1 (en) * | 2013-06-28 | 2015-01-01 | Denso Corporation | Method and system for obtaining improved structure of a target neural network |
CN105844653A (en) * | 2016-04-18 | 2016-08-10 | 深圳先进技术研究院 | Multilayer convolution neural network optimization system and method |
CN106485324A (en) * | 2016-10-09 | 2017-03-08 | 成都快眼科技有限公司 | A kind of convolutional neural networks optimization method |
CN108319988A (en) * | 2017-01-18 | 2018-07-24 | 华南理工大学 | A kind of accelerated method of deep neural network for handwritten Kanji recognition |
Also Published As
Publication number | Publication date |
---|---|
CN111602145A (en) | 2020-08-28 |
Legal Events

Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18938612; Country of ref document: EP; Kind code of ref document: A1
 | NENP | Non-entry into the national phase | Ref country code: DE
 | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 111021)
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 18938612; Country of ref document: EP; Kind code of ref document: A1