
CN109711411B - Image segmentation and identification method based on capsule neurons - Google Patents

Image segmentation and identification method based on capsule neurons

Info

Publication number
CN109711411B
CN109711411B
Authority
CN
China
Prior art keywords
target
capsule
shape
network
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811505408.9A
Other languages
Chinese (zh)
Other versions
CN109711411A (en)
Inventor
于慧敏
黄伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201811505408.9A priority Critical patent/CN109711411B/en
Publication of CN109711411A publication Critical patent/CN109711411A/en
Application granted granted Critical
Publication of CN109711411B publication Critical patent/CN109711411B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a capsule neuron-based collaborative segmentation and recognition method. The method uses a network built from capsule neurons to model and learn the shape knowledge of a target, and builds a collaborative segmentation and recognition model on top of that network. Compared with a classical scalar neuron, a capsule neuron can analyze and capture, layer by layer, the geometric relations from low-level local instances of the target up to the target as a whole. It can therefore disentangle the target's features from background interference, and those features can further be used to reconstruct and generate the target. Based on these properties of capsule neurons, the invention builds an encoder-decoder network topology that can effectively learn and exploit prior knowledge of the target and apply it in the collaborative segmentation and recognition model. The method is highly extensible: the encoder and decoder networks can be replaced with other suitable neural networks to meet different requirements.

Description

Image segmentation and identification method based on capsule neurons
Technical field
The invention belongs to the fields of image segmentation, automatic recognition and target representation, and particularly relates to an image segmentation and recognition method based on capsule neurons. The model makes efficient use of properties specific to capsule neurons.
Background
In models and technical methods where target segmentation and target recognition cooperate, effective representation of the target is a key problem. A suitable model and representation method, together with a way to generate a reference target object from prior knowledge, play an important role in establishing the cooperative process. In addition, the extensibility of the model must be considered in practical applications: in some cases the model needs to be scaled up or down to different degrees for different applications, to meet different resource and performance requirements.
In recent years, deep learning and deep neural networks have played a major role in many computer vision and image processing tasks. The convolutional neural network is currently the most commonly used deep neural network, favored by the research community and industry for its strong extensibility and excellent learning and representation abilities. The capsule neuron is a neural unit recently proposed by Professor Hinton, mainly to address the problem that a convolutional neural network loses feature position information during inference. Capsule neurons focus on capturing the geometric relations between target parts and the target whole, trying to preserve such relations and propagate the associated information. A capsule neuron can therefore resolve the target and its features from among many kinds of interference and filter most of that interference out.
These characteristics of capsule neurons are very helpful for the task of collaborative target segmentation and recognition. On one hand, the real target can be analyzed out of the segmentation result and its features extracted, with most of the interference arising in the segmentation process filtered out; those features can then be used to reconstruct or generate the real target. On the other hand, a capsule-based deep neural network also has good extensibility.
In the invention, a network with an encoder-decoder architecture is built from capsule neural units and introduced into a model for collaborative target segmentation and recognition, realizing the learning, representation and generation of target shape knowledge, and thereby the mutual cooperation of the segmentation task and the recognition task.
Disclosure of Invention
The invention aims to provide an image segmentation and recognition method based on capsule neurons. The method uses a capsule neuron-based deep neural network to learn, model and represent the shape of the target. The deep neural network comprises two basic modules: an encoder and a decoder. The encoder uses capsule neural units to extract and recognize target features in the current segmentation result; the decoder generates, from the extracted target features and the recognition result, a target shape to be referenced by the segmentation model. The two modules let the two tasks exchange information and work in concert to achieve better performance, and make the segmentation and recognition processes more interpretable.
The invention adopts the following technical scheme. A capsule neuron-based image segmentation and recognition method comprises the following steps:
Step 1: Based on two-tuple data {target shape m_i, target class label y_i} containing L different classes, where i = 1, …, N is the sample index, m_i ∈ {0,1}^{H×W}, and H and W are the height and width of image m_i, use capsule neurons to build and train an encoder network Enc for learning and extracting the feature V_i ∈ R^{L×D} of each target shape m_i, where D is the dimension of the top-level capsule neurons of the encoder network; at the same time, based on the extracted features V_i, train a decoder network Dec for generating the target shape;
Step 2: For an image I ∈ R^{H×W×C} to be segmented and recognized, in which there is one and only one target, where C is the number of channels of the image I: using an energy function E_data(I, q) based on the image data, perform an initial segmentation of I, obtaining by energy minimization an initial result q ∈ [0,1]^{H×W}, where the value q(x) at pixel position x characterizes the probability that the pixel belongs to the target;
Step 3: Analyze and recognize the initial result q with the encoder network Enc to obtain the target shape feature V; the recognized target class label is t = argmax_l ||v_l||, where v_l is the l-th row of the target feature V and ||v_l|| is its norm. The properties of capsule neurons guarantee ||v_l|| ∈ [0,1], so ||v_l|| also represents the probability that the target belongs to class l;
Step 4: Generate a reference shape m̂ of the target with the decoder network Dec, based on V and the recognition result t, and update the energy function of Step 2 as follows:

E(q, t) = α × E_data(I, q) + (1 − α) × E_shape(q, t)

where E_shape(q, t) is a loss function between the reference shape m̂ and the preceding result q, and α is a weight; obtain an updated segmentation result q from the updated energy function according to the principle of energy minimization.
Step 5: Repeat Steps 2, 3 and 4 until q converges or the maximum number of iterations is reached, and output the segmentation result q and the recognized target class label t. A schematic sketch of this loop is given below.
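For concreteness, the following is a minimal Python sketch of the loop in Steps 2-5. The callables enc, dec, segment_by_energy, data_energy and shape_energy are hypothetical placeholders for the patent's components, and the convergence test is an illustrative choice rather than the patent's criterion.

```python
import numpy as np

def cosegment_and_recognize(image, enc, dec, segment_by_energy,
                            data_energy, shape_energy,
                            alpha=0.5, max_iters=80, tol=1e-4):
    """Sketch of Steps 2-5: alternate segmentation and capsule-based
    recognition until the soft mask q converges.

    enc(q) -> V: L x D matrix of top-level capsule vectors.
    dec(V, t) -> reference shape m_hat in [0,1]^{H x W}.
    segment_by_energy(energy_fn) -> q minimizing the given energy.
    """
    # Step 2: initial segmentation from the image-data energy alone.
    q = segment_by_energy(lambda q_: data_energy(image, q_))
    t = None
    for _ in range(max_iters):
        # Step 3: the capsule encoder extracts shape features; the
        # class label is the row of V with the largest vector norm.
        V = enc(q)                                  # shape (L, D)
        norms = np.linalg.norm(V, axis=1)           # ||v_l|| in [0, 1]
        t = int(np.argmax(norms))
        # Step 4: the decoder generates a reference shape for class t,
        # and the energy is re-weighted with the shape term.
        m_hat = dec(V, t)
        energy = lambda q_: (alpha * data_energy(image, q_)
                             + (1 - alpha) * shape_energy(q_, m_hat))
        q_new = segment_by_energy(energy)
        # Step 5: stop once q has converged.
        if np.abs(q_new - q).mean() < tol:
            q = q_new
            break
        q = q_new
    return q, t
```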
The invention has the beneficial effects that:
(1) a network built from capsule neural units analyzes the target segmentation result, captures the geometric relations from target parts to the target whole, and filters out redundant interference information while performing the cooperative task;
(2) the features extracted by the capsule network carry strong semantic information, and each feature dimension can represent one attribute of the target, which brings interpretability to the recognition process;
(3) the encoder-decoder network in the collaborative model has good extensibility and can be replaced with other suitable neural network modules, thereby widening the application range of the collaborative model.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an image to be segmented and identified;
FIGS. 3 to 7 show the segmentation and recognition results obtained at iterations 1, 20, 40, 60 and 80, where L = 30;
FIGS. 8 to 12 are the reference shapes generated at iterations 1, 20, 40, 60 and 80.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
Referring to fig. 1, a flowchart of steps of a capsule neuron-based collaborative segmentation and recognition model according to an embodiment of the present invention is shown.
Given a training data set {target shape m_i, target class label y_i} and a test target image I_test, the method comprises the following steps:
1. Training the shape representation and appearance representation models
(1.1) Based on the data set D_0 = {target shape m_i, target class label y_i}, appropriately augment the target shapes (i.e., expand the data set) by applying displacement, deformation, rotation and perspective transformations of different degrees to part of the training shapes, generating more shapes for training. The augmented shapes and their labels are defined as a data set D_1. All target shapes in D_1 are normalized to 80 × 80 size. A sketch of such an augmentation step follows.
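As an illustration of the augmentation in (1.1), a minimal sketch using OpenCV warps is given below; the transformation ranges and the helper name augment_shape are assumptions, since the patent does not specify the parameters.

```python
import numpy as np
import cv2

def augment_shape(mask, rng):
    """Apply a random shift, rotation, scaling, and perspective warp
    to a binary target shape, then renormalize it to 80 x 80.
    Parameter ranges below are illustrative, not from the patent."""
    h, w = mask.shape
    # Random rotation + scale, then a small random shift.
    angle = rng.uniform(-20, 20)
    scale = rng.uniform(0.9, 1.1)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    M[:, 2] += rng.uniform(-0.05, 0.05, size=2) * (w, h)
    warped = cv2.warpAffine(mask.astype(np.float32), M, (w, h))
    # Mild random perspective transform of the four corners.
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = rng.uniform(-0.03, 0.03, size=(4, 2)) * (w, h)
    dst = (src + jitter).astype(np.float32)
    P = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(warped, P, (w, h))
    # Normalize to the 80 x 80 size used for training.
    out = cv2.resize(warped, (80, 80))
    return (out > 0.5).astype(np.uint8)

# Usage: rng = np.random.default_rng(0); m_aug = augment_shape(m, rng)
```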
(1.2) Input the sample pairs (m_i, y_i) of D_1 into the encoder-decoder network for shape learning, establishing the shape recognition model Enc and the shape generation model Dec;
(1.3) The encoder-decoder network structure follows the layer tables given in the original specification (which survive only as figures), and the network is trained by minimizing the corresponding loss function; a hedged architectural sketch follows.
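Because the layer tables survive only as figures, the following PyTorch sketch is modeled on the standard CapsNet encoder-decoder (Sabour et al., 2017) rather than the patent's exact architecture. The capsule dimension D = 16, the layer sizes and the routing-by-agreement details are assumptions; only the L class capsules and the 80 × 80 output are fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Capsule nonlinearity: preserves direction, maps norm into [0, 1)."""
    n2 = (s * s).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + eps)

class CapsEncoder(nn.Module):
    """Conv layer -> primary capsules -> L class capsules of dim D,
    coupled by dynamic routing (CapsNet-style; architecture assumed)."""
    def __init__(self, L=30, D=16, in_ch=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 256, 9, stride=2)       # 80 -> 36
        self.primary = nn.Conv2d(256, 32 * 8, 9, stride=2)   # 36 -> 14
        self.n_prim = 32 * 14 * 14
        # Transformation matrices from each primary to each class capsule.
        self.W = nn.Parameter(0.01 * torch.randn(1, self.n_prim, L, D, 8))
        self.L, self.D = L, D

    def forward(self, x, n_routing=3):
        B = x.size(0)
        h = F.relu(self.conv(x))
        u = self.primary(h).view(B, 32, 8, -1)                # (B, 32, 8, 196)
        u = squash(u.permute(0, 1, 3, 2).reshape(B, -1, 8))   # (B, n_prim, 8)
        u_hat = (self.W @ u[:, :, None, :, None]).squeeze(-1) # (B, n_prim, L, D)
        b = torch.zeros(B, self.n_prim, self.L, 1, device=x.device)
        for _ in range(n_routing):                            # routing by agreement
            c = b.softmax(dim=2)
            v = squash((c * u_hat).sum(dim=1))                # (B, L, D)
            b = b + (u_hat * v[:, None]).sum(-1, keepdim=True)
        return v                                 # ||v_l|| = class-l probability

class CapsDecoder(nn.Module):
    """Fully connected decoder: masked class capsules -> 80x80 shape."""
    def __init__(self, L=30, D=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(L * D, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, 80 * 80), nn.Sigmoid())

    def forward(self, v, t):
        # t: LongTensor of class indices, shape (B,).
        # Zero every capsule row except the recognized class t.
        mask = F.one_hot(t, v.size(1)).unsqueeze(-1).float()
        return self.fc((v * mask).flatten(1)).view(-1, 1, 80, 80)
```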
2. For the test image I_test (see, e.g., FIG. 2):
(2.1) In this embodiment, the image data energy term is built as follows: f(x) = −log p(I(x) | q(x) ≥ τ) and g(x) = −log p(I(x) | q(x) < τ), where τ is a foreground probability confidence threshold and I(x) is the image data (e.g., the gray value) of pixel x. p(I(x) | q(x) ≥ τ) represents the pixel color distribution of the foreground region, and p(I(x) | q(x) < τ) that of the background region. The data term is thus E_data(I, q) = Σ_x [q(x) f(x) + (1 − q(x)) g(x)]; with the energy function E(q, t) = E_data(I, q), segmentation by energy minimization yields the initial result q_0. A sketch of this data term follows.
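The following is a minimal NumPy sketch of the data term in (2.1), assuming grayscale histograms as the estimates of p(I(x) | q(x) ≥ τ) and p(I(x) | q(x) < τ); the bin count and normalization are illustrative choices, not specified in the patent.

```python
import numpy as np

def data_energy(image, q, tau=0.5, bins=64, eps=1e-8):
    """E_data(I, q) = sum_x [q(x) f(x) + (1 - q(x)) g(x)], with
    f = -log p(I(x) | foreground) and g = -log p(I(x) | background)
    estimated from grayscale histograms of the current soft mask q."""
    gray = image if image.ndim == 2 else image.mean(axis=2)
    idx = np.clip((gray / (gray.max() + eps) * (bins - 1)).astype(int),
                  0, bins - 1)
    fg, bg = q >= tau, q < tau
    # Histogram-based color models of the foreground / background regions.
    p_fg = np.bincount(idx[fg].ravel(), minlength=bins) + eps
    p_bg = np.bincount(idx[bg].ravel(), minlength=bins) + eps
    p_fg, p_bg = p_fg / p_fg.sum(), p_bg / p_bg.sum()
    f, g = -np.log(p_fg[idx]), -np.log(p_bg[idx])
    # Return the energy plus the pointwise responses f and g,
    # which are reused later as r_data(x) = f(x) - g(x).
    return (q * f + (1 - q) * g).sum(), f, g
```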
(2.2) Analyze and recognize the initial result q_0 with the encoder network Enc to obtain the target shape feature V, and recognize the target class label t = argmax_l ||v_l||, where v_l is the l-th row of the target feature V and ||v_l|| is its norm. The properties of capsule neurons guarantee ||v_l|| ∈ [0,1], so ||v_l|| also represents the probability that the target belongs to class l;
(2.3) Generate a reference shape m̂ of the target with the decoder network Dec, based on V and the recognition result t;
update the energy function of (2.1) as follows:

E(q, t) = α × E_data(I, q) + (1 − α) × E_shape(q, t)

where E_shape(q, t) is a loss function between the reference shape m̂ and q, and α is a weight; obtain an updated segmentation result q from the updated energy function according to the principle of energy minimization.
(2.4) Repeat steps (2.1)-(2.3) until q converges or the maximum number of iterations is reached, and output the segmented target q and the recognized target class label t. The iterative process is as follows:
(a) In the k-th optimization iteration, use Enc to extract and recognize the shape from the (k−1)-th segmentation result q_{k−1}, obtaining the target feature V_k and the recognition result t_k;
(b) Based on the target feature V_k and the recognition result t_k: except for the row of V_k satisfying max_l ||v_l|| (the t-th row), the remaining rows are features of interference information, so all rows other than the t-th row are set to 0; V_k is then flattened into a vector and fed to the generative model Dec to generate the reference shape m̂_k, as in the sketch below;
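Step (b) reduces to a few lines; the sketch below zeroes every row of V_k except the winning row t and flattens the result for the decoder (variable and function names are illustrative):

```python
import numpy as np

def mask_and_flatten(V):
    """Keep only the winning capsule row of V (an L x D matrix); the
    other rows carry interference information and are zeroed before
    the flattened vector is fed to the generative model Dec."""
    t = int(np.argmax(np.linalg.norm(V, axis=1)))  # row with max ||v_l||
    V_masked = np.zeros_like(V)
    V_masked[t] = V[t]                             # interference rows -> 0
    return V_masked.reshape(-1), t
```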
(c) Based on the reference shape m̂_k, define the shape loss function E_shape(q, t) between m̂_k and q;
(d) Weight the two energy terms to obtain the final energy

E(q, t) = α × E_data(I, q) + (1 − α) × E_shape(q, t)

and add an edge constraint term on q. Based on the split Bregman method, the total energy can be converted into a pointwise-response form with data response r_data(x) = f(x) − g(x) and a corresponding shape response (the full converted expression and the edge term appear in the original specification only as figures); a simplified numerical sketch follows.
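Since the converted split Bregman expression survives only as figures, the sketch below is a simplified stand-in: it combines the data response r_data(x) = f(x) − g(x) with an assumed quadratic shape loss and updates q by projected gradient descent, a simplification of, not a substitute for, the patent's split Bregman solver.

```python
import numpy as np

def update_q(q, f, g, m_hat, alpha, step=0.1, n_steps=50):
    """Projected gradient descent on
    E(q) = alpha * sum_x [q f + (1 - q) g] + (1 - alpha) * sum_x (q - m_hat)^2,
    a simplified stand-in for the split Bregman minimization; the
    quadratic shape loss is an assumption (the patent's E_shape
    formula is given only as an image)."""
    r_data = f - g                               # pointwise data response
    for _ in range(n_steps):
        grad = alpha * r_data + (1 - alpha) * 2.0 * (q - m_hat)
        q = np.clip(q - step * grad, 0.0, 1.0)   # keep q in [0, 1]^{HxW}
    return q
```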
FIGS. 8 to 12 show the reference shapes generated at iterations 1, 20, 40, 60 and 80. The generated reference shape is initially somewhat rough, because the confidence of segmentation and recognition in the early iterations is not high; nevertheless the capsule network still extracts a coarse target, filters out part of the interference information, and retains a rough outline of the target. As the iterations progress, the generated reference shape becomes finer and more specific, and conforms more and more closely to the target region in the actual test image.
Meanwhile, the features extracted in the recognition process are highly interpretable: on one hand, they can be used to reconstruct the target shape; on the other hand, owing to the nature of capsule neurons themselves, each feature dimension represents some deformation property of the target.
Moreover, because the capsule neuron is simply a special neural unit within a deep neural network, the encoder and decoder network modules are naturally extensible: the scale and the number of layers of the network can be reduced or increased, so the encoder-decoder modules in the method can be replaced with other suitable network modules to meet different resource constraints and application requirements.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (1)

1. A capsule neuron-based image segmentation and recognition method, characterized by comprising the following steps:
Step 1: Based on two-tuple data {target shape m_i, target class label y_i} containing L different classes, where i = 1, …, N is the sample index, m_i ∈ {0,1}^{H×W}, and H and W are the height and width of image m_i, use capsule neurons to build and train an encoder network Enc for learning and extracting the feature V_i ∈ R^{L×D} of each target shape m_i, where D is the dimension of the top-level capsule neurons of the encoder network; at the same time, based on the extracted features V_i, train a decoder network Dec for generating the target shape;
Step 2: For an image I ∈ R^{H×W×C} to be segmented and recognized, in which there is one and only one target, where C is the number of channels of the image I: using an energy function E_data(I, q) based on the image data, perform a preliminary segmentation of I, obtaining by energy minimization an initial segmentation result q_0 ∈ [0,1]^{H×W}, where the value q(x) at pixel position x characterizes the probability that the pixel belongs to the target;
Step 3: Analyze and recognize the segmentation result q_0 with the encoder network Enc to obtain the target shape feature V; the recognized target class label is t = argmax_l ||v_l||, where v_l is the l-th row of the target feature V and ||v_l|| is its norm; the properties of capsule neurons guarantee ||v_l|| ∈ [0,1], so ||v_l|| also represents the probability that the target belongs to class l;
Step 4: Generate a reference shape m̂ of the target with the decoder network Dec, based on V and the recognition result t, and update the energy function as follows:

E(q_0, t) = α × E_data(I, q_0) + (1 − α) × E_shape(q_0, t)

where E_shape(q_0, t) is a loss function between the reference shape m̂ and the initial segmentation result q_0, and α is the weight; using the updated energy function, obtain an updated segmentation result q′ by energy minimization;
Step 5: Following Steps 3-4, iteratively optimize the updated segmentation result q′ until q′ converges or the maximum number of iterations is reached, and output the final segmentation result q′ and the recognized target class label t.
CN201811505408.9A 2018-12-10 2018-12-10 Image segmentation and identification method based on capsule neurons Active CN109711411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811505408.9A CN109711411B (en) 2018-12-10 2018-12-10 Image segmentation and identification method based on capsule neurons

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811505408.9A CN109711411B (en) 2018-12-10 2018-12-10 Image segmentation and identification method based on capsule neurons

Publications (2)

Publication Number Publication Date
CN109711411A CN109711411A (en) 2019-05-03
CN109711411B 2020-10-30

Family

ID=66255596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811505408.9A Active CN109711411B (en) 2018-12-10 2018-12-10 Image segmentation and identification method based on capsule neurons

Country Status (1)

Country Link
CN (1) CN109711411B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298844B (en) * 2019-06-17 2021-06-29 艾瑞迈迪科技石家庄有限公司 X-ray radiography image blood vessel segmentation and identification method and device
CN110570394B (en) * 2019-08-01 2023-04-28 深圳先进技术研究院 Medical image segmentation method, device, equipment and storage medium
CN111161280B (en) * 2019-12-18 2022-10-04 浙江大学 Contour evolution segmentation method based on neural network
CN113065394B (en) * 2021-02-26 2022-12-06 青岛海尔科技有限公司 Method for image recognition of article, electronic device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062753A (en) * 2017-12-29 2018-05-22 重庆理工大学 The adaptive brain tumor semantic segmentation method in unsupervised domain based on depth confrontation study
CN108921227A (en) * 2018-07-11 2018-11-30 广东技术师范学院 A kind of glaucoma medical image classification method based on capsule theory

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11074495B2 (en) * 2013-02-28 2021-07-27 Z Advanced Computing, Inc. (Zac) System and method for extremely efficient image and pattern recognition and artificial intelligence platform
CN104537676B (en) * 2015-01-12 2017-03-22 南京大学 Gradual image segmentation method based on online learning
JP2020510463A (en) * 2017-01-27 2020-04-09 アーテリーズ インコーポレイテッド Automated segmentation using full-layer convolutional networks
WO2018156778A1 (en) * 2017-02-22 2018-08-30 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Detection of prostate cancer in multi-parametric mri using random forest with instance weighting & mr prostate segmentation by deep learning with holistically-nested networks
EP3629898A4 (en) * 2017-05-30 2021-01-20 Arterys Inc. Automated lesion detection, segmentation, and longitudinal identification
CN108846384A (en) * 2018-07-09 2018-11-20 北京邮电大学 Merge the multitask coordinated recognition methods and system of video-aware

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062753A (en) * 2017-12-29 2018-05-22 重庆理工大学 The adaptive brain tumor semantic segmentation method in unsupervised domain based on depth confrontation study
CN108921227A (en) * 2018-07-11 2018-11-30 广东技术师范学院 A kind of glaucoma medical image classification method based on capsule theory

Also Published As

Publication number Publication date
CN109711411A (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN109409222B (en) Multi-view facial expression recognition method based on mobile terminal
CN108491880B (en) Object classification and pose estimation method based on neural network
CN109711411B (en) Image segmentation and identification method based on capsule neurons
CN108304357B (en) Chinese character library automatic generation method based on font manifold
CN110046671A (en) A kind of file classification method based on capsule network
CN103425996B (en) A kind of large-scale image recognition methods of parallel distributed
CN113989890A (en) Face expression recognition method based on multi-channel fusion and lightweight neural network
CN104268593A (en) Multiple-sparse-representation face recognition method for solving small sample size problem
Mittelman et al. Weakly supervised learning of mid-level features with Beta-Bernoulli process restricted Boltzmann machines
CN106503661B (en) Face gender identification method based on fireworks deepness belief network
CN112307714A (en) Character style migration method based on double-stage deep network
CN109325513B (en) Image classification network training method based on massive single-class images
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
Xu et al. Face expression recognition based on convolutional neural network
CN110263855B (en) Method for classifying images by utilizing common-basis capsule projection
CN107358172B (en) Human face feature point initialization method based on human face orientation classification
Al-Zubaidi et al. Two-dimensional optical character recognition of mouse drawn in Turkish capital letters using multi-layer perceptron classification
CN110598022A (en) Image retrieval system and method based on robust deep hash network
CN114743133A (en) Lightweight small sample video classification and identification method and system
CN105809200A (en) Biologically-inspired image meaning information autonomous extraction method and device
CN112163605A (en) Multi-domain image translation method based on attention network generation
CN113269235B (en) Assembly body change detection method and device based on unsupervised learning
CN112488238B (en) Hybrid anomaly detection method based on countermeasure self-encoder
CN113128624B (en) Graph network face recovery method based on multi-scale dictionary
Dembani et al. UNSUPERVISED FACIAL EXPRESSION DETECTION USING GENETIC ALGORITHM.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant