
CN105243154A - Remote sensing image retrieval method and system based on significant point characteristics and spare self-encodings - Google Patents


Info

Publication number
CN105243154A
CN105243154A (application CN201510708598.4A)
Authority
CN
China
Prior art keywords
image
feature
matrix
training
salient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510708598.4A
Other languages
Chinese (zh)
Other versions
CN105243154B (en)
Inventor
邵振峰
周维勋
李从敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201510708598.4A priority Critical patent/CN105243154B/en
Publication of CN105243154A publication Critical patent/CN105243154A/en
Application granted granted Critical
Publication of CN105243154B publication Critical patent/CN105243154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A remote sensing image retrieval method and system based on salient point features and sparse self-encoding are disclosed. The method comprises the steps of extracting feature points of each image in an image library to obtain a feature point matrix, and calculating a saliency map of each image based on a visual attention model; binarizing the saliency maps by an adaptive threshold method and performing a mask operation with the feature point matrix to obtain the filtered salient feature points; separately choosing a plurality of salient feature points from each training image to construct training samples; training a sparse auto-encoder network on the whitened training sample set to obtain a feature extractor; extracting features with the feature extractor and sparsifying the extracted image features with a threshold function to obtain the final feature vector for retrieval; and performing image retrieval according to a preset similarity measurement criterion based on the extracted feature vectors. Automatic extraction of image features is realized through the trained sparse auto-encoder network; in addition, the extracted features are highly discriminative, so retrieval precision is ensured.

Description

Remote sensing image retrieval method and system based on salient point features and sparse self-coding
Technical Field
The invention belongs to the technical field of image processing, and relates to a remote sensing image retrieval method and system based on salient point features and sparse self-coding.
Background
Along with the improvement of Earth-observation capability, the obtained remote sensing data are becoming increasingly diverse and massive. Although massive remote sensing data provide rich data sources for various important applications, the problem of "massive data, inundated information" in remote sensing big data is increasingly prominent because current ground data processing and analysis capabilities are insufficient. How to utilize emerging scientific computing technologies to quickly locate and intelligently retrieve a target or region of interest in remote sensing imagery is a challenge facing remote sensing big data processing and analysis, and is also a scientific problem to be solved urgently in the field of remote sensing image processing. Remote sensing image retrieval is an effective method for solving this bottleneck, and research on efficient image retrieval technology is therefore of great significance.
Current remote sensing image retrieval technology mainly performs similarity measurement on the low-level features of an image so as to return similar images. Compared with traditional keyword-based retrieval methods, content-based retrieval has higher efficiency and accuracy, but designing a feature description method that can effectively describe various complex remote sensing scenes is very difficult. In recent years, deep learning has become a research focus in the field of image recognition due to its good feature learning ability. Compared with hand-crafted features, methods based on deep learning can obtain a feature extractor through sample training to realize automatic extraction of image features, and are suitable for remote sensing image retrieval involving complex scenes. Because its network design and training are relatively simple, sparse self-coding has become a common deep learning method and is widely applied in image processing.
For sparse self-coding network training, in the aspect of constructing training samples, existing methods generally randomly select a certain number of fixed-size image blocks from the training images, and this sample construction method has the following defects. First, from the perspective of human vision, people are interested in specific targets in a remote sensing image, and a randomly selected image block may not contain the specific target of interest. Second, since the size of a training image is fixed, randomly selecting image blocks may yield an insufficient number of training samples. Third, since the training samples are image blocks, a trained network extracts the features of image blocks rather than of the entire image, so these features cannot be directly used for image retrieval; to obtain features of the whole image, a convolution step is generally adopted, but this is not only computationally inefficient, it also introduces additional parameters. In the aspect of selecting an activation function, existing methods usually adopt the sigmoid function as the activation function of the hidden layer neurons, but the sigmoid function suffers from severe gradient vanishing during back-propagation, which hinders network training. For sparse self-coding feature extraction, existing methods generally take the activation values of the hidden layer directly as the extracted features without sparsification, whereas experiments show that sparse features perform better.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a remote sensing image retrieval technical scheme based on salient point features and sparse self-coding. The method takes the salient point features extracted from remote sensing images as the input of the sparse self-coding network to train it, and finally extracts image features with the trained feature extractor to realize remote sensing image retrieval.
The technical scheme adopted by the invention is a remote sensing image retrieval method based on salient point characteristics and sparse self-coding, which comprises the following steps:
step 1, extracting characteristic points of each image in an image library to obtain a characteristic point matrix, and calculating a saliency map of each image by using a visual attention model;
step 2, binarizing the saliency maps of each image in the image library by adopting a self-adaptive threshold method respectively, and performing mask operation on a feature point matrix corresponding to the image to obtain filtered saliency feature points; the implementation mode is as follows,
when the adaptive threshold method is adopted to binarize the saliency map, according to the saliency of the saliency map pixels, the binarization threshold value T of the saliency map is determined as follows,
T = (2 / (w × h)) Σ_{x=1}^{w} Σ_{y=1}^{h} I(x, y)
wherein w and h represent the width and height of the saliency map, respectively, and I(x, y) represents the saliency value of the saliency map pixel (x, y);
binarizing the saliency map according to the binarization threshold T gives a binarized saliency map with a corresponding matrix I_binary; let P denote the feature point matrix of the image and P_I denote the filtered salient feature point matrix, which is calculated as follows,
P_I = P ⊗ I_binary
step 3, taking a plurality of images from the image library as training images, respectively selecting a plurality of significant feature points from each training image to construct a training sample to obtain a training sample set X, and training a sparse self-coding network according to the whitened training sample set X' to obtain a feature extractor;
the sparse self-coding network comprises an input layer, a hidden layer and an output layer, wherein the hidden layer neurons adopt the ReLU function as the activation function, the output layer neurons adopt the softplus function as the activation function, and the cost function of the sparse self-coding network is defined as follows,
J(W, b) = (1/2) ||X' − H_{W,b}||² + (λ/2) ||W||²
wherein the first term is the mean square error term and the second term is the regularization term, H_{W,b} represents the network output for the whitened training sample set X', W = [W_1, W_2] and b = [b_1, b_2] respectively represent the matrices constructed from the weights W_1 and bias b_1 between the input layer and the hidden layer and the weights W_2 and bias b_2 between the hidden layer and the output layer, and λ represents the regularization coefficient;
step 4, extracting the features of all the images in the image library by using the feature extractor obtained by training in the step 3, and performing sparsification processing on the extracted image features by using a threshold function to obtain a final feature vector for retrieval; the implementation mode is as follows,
the extracted image feature Y is expressed as follows,
Y = f_1(W_1 P_I' + b_1)
wherein the salient feature point matrix P_I' is the whitened result of the filtered salient feature point matrix P_I obtained in step 2;
for the extracted image feature Y, the following sparsification processing is carried out to obtain a sparse feature matrix Z,
Z = [Z⁺, Z⁻] = [max(0, Y − α), max(0, α − Y)]
wherein α represents the threshold of the threshold function, and the matrices Z⁺ = max(0, Y − α) and Z⁻ = max(0, α − Y);
letting n be the number of SIFT points detected in an image, the sparse feature matrix Z is further processed to obtain the feature vector F as follows,
F = (1/n) Σ_{i=1}^{n} [Z⁺_i, Z⁻_i]
wherein Z⁺_i and Z⁻_i respectively represent the i-th column vectors of the matrices Z⁺ and Z⁻.
And 5, based on the feature vector extracted in the step 4, carrying out image retrieval according to a preset similarity measurement criterion.
Moreover, in step 1, the feature points of each image in the image library are extracted with a SIFT operator to obtain the feature point matrix.
In step 5, the preset similarity measurement criterion is the city-block distance.
The invention also correspondingly provides a remote sensing image retrieval system based on the salient point characteristics and the sparse self-coding, which comprises the following modules,
the characteristic point extraction module is used for extracting characteristic points of each image in the image library to obtain a characteristic point matrix and calculating a saliency map of each image by using a visual attention model;
the salient feature point extraction module is used for binarizing the salient images of all the images in the image library by adopting a self-adaptive threshold method respectively, and performing mask operation on a feature point matrix corresponding to the images to obtain filtered salient feature points; the implementation mode is as follows,
when the adaptive threshold method is adopted to binarize the saliency map, according to the saliency of the saliency map pixels, the binarization threshold value T of the saliency map is determined as follows,
T = (2 / (w × h)) Σ_{x=1}^{w} Σ_{y=1}^{h} I(x, y)
wherein w and h represent the width and height of the saliency map, respectively, and I(x, y) represents the saliency value of the saliency map pixel (x, y);
binarizing the saliency map according to the binarization threshold T gives a binarized saliency map with a corresponding matrix I_binary; let P denote the feature point matrix of the image and P_I denote the filtered salient feature point matrix, which is calculated as follows,
P_I = P ⊗ I_binary
the training module is used for taking a plurality of images from the image library as training images, respectively selecting a plurality of significant feature points from each training image to construct a training sample to obtain a training sample set X, and training a sparse self-coding network according to the whitened training sample set X' to obtain a feature extractor;
the sparse self-coding network comprises an input layer, a hidden layer and an output layer, wherein the hidden layer neurons adopt the ReLU function as the activation function, the output layer neurons adopt the softplus function as the activation function, and the cost function of the sparse self-coding network is defined as follows,
J(W, b) = (1/2) ||X' − H_{W,b}||² + (λ/2) ||W||²
wherein the first term is the mean square error term and the second term is the regularization term, H_{W,b} represents the network output for the whitened training sample set X', W = [W_1, W_2] and b = [b_1, b_2] respectively represent the matrices constructed from the weights W_1 and bias b_1 between the input layer and the hidden layer and the weights W_2 and bias b_2 between the hidden layer and the output layer, and λ represents the regularization coefficient;
the feature extraction module is used for extracting features of all images in the image library by using the feature extractor obtained by the training module, and performing sparsification processing on the extracted image features by using a threshold function to obtain the final feature vector for retrieval; the implementation mode is as follows,
the extracted image feature Y is expressed as follows,
Y = f_1(W_1 P_I' + b_1)
wherein the salient feature point matrix P_I' is the whitened result of the filtered salient feature point matrix P_I obtained by the salient feature point extraction module;
for the extracted image feature Y, the following sparsification processing is carried out to obtain a sparse feature matrix Z,
Z = [Z⁺, Z⁻] = [max(0, Y − α), max(0, α − Y)]
wherein α represents the threshold of the threshold function, and the matrices Z⁺ = max(0, Y − α) and Z⁻ = max(0, α − Y);
letting n be the number of SIFT points detected in an image, the sparse feature matrix Z is further processed to obtain the feature vector F as follows,
F = (1/n) Σ_{i=1}^{n} [Z⁺_i, Z⁻_i]
wherein Z⁺_i and Z⁻_i respectively represent the i-th column vectors of the matrices Z⁺ and Z⁻.
And the retrieval module is used for retrieving the image according to a preset similarity measurement criterion on the basis of the feature vector extracted by the feature extraction module.
Moreover, in the feature point extraction module, the feature points of each image in the image library are extracted with a SIFT operator to obtain the feature point matrix.
In the retrieval module, the preset similarity measurement criterion adopts the city-block distance.
Compared with the prior art, the invention has the following characteristics and beneficial effects,
1. The saliency map of each image is calculated by adopting the visual attention model, and the feature points extracted by SIFT are filtered through the binarized saliency map to obtain the salient feature points of the image, which not only accords with the visual attention characteristics of human eyes, but also better reflects the retrieval requirements of users.
2. The significant feature points of the images are selected to construct training samples, so that the defect that the training samples are constructed by random sampling on the training images in the prior art is overcome.
3. The feature extractor obtained by sparse self-coding network training is used for realizing automatic extraction of image features, and the feature design process aiming at complex remote sensing images is omitted.
4. Good extensibility: the training samples include but are not limited to the salient feature points.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Detailed Description
The remote sensing image retrieval method based on salient point features and sparse self-coding first extracts the feature points of an image to obtain a feature point matrix and calculates the saliency map of the image. It then binarizes the saliency map with an adaptive threshold and performs a "mask" operation with the feature point matrix to obtain the salient feature points, selects a certain number of salient feature points to construct training samples and trains a sparse self-coding network, automatically extracts the image features with the trained feature extractor to obtain the feature vectors for retrieval, and finally performs image retrieval according to a preset similarity measurement method and returns similar images.
To explain the technical solution of the present invention in detail, referring to fig. 1, the embodiment flow is specifically explained as follows:
step 1, extracting characteristic points of each image in an image library to obtain a characteristic point matrix, and calculating a saliency map of each image by using a visual attention model.
In particular, an existing image library or an image library constructed by a person skilled in the art may be used. For example, a high-resolution remote sensing image containing a plurality of ground feature categories is selected and segmented in a Tiles blocking mode to construct a retrieval image library containing a plurality of categories. For each image in the image library, the embodiment first adopts a SIFT (Scale-Invariant Feature Transform) operator to extract the feature points (key points) of the image to obtain a feature point matrix, and then adopts a GBVS (Graph-Based Visual Saliency) model to calculate the saliency map of the image.
And 2, for the saliency maps of the images in the image library, binarizing the saliency maps by adopting a self-adaptive threshold method respectively, and carrying out mask operation on the feature point matrixes corresponding to the images to obtain the filtered saliency feature points.
In the embodiment, a binarization threshold of the saliency map is determined according to the saliency of the pixel, and the binarization saliency map and the feature point matrix are subjected to 'mask' operation to obtain the salient feature points, which is realized as follows:
According to the saliency values of the saliency map pixels, the binarization threshold T of the saliency map is determined by formula (1).
T = (2 / (w × h)) Σ_{x=1}^{w} Σ_{y=1}^{h} I(x, y)    (1)
Where w and h represent the width and height of the saliency map, respectively, and I (x, y) represents the saliency value of a pixel at the saliency map (x, y).
Binarizing the saliency map according to the binarization threshold T gives a binarized saliency map with a corresponding matrix I_binary. Filtering the feature point matrix of the image with the binarized saliency map yields the salient feature points. Let P denote the feature point matrix of the image and P_I denote the filtered salient feature point matrix; the salient feature point matrix can be calculated by equation (2).
P_I = P ⊗ I_binary    (2)
wherein the matrix P stores, at each position (x, y), the feature vector P_128(x, y) corresponding to a SIFT key point; the feature vector corresponding to a SIFT key point is generally 128-dimensional, and the embodiment of the invention correspondingly uses 128 dimensions. If the pixel at (x, y) has no feature point, P_128(x, y) = 0. Each element of the matrix I_binary is 0 or 1, and I_binary(x, y) represents the value of the binarized saliency map at (x, y). The symbol ⊗ denotes element-wise (number-times) multiplication.
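For illustration only (not part of the original disclosure), the adaptive threshold of formula (1) and the mask of formula (2) can be sketched in numpy, assuming the feature point matrix P is stored as a w × h × 128 array with all-zero vectors at non-keypoint pixels; the toy saliency map and keypoints are hypothetical stand-ins:

```python
import numpy as np

def binarize_saliency(saliency):
    # Formula (1): T = 2/(w*h) * sum of I(x, y), i.e. twice the mean saliency
    T = 2.0 * saliency.mean()
    return (saliency >= T).astype(float)

def filter_salient_points(P, saliency):
    # Formula (2): element-wise mask; feature vectors at non-salient pixels become 0
    I_binary = binarize_saliency(saliency)
    return P * I_binary[:, :, None]

# toy 4x4 saliency map with two salient pixels, and three synthetic keypoints
sal = np.zeros((4, 4))
sal[0, 0] = sal[1, 1] = 1.0
P = np.zeros((4, 4, 128))
P[0, 0] = P[1, 1] = P[3, 3] = 1.0
P_I = filter_salient_points(P, sal)   # the keypoint at (3, 3) is filtered out
```

Here the mean saliency is 0.125, so T = 0.25 and only the two bright pixels survive the binarization.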
And 3, selecting a plurality of images from the image library as training images, respectively selecting a plurality of significant feature points from each training image to construct training samples, training a sparse self-coding network, and obtaining the feature extractor.
In the embodiment, in step 3, a certain number of salient feature points of the training images, instead of the traditional image blocks, are selected to construct the training samples, and the ReLU (Rectified Linear Units) function, instead of the traditional sigmoid function, is selected as the activation function of the hidden layer neurons of the sparse self-coding network during training. For example, each salient feature point in step 3 is a feature vector with dimensions 4 × 4 × 8 = 128, and one feature point constitutes one training sample. In the implementation, the number of training images and the number of salient feature points per training image can be specified by those skilled in the art.
The concrete implementation is as follows:
firstly, selecting the salient feature points of the image, and constructing a training sample set.
The embodiment first randomly selects a certain number of images from the image library as training images, and then randomly selects a certain number of salient feature points from the training images to construct the training sample set, which may be represented by equation (3):
X = [x_{i,j}] (i = 1, …, 128; j = 1, …, m)    (3)
where m represents the number of training samples and each column of X represents a salient feature point, i.e., a training sample. For example, [x_{1,1}, x_{2,1}, …, x_{128,1}] is the 1st training sample and [x_{1,2}, x_{2,2}, …, x_{128,2}] is the 2nd training sample.
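A minimal sketch of assembling the 128 × m training sample set X of equation (3) by random selection; the random matrices here merely stand in for each image's filtered salient feature points, and the counts are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
# stand-ins for the filtered salient feature points of 5 training images,
# each image i holding a 128 x k_i matrix of SIFT descriptors
images = [rng.normal(size=(128, int(rng.integers(20, 40)))) for _ in range(5)]

samples_per_image = 10
cols = [img[:, rng.choice(img.shape[1], samples_per_image, replace=False)]
        for img in images]
X = np.hstack(cols)   # 128 x m training sample set, one salient point per column
```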
Then, training the sparse self-coding network to obtain a feature extractor.
Because the salient feature points extracted from the same training image have a certain correlation, the training sample set X cannot be directly input into the sparse self-coding network for training. Before training, ZCA (Zero-phase Component Analysis) whitening is adopted to process the training samples to obtain the whitened training sample set X', and the relevant ZCA whitening parameters are stored. ZCA whitening is prior art and is not described in detail in the present invention.
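ZCA whitening itself can be sketched as follows; the eps regularizer and the returned parameters (mean and transform, which the method reuses later on P_I) follow common practice and are assumptions rather than specifics of the disclosure:

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    # X is d x m, columns are training samples (d = 8 here stands in for 128)
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    cov = Xc @ Xc.T / X.shape[1]
    U, S, _ = np.linalg.svd(cov)
    W_zca = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T   # zero-phase transform
    # return the parameters too, so the same transform can be reused in step 4
    return W_zca @ Xc, mu, W_zca

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 200))
X_white, mu, W_zca = zca_whiten(X)
cov_white = X_white @ X_white.T / X_white.shape[1]   # approximately identity
```

After whitening, the sample covariance is close to the identity, i.e. the correlations between the salient feature points have been removed.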
The embodiment defines a sparse self-coding network comprising 3 layers: an input layer, a hidden layer and an output layer, wherein the hidden layer neurons use the ReLU function f_1 = max(0, x) as the activation function and the output layer neurons use the softplus function f_2 = ln(1 + e^x) as the activation function. Compared with the traditional sigmoid function, the ReLU function can relieve the problem of gradient disappearance to a certain extent and is more beneficial to network training. Given the whitened training sample set X', the cost function of the sparse self-coding network can be defined as equation (4).
J(W, b) = (1/2) ||X' − H_{W,b}||² + (λ/2) ||W||²    (4)
Where the first term is the mean square error term and the second term is the regularization term, H_{W,b} represents the network output for the whitened training sample set X', W = [W_1, W_2] and b = [b_1, b_2] respectively represent the matrices constructed from the weights W_1 and bias b_1 between the input layer and the hidden layer and the weights W_2 and bias b_2 between the hidden layer and the output layer, and λ represents the regularization coefficient. In specific implementation, the cost function in formula (4) can be optimized during training by gradient descent and other methods to obtain the weight and bias parameters W and b.
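A sketch of the forward pass and the cost of formula (4), with ReLU hidden units and softplus output units as described; the tiny random network and its sizes are illustrative assumptions, and the optimizer (gradient descent or otherwise) is left out:

```python
import numpy as np

def relu(x):        # hidden-layer activation f1 = max(0, x)
    return np.maximum(0.0, x)

def softplus(x):    # output-layer activation f2 = ln(1 + e^x)
    return np.log1p(np.exp(x))

def forward(X, W1, b1, W2, b2):
    # 3-layer network: input -> ReLU hidden layer -> softplus output layer
    H = relu(W1 @ X + b1)
    return softplus(W2 @ H + b2)          # reconstruction H_{W,b}

def cost(X, W1, b1, W2, b2, lam):
    # Formula (4): 0.5*||X' - H_{W,b}||^2 + 0.5*lam*||W||^2
    R = forward(X, W1, b1, W2, b2)
    return 0.5 * np.sum((X - R) ** 2) + 0.5 * lam * (np.sum(W1**2) + np.sum(W2**2))

# illustrative tiny network: 6 inputs, 4 hidden units
rng = np.random.default_rng(1)
Xw = rng.normal(size=(6, 20))             # stands in for the whitened set X'
W1 = 0.1 * rng.normal(size=(4, 6)); b1 = np.zeros((4, 1))
W2 = 0.1 * rng.normal(size=(6, 4)); b2 = np.zeros((6, 1))
J = cost(Xw, W1, b1, W2, b2, lam=1e-3)    # scalar to minimize during training
```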
And 4, extracting the features of all the images in the image library by using the feature extractor obtained by training in the step 3, and performing sparsification on the extracted features by using a threshold function to obtain a final feature vector for retrieval.
In step 4 of the embodiment, the salient feature points of the image are input into the feature extractor to be mapped to obtain corresponding image features, and then the extracted features are subjected to sparsification by using a threshold function to obtain a final feature vector for retrieval.
The extracted image feature Y can be represented by equation (5),
Y = f_1(W_1 P_I' + b_1)    (5)
wherein W_1 P_I' + b_1 is substituted as the variable x of the ReLU function f_1 = max(0, x), and the salient feature point matrix P_I' used here is the result of preprocessing the filtered salient feature point matrix obtained in step 2 with the same ZCA whitening parameters as used when whitening the training sample set X. The extracted image feature Y is then sparsified using equation (6) to obtain the sparse feature matrix Z.
Z = [Z⁺, Z⁻] = [max(0, Y − α), max(0, α − Y)]    (6)
wherein α denotes the threshold of the threshold functions f = max(0, Y − α) and f = max(0, α − Y), and the matrices Z⁺ = max(0, Y − α), Z⁻ = max(0, α − Y).
In order to obtain the final feature vector F for retrieval, let n be the number of SIFT points detected in one image; the sparse feature matrix Z is further processed using equation (7).
F = (1/n) Σ_{i=1}^{n} [Z⁺_i, Z⁻_i]    (7)
wherein Z⁺_i and Z⁻_i respectively represent the i-th column vectors of the matrices Z⁺ and Z⁻.
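The sparsification of formula (6) and the pooling of formula (7) can be sketched together; the toy matrix Y (one hidden unit, n = 3 salient points) and the threshold value are illustrative assumptions:

```python
import numpy as np

def sparsify_and_pool(Y, alpha):
    # Formula (6): rectified positive and negative parts of the features
    Z_pos = np.maximum(0.0, Y - alpha)
    Z_neg = np.maximum(0.0, alpha - Y)
    # Formula (7): average the columns over the n salient points, then concatenate
    return np.concatenate([Z_pos.mean(axis=1), Z_neg.mean(axis=1)])

# one hidden unit, n = 3 salient points
Y = np.array([[1.0, -1.0, 0.5]])
F = sparsify_and_pool(Y, alpha=0.2)   # -> [(0.8 + 0 + 0.3)/3, (0 + 1.2 + 0)/3]
```

Note that F has twice the hidden-layer dimension, since the positive and negative parts are concatenated.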
Step 5, based on the feature vectors extracted in step 4, image retrieval is carried out according to a preset similarity measurement criterion. In the implementation, one skilled in the art can preset the similarity measurement criterion. The embodiment uses the city-block distance (L1 norm) to calculate the similarity of the query image to the other images and returns related images ranked by similarity. In specific implementation, any image in the image library can be used as a query image to obtain the related images returned according to similarity; for images outside the image library, the feature vector can likewise be extracted in the same way and retrieved against the image library.
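A sketch of the retrieval step under the city-block (L1) criterion; the two-dimensional library vectors are illustrative stand-ins for the pooled feature vectors F:

```python
import numpy as np

def retrieve(query_F, library_F, top_k=3):
    # city-block (L1) distance from the query vector to every library vector
    d = np.abs(library_F - query_F).sum(axis=1)
    return np.argsort(d)[:top_k]      # indices of the most similar images first

library = np.array([[0.0, 0.0],       # identical to the query
                    [1.0, 1.0],       # far away
                    [0.1, 0.0]])      # close
order = retrieve(np.zeros(2), library)
```

The ranking places the identical vector first, the near one second, and the distant one last.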
In specific implementation, the above processes can adopt a computer software mode to realize an automatic operation process, and can also adopt a modularized mode to provide a corresponding system. The invention also correspondingly provides a remote sensing image retrieval system based on the salient point characteristics and the sparse self-coding, which comprises the following modules,
the characteristic point extraction module is used for extracting characteristic points of each image in the image library to obtain a characteristic point matrix and calculating a saliency map of each image by using a visual attention model;
the salient feature point extraction module is used for binarizing the salient images of all the images in the image library by adopting a self-adaptive threshold method respectively, and performing mask operation on a feature point matrix corresponding to the images to obtain filtered salient feature points; the implementation mode is as follows,
when the adaptive threshold method is adopted to binarize the saliency map, according to the saliency of the saliency map pixels, the binarization threshold value T of the saliency map is determined as follows,
T = (2 / (w × h)) Σ_{x=1}^{w} Σ_{y=1}^{h} I(x, y)
wherein w and h represent the width and height of the saliency map, respectively, and I(x, y) represents the saliency value of the saliency map pixel (x, y);
binarizing the saliency map according to the binarization threshold T gives a binarized saliency map with a corresponding matrix I_binary; let P denote the feature point matrix of the image and P_I denote the filtered salient feature point matrix, which is calculated as follows,
P_I = P ⊗ I_binary
the training module is used for taking a plurality of images from the image library as training images, respectively selecting a plurality of significant feature points from each training image to construct a training sample to obtain a training sample set X, and training a sparse self-coding network according to the whitened training sample set X' to obtain a feature extractor;
the sparse self-coding network comprises an input layer, a hidden layer and an output layer, wherein the hidden layer neurons adopt the ReLU function as the activation function, the output layer neurons adopt the softplus function as the activation function, and the cost function of the sparse self-coding network is defined as follows,
J(W, b) = (1/2) ||X' − H_{W,b}||² + (λ/2) ||W||²
wherein the first term is the mean square error term and the second term is the regularization term, H_{W,b} represents the network output for the whitened training sample set X', W = [W_1, W_2] and b = [b_1, b_2] respectively represent the matrices constructed from the weights W_1 and bias b_1 between the input layer and the hidden layer and the weights W_2 and bias b_2 between the hidden layer and the output layer, and λ represents the regularization coefficient;
the query feature extraction module is used for extracting the features of the image to be queried by using the feature extractor obtained by the training module, and sparsifying the extracted image features with a threshold function to obtain the final feature vector for retrieval; the implementation is as follows,
the extracted image feature Y is expressed as follows,
Y = f_1(W_1 P_I′ + b_1)

wherein f_1 denotes the hidden-layer activation function, and the salient feature point matrix P_I′ is the whitening result of the filtered salient feature point matrix P_I obtained by the salient feature point extraction module;
for the extracted image feature Y, the following thinning processing is carried out to obtain a sparse feature matrix Z,
Z = [Z_+, Z_−] = [max(0, Y − α), max(0, α − Y)]

wherein α denotes the threshold of the threshold function, and the matrices Z_+ = max(0, Y − α) and Z_− = max(0, α − Y);

letting n denote the number of SIFT feature points detected in the image, the sparse feature matrix Z is further processed to obtain the feature vector F as follows,
F = (1/n) Σ_{i=1}^{n} [Z_+^i, Z_−^i]
wherein Z_+^i and Z_−^i denote the ith column vectors of the matrices Z_+ and Z_−, respectively.
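The query-side pipeline (hidden-layer encoding, threshold sparsification, and column averaging) can be sketched end to end as follows; the function name and the column-per-SIFT-point layout of P_I′ are assumptions for illustration, and f_1 is taken to be the ReLU named earlier:

```python
import numpy as np

def query_feature(P_I_whitened, W1, b1, alpha):
    """Compute the retrieval feature vector F per the formulas above.

    P_I_whitened : (d, n) whitened salient-point descriptors, one column
                   per detected SIFT point
    Returns F, a length-2k vector (k = number of hidden units).
    """
    Y = np.maximum(0.0, W1 @ P_I_whitened + b1)  # Y = f1(W1 P_I' + b1)
    Z_plus = np.maximum(0.0, Y - alpha)          # Z+ = max(0, Y - α)
    Z_minus = np.maximum(0.0, alpha - Y)         # Z- = max(0, α - Y)
    # F = (1/n) Σ_i [Z+^i, Z-^i]: average the stacked column vectors
    F = np.concatenate([Z_plus, Z_minus]).mean(axis=1)
    return F
```

The split into positive and negative half-rectified parts keeps both "above threshold" and "below threshold" evidence, and the average over the n salient points yields one fixed-length vector per image regardless of how many points were detected.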
the retrieval module is used for performing image retrieval according to a preset similarity measurement criterion based on the feature vector extracted by the query feature extraction module.
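Assuming the "urban area distance" of claim 6 is the city-block (L1, Manhattan) distance, the retrieval module reduces to ranking library images by L1 distance to the query feature vector. A minimal sketch (function name and top-k interface are illustrative):

```python
import numpy as np

def retrieve(query_F, library_F, top_k=10):
    """Rank library images by city-block (L1) distance to the query
    feature vector; smaller distance means more similar.

    library_F : (N, 2k) matrix with one feature vector F per image
    Returns the indices of the top_k matches and their distances.
    """
    dists = np.abs(library_F - query_F).sum(axis=1)  # L1 distance per image
    order = np.argsort(dists)                        # ascending = most similar first
    return order[:top_k], dists[order[:top_k]]
```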
The specific embodiments described herein are merely illustrative of the invention. Those skilled in the art may make various modifications, additions, or substitutions to the described embodiments without departing from the spirit or scope of the invention as defined in the appended claims.

Claims (6)

1. A remote sensing image retrieval method based on salient point features and sparse self-coding, characterized by comprising the following steps:
step 1, extracting characteristic points of each image in an image library to obtain a characteristic point matrix, and calculating a saliency map of each image by using a visual attention model;
step 2, binarizing the saliency map of each image in the image library by using an adaptive threshold method, and performing a mask operation on the corresponding feature point matrix of the image to obtain the filtered salient feature points; the implementation is as follows,
when the adaptive threshold method is adopted to binarize the saliency map, according to the saliency of the saliency map pixels, the binarization threshold value T of the saliency map is determined as follows,
T = (2 / (w × h)) · Σ_{x=1}^{w} Σ_{y=1}^{h} I(x, y)
wherein w and h represent the width and height of the saliency map, respectively, and I (x, y) represents the saliency value of the saliency map pixel (x, y);
the saliency map is binarized according to the threshold T, yielding a binarized saliency map with corresponding matrix I_binary; let P denote the feature point matrix of the image and P_I denote the filtered salient feature point matrix, which is calculated as follows,
P_I = P ⊗ I_binary
step 3, taking a plurality of images from the image library as training images, selecting a plurality of salient feature points from each training image to construct training samples, obtaining a training sample set X, and training a sparse self-coding network with the whitened training sample set X′ to obtain a feature extractor;
the sparse self-coding network comprises an input layer, an implicit layer and an output layer, wherein a neuron of the implicit layer adopts a ReLU function as an activation function, a neuron of the output layer adopts a softplus function as the activation function, a cost function of the sparse self-coding network is defined as follows,
J(W, b) = (1/2) ‖X′ − H_{W,b}‖² + (λ/2) ‖W‖²
wherein the first term is the mean square error term and the second term is the regularization term; H_{W,b} denotes the network output for the training sample set X′; W = [W_1, W_2] and b = [b_1, b_2] are formed respectively by the weights W_1 and bias b_1 between the input layer and the hidden layer and the weights W_2 and bias b_2 between the hidden layer and the output layer; λ denotes the regularization coefficient;
step 4, extracting the features of all the images in the image library by using the feature extractor obtained by training in the step 3, and performing sparsification processing on the extracted image features by using a threshold function to obtain a final feature vector for retrieval; the implementation mode is as follows,
the extracted image feature Y is expressed as follows,
Y = f_1(W_1 P_I′ + b_1)

wherein f_1 denotes the hidden-layer activation function, and the salient feature point matrix P_I′ is the whitening result of the filtered salient feature point matrix P_I obtained in step 2;
for the extracted image feature Y, the following thinning processing is carried out to obtain a sparse feature matrix Z,
Z = [Z_+, Z_−] = [max(0, Y − α), max(0, α − Y)]

wherein α denotes the threshold of the threshold function, and the matrices Z_+ = max(0, Y − α) and Z_− = max(0, α − Y);

letting n denote the number of SIFT feature points detected in the image, the sparse feature matrix Z is further processed to obtain the feature vector F as follows,
F = (1/n) Σ_{i=1}^{n} [Z_+^i, Z_−^i]
wherein Z_+^i and Z_−^i denote the ith column vectors of the matrices Z_+ and Z_−, respectively.
step 5, based on the feature vector extracted in step 4, performing image retrieval according to a preset similarity measurement criterion.
2. The remote sensing image retrieval method based on salient point features and sparse self-coding as claimed in claim 1, characterized in that: in step 1, the feature points of each image in the image library are extracted with the SIFT operator to obtain the feature point matrix.
3. The remote sensing image retrieval method based on salient point features and sparse self-coding as claimed in claim 1 or 2, characterized in that: in step 5, the city-block (Manhattan) distance is adopted as the preset similarity measurement criterion.
4. A remote sensing image retrieval system based on salient point features and sparse self-coding, characterized by comprising the following modules:
the characteristic point extraction module is used for extracting characteristic points of each image in the image library to obtain a characteristic point matrix and calculating a saliency map of each image by using a visual attention model;
the salient feature point extraction module is used for binarizing the saliency map of each image in the image library by using an adaptive threshold method, and performing a mask operation on the corresponding feature point matrix of the image to obtain the filtered salient feature points; the implementation is as follows,
when the adaptive threshold method is adopted to binarize the saliency map, according to the saliency of the saliency map pixels, the binarization threshold value T of the saliency map is determined as follows,
T = (2 / (w × h)) · Σ_{x=1}^{w} Σ_{y=1}^{h} I(x, y)
wherein w and h represent the width and height of the saliency map, respectively, and I (x, y) represents the saliency value of the saliency map pixel (x, y);
the saliency map is binarized according to the threshold T, yielding a binarized saliency map with corresponding matrix I_binary; let P denote the feature point matrix of the image and P_I denote the filtered salient feature point matrix, which is calculated as follows,
P_I = P ⊗ I_binary
the training module is used for taking a plurality of images from the image library as training images, selecting a plurality of salient feature points from each training image to construct training samples, obtaining a training sample set X, and training a sparse self-coding network with the whitened training sample set X′ to obtain a feature extractor;
the sparse self-coding network comprises an input layer, an implicit layer and an output layer, wherein a neuron of the implicit layer adopts a ReLU function as an activation function, a neuron of the output layer adopts a softplus function as the activation function, a cost function of the sparse self-coding network is defined as follows,
J(W, b) = (1/2) ‖X′ − H_{W,b}‖² + (λ/2) ‖W‖²
wherein the first term is the mean square error term and the second term is the regularization term; H_{W,b} denotes the network output for the training sample set X′; W = [W_1, W_2] and b = [b_1, b_2] are formed respectively by the weights W_1 and bias b_1 between the input layer and the hidden layer and the weights W_2 and bias b_2 between the hidden layer and the output layer; λ denotes the regularization coefficient;
the feature extraction module is used for extracting the features of all images in the image library by using the feature extractor obtained by the training module, and sparsifying the extracted image features with a threshold function to obtain the final feature vector for retrieval; the implementation is as follows,
the extracted image feature Y is expressed as follows,
Y = f_1(W_1 P_I′ + b_1)

wherein f_1 denotes the hidden-layer activation function, and the salient feature point matrix P_I′ is the whitening result of the filtered salient feature point matrix P_I obtained by the salient feature point extraction module;
for the extracted image feature Y, the following thinning processing is carried out to obtain a sparse feature matrix Z,
Z = [Z_+, Z_−] = [max(0, Y − α), max(0, α − Y)]

wherein α denotes the threshold of the threshold function, and the matrices Z_+ = max(0, Y − α) and Z_− = max(0, α − Y);

letting n denote the number of SIFT feature points detected in the image, the sparse feature matrix Z is further processed to obtain the feature vector F as follows,
F = (1/n) Σ_{i=1}^{n} [Z_+^i, Z_−^i]
wherein Z_+^i and Z_−^i denote the ith column vectors of the matrices Z_+ and Z_−, respectively.
the retrieval module is used for performing image retrieval according to a preset similarity measurement criterion based on the feature vector extracted by the feature extraction module.
5. The remote sensing image retrieval system based on salient point features and sparse self-coding as claimed in claim 4, characterized in that: in the feature point extraction module, the feature points of each image in the image library are extracted with the SIFT operator to obtain the feature point matrix.
6. The remote sensing image retrieval system based on salient point features and sparse self-coding as claimed in claim 4 or 5, characterized in that: in the retrieval module, the city-block (Manhattan) distance is adopted as the preset similarity measurement criterion.
CN201510708598.4A 2015-10-27 2015-10-27 Remote sensing image retrieval method and system based on salient point features and sparse self-coding Active CN105243154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510708598.4A CN105243154B (en) 2015-10-27 2015-10-27 Remote sensing image retrieval method and system based on salient point features and sparse self-coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510708598.4A CN105243154B (en) 2015-10-27 2015-10-27 Remote sensing image retrieval method and system based on salient point features and sparse self-coding

Publications (2)

Publication Number Publication Date
CN105243154A true CN105243154A (en) 2016-01-13
CN105243154B CN105243154B (en) 2018-08-21

Family

ID=55040802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510708598.4A Active CN105243154B (en) Remote sensing image retrieval method and system based on salient point features and sparse self-coding

Country Status (1)

Country Link
CN (1) CN105243154B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718531A (en) * 2016-01-14 2016-06-29 广州市万联信息科技有限公司 Image database building method and image recognition method
CN106228130A (en) * 2016-07-19 2016-12-14 武汉大学 Remote sensing image cloud detection method of optic based on fuzzy autoencoder network
CN106295613A (en) * 2016-08-23 2017-01-04 哈尔滨理工大学 A kind of unmanned plane target localization method and system
CN106909924A (en) * 2017-02-18 2017-06-30 北京工业大学 A kind of remote sensing image method for quickly retrieving based on depth conspicuousness
CN107122809A (en) * 2017-04-24 2017-09-01 北京工业大学 Neural network characteristics learning method based on image own coding
CN107515895A (en) * 2017-07-14 2017-12-26 中国科学院计算技术研究所 A kind of sensation target search method and system based on target detection
CN108830172A (en) * 2018-05-24 2018-11-16 天津大学 Aircraft remote sensing images detection method based on depth residual error network and SV coding
CN109259733A (en) * 2018-10-25 2019-01-25 深圳和而泰智能控制股份有限公司 Apnea detection method, apparatus and detection device in a kind of sleep
CN111144483A (en) * 2019-12-26 2020-05-12 歌尔股份有限公司 Image feature point filtering method and terminal
CN112731410A (en) * 2020-12-25 2021-04-30 上海大学 Underwater target sonar detection method based on CNN

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073748A (en) * 2011-03-08 2011-05-25 武汉大学 Visual keyword based remote sensing image semantic searching method
CN102867196A (en) * 2012-09-13 2013-01-09 武汉大学 Method for detecting complex sea-surface remote sensing image ships based on Gist characteristic study
CN103309982A (en) * 2013-06-17 2013-09-18 武汉大学 Remote sensing image retrieval method based on vision saliency point characteristics
CN104462494A (en) * 2014-12-22 2015-03-25 武汉大学 Remote sensing image retrieval method and system based on non-supervision characteristic learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073748A (en) * 2011-03-08 2011-05-25 武汉大学 Visual keyword based remote sensing image semantic searching method
CN102867196A (en) * 2012-09-13 2013-01-09 武汉大学 Method for detecting complex sea-surface remote sensing image ships based on Gist characteristic study
CN103309982A (en) * 2013-06-17 2013-09-18 武汉大学 Remote sensing image retrieval method based on vision saliency point characteristics
CN104462494A (en) * 2014-12-22 2015-03-25 武汉大学 Remote sensing image retrieval method and system based on non-supervision characteristic learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHOU Weixun et al.: "Remote Sensing Image Retrieval Method Using Visual Attention Model and Local Features", Geomatics and Information Science of Wuhan University *
WANG Xing et al.: "Remote Sensing Image Retrieval Method Based on Visual Salient Point Features", Science of Surveying and Mapping *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718531A (en) * 2016-01-14 2016-06-29 广州市万联信息科技有限公司 Image database building method and image recognition method
CN105718531B (en) * 2016-01-14 2019-12-17 广州市万联信息科技有限公司 Image database establishing method and image identification method
CN106228130B (en) * 2016-07-19 2019-09-10 武汉大学 Remote sensing image cloud detection method of optic based on fuzzy autoencoder network
CN106228130A (en) * 2016-07-19 2016-12-14 武汉大学 Remote sensing image cloud detection method of optic based on fuzzy autoencoder network
CN106295613A (en) * 2016-08-23 2017-01-04 哈尔滨理工大学 A kind of unmanned plane target localization method and system
CN106909924A (en) * 2017-02-18 2017-06-30 北京工业大学 A kind of remote sensing image method for quickly retrieving based on depth conspicuousness
CN106909924B (en) * 2017-02-18 2020-08-28 北京工业大学 Remote sensing image rapid retrieval method based on depth significance
CN107122809A (en) * 2017-04-24 2017-09-01 北京工业大学 Neural network characteristics learning method based on image own coding
CN107122809B (en) * 2017-04-24 2020-04-28 北京工业大学 Neural network feature learning method based on image self-coding
CN107515895B (en) * 2017-07-14 2020-06-05 中国科学院计算技术研究所 Visual target retrieval method and system based on target detection
CN107515895A (en) * 2017-07-14 2017-12-26 中国科学院计算技术研究所 A kind of sensation target search method and system based on target detection
CN108830172A (en) * 2018-05-24 2018-11-16 天津大学 Aircraft remote sensing images detection method based on depth residual error network and SV coding
CN109259733A (en) * 2018-10-25 2019-01-25 深圳和而泰智能控制股份有限公司 Apnea detection method, apparatus and detection device in a kind of sleep
CN111144483A (en) * 2019-12-26 2020-05-12 歌尔股份有限公司 Image feature point filtering method and terminal
CN111144483B (en) * 2019-12-26 2023-10-17 歌尔股份有限公司 Image feature point filtering method and terminal
CN112731410A (en) * 2020-12-25 2021-04-30 上海大学 Underwater target sonar detection method based on CNN
CN112731410B (en) * 2020-12-25 2021-11-05 上海大学 Underwater target sonar detection method based on CNN

Also Published As

Publication number Publication date
CN105243154B (en) 2018-08-21

Similar Documents

Publication Publication Date Title
CN112750140B (en) Information mining-based disguised target image segmentation method
CN105243154B (en) Remote sensing image retrieval method and system based on salient point features and sparse self-coding
CN111259850B (en) Pedestrian re-identification method integrating random batch mask and multi-scale representation learning
CN110348376B (en) Pedestrian real-time detection method based on neural network
CN110378381B (en) Object detection method, device and computer storage medium
CN112446398B (en) Image classification method and device
CN115240121B (en) Joint modeling method and device for enhancing local features of pedestrians
CN104462494B (en) A kind of remote sensing image retrieval method and system based on unsupervised feature learning
CN105678284B (en) A kind of fixed bit human body behavior analysis method
CN105574063B (en) The image search method of view-based access control model conspicuousness
CN111680176A (en) Remote sensing image retrieval method and system based on attention and bidirectional feature fusion
CN109714526B (en) Intelligent camera and control system
CN113298815A (en) Semi-supervised remote sensing image semantic segmentation method and device and computer equipment
CN113139489B (en) Crowd counting method and system based on background extraction and multi-scale fusion network
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN107767416A (en) The recognition methods of pedestrian's direction in a kind of low-resolution image
CN115375781A (en) Data processing method and device
CN113269224A (en) Scene image classification method, system and storage medium
CN115410081A (en) Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium
Liu et al. CAFFNet: channel attention and feature fusion network for multi-target traffic sign detection
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
Lu et al. An iterative classification and semantic segmentation network for old landslide detection using high-resolution remote sensing images
CN117557774A (en) Unmanned aerial vehicle image small target detection method based on improved YOLOv8
CN114926734B (en) Solid waste detection device and method based on feature aggregation and attention fusion

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant