CN111783789A

CN111783789A - Image sensitive information identification method

Info

Publication number: CN111783789A
Application number: CN202010607308.8A
Authority: CN
Inventors: 李凯勇
Original assignee: Qinghai Nationalities University
Current assignee: Qinghai Nationalities University
Priority date: 2020-06-30
Filing date: 2020-06-30
Publication date: 2020-10-16

Abstract

The invention discloses an image sensitive information identification method, which aims to improve the detection precision and reduce the noise influence by a filtering algorithm; aiming at weakening image details, particularly edge information, caused by a filtering algorithm, enhancing the image detail information by sharpening; in order to avoid redundant detection, the color feature, the shape feature and the texture feature of the image are extracted as detection objects to retrieve sensitive information, so that the retrieval process is simple and accurate; and the accurate classification capability of the support vector machine is fully utilized, and different types of sensitive information are accurately identified.

Description

Image sensitive information identification method

Technical Field

The invention relates to the field of image signal processing, in particular to an image sensitive information identification method.

Background

With the rapid development of the internet and mobile communication technologies, it becomes more convenient and diversified for people to acquire information. The information is mixed with a large amount of sensitive information, including sensitive content information such as pornography, violence, contraband, evil education, reaction and the like, and great challenges are brought to information security review of network information supervision departments. The vast amount of images containing sensitive information is widely spread over networks, negatively impacting social development. Therefore, research into image sensitive information recognition methods is increasingly important.

The traditional image sensitive information identification method mainly aims at identifying text information in an image, and does not consider non-text sensitive information factors, so that a large amount of sensitive information is missed to be identified; in order to improve the accuracy of sensitive information identification, various methods are proposed in the industry at present, but the effect is not good, wherein the method with better development prospect is a machine learning-based sensitive information detection and identification method, but as the research is shallow, the method still has defects in the accuracy method.

Disclosure of Invention

Aiming at the defects in the prior art, the image sensitive information identification method provided by the invention improves the detection precision and solves the problems of low precision and missing detection.

In order to achieve the purpose of the invention, the invention adopts the technical scheme that: an image sensitive information identification method, comprising the steps of:

s1, denoising the original image through a filtering algorithm to obtain a denoised image;

s2, carrying out sharpening processing on the denoised image to obtain an edge enhanced image;

s3, extracting color features, shape features and texture features of the edge enhancement image;

s4, according to the color feature, the shape feature and the texture feature of the edge enhancement image, carrying out sensitive information retrieval on the edge enhancement image to obtain sensitive information;

and S5, processing the sensitive information through a support vector machine, and realizing the identification and classification of the sensitive information.

The invention has the beneficial effects that: in order to improve the detection precision, the noise influence is reduced through a filtering algorithm; aiming at weakening image details, particularly edge information, caused by a filtering algorithm, enhancing the image detail information by sharpening; in order to avoid redundant detection, the color feature, the shape feature and the texture feature of the image are extracted as detection objects to retrieve sensitive information, so that the retrieval process is simple and accurate; and the accurate classification capability of the support vector machine is fully utilized, and different types of sensitive information are accurately identified.

Further, the step S1 includes the following sub-steps:

s11, denoising the original image through a mean value filtering algorithm to obtain a primary denoised image, wherein the mathematical expression of the mean value filtering algorithm is as follows:

wherein r (x, y) is the pixel value of the coordinate (x, y) pixel point of the original image, q₁(x₀,y₀) As the coordinates (x) of the originally denoised image₀,y₀) The pixel value of the pixel point, omega, is the coordinate (x)₀,y₀) K is the coordinate (x)₀,y₀) The sum of pixel values of the pixel itself and the neighborhood pixels;

s12, carrying out further denoising processing on the primary denoised image through a median filtering algorithm to obtain a denoised image, wherein the mathematical expression of the median filtering algorithm is as follows:

q(x₀,y₀)＝mid[q₁(x,y)|_(x,y)∈Ω](2)

wherein q is₁(x, y) is the pixel value of the coordinate (x, y) pixel point of the primary denoised image, q (x) is₀,y₀) For de-noised image coordinates (x)₀,y₀) The pixel value mid of the pixel point]The function indicates taking the middle bracket "[ alpha ], [ alpha ]]"median of individual data within.

The beneficial effects of the above further scheme are: gaussian noise is filtered through mean filtering, salt and pepper noise is filtered through median filtering, and two filtering means are combined, so that two types of noise mainly existing in an image are effectively filtered, and a foundation is laid for accurate identification.

Further, the step S2 includes the following sub-steps:

s21, calculating to obtain a gradient vector of the de-noised image by adopting a difference method, wherein a mathematical expression of the difference method comprises the following equation:

wherein,

the partial derivatives of coordinate (x, y) pixel points of the denoised image in the direction of an x axis;

the partial derivatives of coordinate (x, y) pixel points of the denoised image in the y-axis direction are shown, and G is a gradient vector of the denoised image;

s22, obtaining an edge enhancement image according to the gradient vector of the denoised image by the following formula:

wherein | G | purple₂Is the two-norm of the gradient vector G, T is the gradient threshold, and f (x, y) is the pixel value of the coordinate (x, y) pixel point of the edge-enhanced image.

The beneficial effects of the above further scheme are: although the filtering algorithm can filter noise, it can also destroy the detail information of the image, especially the edge information, and the gradient of the image shows the edge information of the image, so the gradient is modulo by the two norm operation, the edge information of the image is enhanced, and the detail information weakened by the filtering process is recovered.

Further, the step S3 includes the following sub-steps:

s31, extracting the color characteristics of the edge enhanced image through the following formula:

wherein, p (L) is the probability of the appearance of the first-level gray level, which is the color characteristic of the edge enhanced image, N (L) is the number of the first-level gray level pixel points, N is the total number of the pixel points contained in the edge enhanced image, the number of gray levels L is an integer, the value is in the left-closed right-open interval [0, L ], and L is the highest gray level value;

s32, extracting the shape feature of the edge enhanced image through the following equation:

where F is the two-dimensional image matrix of the edge-enhanced image, I_xDetecting the gray value, I, for the horizontal shape_yDetecting gray values for the shape in the vertical direction, wherein × is a matrix cross product operator, and I is a shape feature matrix of the edge enhanced image;

s33, calculating texture energy, moment of inertia, entropy and row-column similarity of the edge enhancement image through the following equations, and collecting texture characteristics by the texture energy, the moment of inertia, the entropy and the row-column similarity:

F₁＝∑_i,jp₁(i,j)²(11)

F₂＝∑_i,j(i-j)²p₁(i,j)²(12)

F₃＝∑_i,jp₁(i,j)·log₂p₁(i,j)(13)

wherein, F₁As texture energy, F₂Is moment of inertia, F₃Is entropy, F₄For line-row similarity, p₁(i, j) is the probability of the edge-enhanced image appearing in gray levels from level i to level j within a unit distance, μ_xIs the line mean, mu_yIs column mean, σ_xIs the line variance, σ_yIs the column variance.

The beneficial effects of the above further scheme are: the color feature and the shape feature are important feature information of an image known to the public, the texture feature explains an internal structure and pixel distribution space information of the image, texture energy represents the intensity of pixel distribution, rotational inertia represents the thickness of the texture, entropy represents the complexity of the image, and the internal information of the image is obtained through the extraction of the physical quantities without taking the whole image as a sample in a matrix form for operation, so that the operation amount is greatly reduced, redundancy is removed, and the accuracy is guaranteed.

Further, the step S4 includes the following sub-steps:

s41, carrying out primary screening and classification on the edge enhanced image according to the color feature, the shape feature and the texture feature of the edge enhanced image to obtain a text-free image and a text-containing image;

s42, carrying out sensitive information retrieval on the text-free image through an image color feature algorithm to obtain sensitive information of the text-free image;

s43, identifying a text area of the text-containing image according to the texture feature and the color feature of the text-containing image;

and S44, converting the text area containing the text image into a normalized feature vector, and detecting the normalized feature vector by adopting a convolutional neural network to obtain sensitive information containing the text image.

Further, the step S42 includes the following sub-steps:

a1, carrying out space transformation on the text-free image, and converting the text-free image into an HSV space;

a2, calculating a first moment, a second moment and a third moment of color features of the HSV space textless image to obtain a3 x 3 matrix;

a3, performing matrix transformation on the 3 x 3 matrix to obtain an unnormalized feature column vector with the dimension of 9;

a4, normalizing the unnormalized feature column vector to obtain a normalized feature vector;

and A5, detecting the normalized feature vectors by adopting a convolutional neural network to obtain sensitive information of the text-free image.

Further, the method of step S43 is: regions of the text-containing image having a stable texture width and uniform color characteristics are identified as text regions of the text-containing image.

Further, the method for retrieving the sensitive features by the convolutional neural network in the steps S44 and a5 comprises the following sub-steps:

b1, establishing a batch of normalized feature vectors of known sensitive information to form a training set;

b2, training the weight parameters and the bias parameters of the convolutional neural network according to the training set by the following equations to obtain the trained convolutional neural network as the sensitive information detector:

wherein, w is a weight parameter of the convolutional neural network, b is a bias parameter of the convolutional neural network, J (w, b) is a loss function of the convolutional neural network, and alpha is a learning rate;

b3, detecting the normalized feature vector through a sensitive information detector to obtain detection features;

b4, calculating the similarity between the detection characteristics and the sensitive information sample set;

b5, judging whether the similarity is more than or equal to 0.5, if so, jumping to the step B6, and if not, jumping to the step B7;

b6, taking the normalized feature vector as sensitive information;

and B7, the sensitive information is an empty set.

Further, the step S5 includes the following sub-steps:

s51, connecting M support vector machines in series to obtain a binary tree type multi-classifier;

s52, training a kth support vector machine through a kth class sensitive information sample set to obtain a trained binary tree type multi-classifier, wherein k is a positive integer in a closed interval [1, M ];

and S53, identifying and classifying the sensitive information through the trained binary tree type multi-classifier.

The beneficial effects of the above further scheme are: the support vector machine can accurately realize the classification processing of the NOT and the sensitive information identification in a dimensionality reduction manner, and the essence of the sensitive information identification is to accurately classify the sensitive information, so that the support vector machine and the identification method thereof are connected in series, each stage of support vector machine completes the function in the support vector machine, detects whether the support vector machine is the type which needs to be detected, and hands over the type of the NOT to the post-stage processing, completes the output of a binary tree type in a pipeline working mode, realizes the quick parallel processing, and has higher precision and higher speed compared with various identification methods based on a residual error neural network, a convolutional neural network and a clustering algorithm.

Drawings

FIG. 1 is a flow chart of a method for identifying image sensitive information;

fig. 2 is a graph comparing experimental effects.

Detailed Description

The following description of the embodiments of the present invention is provided to facilitate the understanding of the present invention by those skilled in the art, but it should be understood that the present invention is not limited to the scope of the embodiments, and it will be apparent to those skilled in the art that various changes may be made without departing from the spirit and scope of the invention as defined and defined in the appended claims, and all matters produced by the invention using the inventive concept are protected.

As shown in fig. 1, in one embodiment of the present invention, an image sensitive information recognition method includes the following steps:

s1, denoising the original image through a filtering algorithm to obtain a denoised image, and the method comprises the following steps:

q(x₀,y₀)＝mid[q₁(x,y)|_(x,y)∈Ω](2)

Gaussian noise is filtered through mean filtering, salt and pepper noise is filtered through median filtering, and two filtering means are combined, so that two types of noise mainly existing in an image are effectively filtered, and a foundation is laid for accurate identification.

S2, carrying out sharpening processing on the denoised image to obtain an edge enhanced image, and comprising the following steps:

wherein,

Although the filtering algorithm can filter noise, it can also destroy the detail information of the image, especially the edge information, and the gradient of the image shows the edge information of the image, so the gradient is modulo by the two norm operation, the edge information of the image is enhanced, and the detail information weakened by the filtering process is recovered.

S3, extracting color features, shape features and texture features of the edge enhanced image, comprising the following sub-steps:

F₁＝∑_i,jp₁(i,j)²(11)

F₂＝∑_i,j(i-j)²p₁(i,j)²(12)

F₃＝∑_i,jp₁(i,j)·log₂p₁(i,j)(13)

The color feature and the shape feature are important feature information of an image known to the public, the texture feature explains an internal structure and pixel distribution space information of the image, texture energy represents the intensity of pixel distribution, rotational inertia represents the thickness of the texture, entropy represents the complexity of the image, and the internal information of the image is obtained through the extraction of the physical quantities without taking the whole image as a sample in a matrix form for operation, so that the operation amount is greatly reduced, redundancy is removed, and the accuracy is guaranteed.

S4, according to the color feature, the shape feature and the texture feature of the edge enhanced image, sensitive information retrieval is carried out on the edge enhanced image to obtain sensitive information, and the method comprises the following steps:

Wherein, step S42 includes the following substeps:

And the method for searching the sensitive features by the convolutional neural network in the steps S44 and A5 comprises the following sub-steps:

b6, taking the normalized feature vector as sensitive information;

and B7, the sensitive information is an empty set.

S5, processing the sensitive information through a support vector machine, and realizing the identification and classification of the sensitive information, wherein the method comprises the following steps:

The support vector machine can accurately realize the classification processing of the NOT and the sensitive information identification in a dimensionality reduction manner, and the essence of the sensitive information identification is to accurately classify the sensitive information, so that the support vector machine and the identification method thereof are connected in series, each stage of support vector machine completes the function in the support vector machine, detects whether the support vector machine is the type which needs to be detected, and hands over the type of the NOT to the post-stage processing, completes the output of a binary tree type in a pipeline working mode, realizes the quick parallel processing, and has higher precision and higher speed compared with various identification methods based on a residual error neural network, a convolutional neural network and a clustering algorithm.

In order to verify the effect of the invention, a comparison test is designed, and the method provided by the invention is compared with two common prior arts, namely a method based on principal component analysis and a method based on template matching.

The information of the experimental data is shown in table 1 and the experimental results in table 2, from which it can be seen that the accuracy of the study method herein is higher when different experimental data are identified as sensitive information. In the calculation process, the average accuracy of the three identification methods is 61.5%, the accuracy of the identification method of the principal component analysis is 63.7%, and the accuracy of the identification method of the template matching is 85.83%.

Table 1 experimental data information

TABLE 2 accuracy of sensitive information under different identification methods (%)

On the basis of the above experiment, the identification time of different methods is compared, and the result is shown in fig. 2, and it can be seen that the identification time of the method of the invention is far shorter than that of the other two technologies.

In conclusion, in order to improve the detection precision, the noise influence is reduced through a filtering algorithm; aiming at weakening image details, particularly edge information, caused by a filtering algorithm, enhancing the image detail information by sharpening; in order to avoid redundant detection, the color feature, the shape feature and the texture feature of the image are extracted as detection objects to retrieve sensitive information, so that the retrieval process is simple and accurate; and the accurate classification capability of the support vector machine is fully utilized, and different types of sensitive information are accurately identified.

Claims

1. An image sensitive information recognition method, comprising the steps of:

2. The image sensitive information recognition method of claim 1, wherein the step S1 includes the following sub-steps:

wherein r (x, y) is the pixel value of the coordinate (x, y) pixel point of the original image, q₁(x₀，y₀) As the coordinates (x) of the originally denoised image₀，y₀) The pixel value of the pixel point, omega, is the coordinate (x)₀，y₀) K is the coordinate (x)₀，y₀) The sum of pixel values of the pixel itself and the neighborhood pixels;

q(x₀，y₀)＝mid[q₁(x，y)|_(x，y)∈Ω](2)

wherein q is₁(x, y) is the pixel value of the coordinate (x, y) pixel point of the primary denoised image, q (x) is₀，y₀) For de-noised image coordinates (x)₀，y₀) The pixel value mid of the pixel point]The function indicates taking the middle bracket "[ alpha ], [ alpha ]]"median of individual data within.

3. The image sensitive information recognition method of claim 2, wherein the step S2 includes the following sub-steps:

wherein,

4. The image sensitive information recognition method according to claim 3, wherein the step S3 includes the following sub-steps:

where F is the two-dimensional image matrix of the edge-enhanced image, I_xDetecting the gray value, I, for the horizontal shape_yDetect gray values for vertical shape, × matrix cross product operator, I shape for edge enhanced imageA feature matrix;

F₁＝∑_i，jp₁(i，j)²(11)

F₂＝∑_i，j(i-j)²p₁(i，j)²(12)

F₃＝∑_i，jp₁(i，j)·log₂p₁(i，j) (13)

5. The image sensitive information recognition method of claim 4, wherein the step S4 includes the following sub-steps:

6. The image sensitive information recognition method of claim 5, wherein the step S42 includes the following sub-steps:

7. The method for identifying image sensitive information according to claim 5, wherein the method of step S43 is: regions of the text-containing image having a stable texture width and uniform color characteristics are identified as text regions of the text-containing image.

8. The image sensitive information identification method according to claim 5, wherein the method of retrieving the sensitive feature by the convolutional neural network in steps S44 and A5 comprises the following sub-steps:

b6, taking the normalized feature vector as sensitive information;

and B7, the sensitive information is an empty set.

9. The image sensitive information recognition method of claim 5, wherein the step S5 includes the following sub-steps: