CN111783789A - Image sensitive information identification method - Google Patents
Image sensitive information identification method
- Publication number
- CN111783789A CN111783789A CN202010607308.8A CN202010607308A CN111783789A CN 111783789 A CN111783789 A CN 111783789A CN 202010607308 A CN202010607308 A CN 202010607308A CN 111783789 A CN111783789 A CN 111783789A
- Authority
- CN
- China
- Prior art keywords
- image
- sensitive information
- feature
- text
- edge
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/30 — Image preprocessing; noise filtering
- G06F18/2411 — Classification techniques based on the proximity to a decision surface, e.g. support vector machines
- G06N3/08 — Neural networks; learning methods
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, corners or strokes; connectivity analysis
- G06V10/462 — Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/56 — Extraction of image or video features relating to colour
Abstract
The invention discloses an image sensitive information identification method. To improve detection precision, noise is suppressed with a filtering algorithm; because filtering weakens image detail, particularly edge information, the detail is restored by sharpening. To avoid redundant detection, the color, shape and texture features of the image are extracted as the detection objects for retrieving sensitive information, keeping the retrieval process simple and accurate; finally, the precise classification capability of the support vector machine is fully exploited to accurately identify different types of sensitive information.
Description
Technical Field
The invention relates to the field of image signal processing, in particular to an image sensitive information identification method.
Background
With the rapid development of the internet and mobile communication technologies, acquiring information has become more convenient and diversified. Mixed into this information is a large amount of sensitive content, such as pornography, violence, contraband, cult material and politically subversive material, which poses great challenges to the information security review work of network supervision departments. Vast numbers of images containing sensitive information spread widely over networks, negatively impacting social development. Research into image sensitive information identification methods is therefore increasingly important.
Traditional image sensitive information identification methods mainly target the text information in an image and do not consider non-text sensitive factors, so a large amount of sensitive information is missed. To improve identification accuracy, various methods have been proposed in the industry, but their performance is poor. The direction with the best development prospects is machine-learning-based sensitive information detection and identification, but because research there is still shallow, such methods remain deficient in accuracy.
Disclosure of Invention
Aiming at the above defects in the prior art, the image sensitive information identification method provided by the invention improves detection precision and solves the problems of low accuracy and missed detection.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that: an image sensitive information identification method, comprising the steps of:
S1, denoising the original image through a filtering algorithm to obtain a denoised image;
S2, sharpening the denoised image to obtain an edge-enhanced image;
S3, extracting the color features, shape features and texture features of the edge-enhanced image;
S4, retrieving sensitive information from the edge-enhanced image according to its color, shape and texture features to obtain sensitive information;
S5, processing the sensitive information through a support vector machine to identify and classify the sensitive information.
The invention has the beneficial effects that: to improve detection precision, noise is suppressed with a filtering algorithm; because filtering weakens image detail, particularly edge information, the detail is restored by sharpening; to avoid redundant detection, the color, shape and texture features of the image are extracted as the detection objects for retrieving sensitive information, keeping the retrieval process simple and accurate; and the precise classification capability of the support vector machine is fully exploited to accurately identify different types of sensitive information.
Further, the step S1 includes the following sub-steps:
S11, denoising the original image through a mean filtering algorithm to obtain a primary denoised image, wherein the mathematical expression of the mean filtering algorithm is:

q1(x0, y0) = (1/K) · Σ_{(x, y) ∈ Ω} r(x, y)    (1)

wherein r(x, y) is the pixel value of the pixel at coordinate (x, y) of the original image, q1(x0, y0) is the pixel value of the pixel at coordinate (x0, y0) of the primary denoised image, Ω is the neighborhood of coordinate (x0, y0), and K is the number of pixels in Ω, i.e. the pixel itself plus its neighborhood pixels;

S12, further denoising the primary denoised image through a median filtering algorithm to obtain the denoised image, wherein the mathematical expression of the median filtering algorithm is:

q(x0, y0) = mid[ q1(x, y) | (x, y) ∈ Ω ]    (2)

wherein q1(x, y) is the pixel value of the pixel at coordinate (x, y) of the primary denoised image, q(x0, y0) is the pixel value of the pixel at coordinate (x0, y0) of the denoised image, and mid[·] denotes taking the median of the data inside the brackets.
The beneficial effects of the above further scheme are: mean filtering removes Gaussian noise and median filtering removes salt-and-pepper noise; combining the two filtering means effectively removes the two main types of noise present in an image and lays the foundation for accurate identification.
Further, the step S2 includes the following sub-steps:
S21, calculating the gradient vector of the denoised image by the difference method, whose mathematical expression comprises the following equations:

Gx = ∂q/∂x ≈ q(x + 1, y) − q(x, y)    (3)
Gy = ∂q/∂y ≈ q(x, y + 1) − q(x, y)    (4)
G = (Gx, Gy)ᵀ    (5)

wherein Gx is the partial derivative of the denoised image at the pixel of coordinate (x, y) along the x-axis direction, Gy is the partial derivative along the y-axis direction, and G is the gradient vector of the denoised image;

S22, obtaining the edge-enhanced image from the gradient vector of the denoised image by the following formula:

f(x, y) = q(x, y) + ‖G‖₂ if ‖G‖₂ ≥ T, and f(x, y) = q(x, y) otherwise    (6)

wherein ‖G‖₂ is the two-norm of the gradient vector G, T is the gradient threshold, and f(x, y) is the pixel value of the pixel at coordinate (x, y) of the edge-enhanced image.
The beneficial effects of the above further scheme are: although the filtering algorithm removes noise, it also weakens the detail information of the image, especially the edge information. The image gradient reflects this edge information, so the gradient is given a magnitude by the two-norm operation and used to enhance the edges, recovering the detail information weakened by filtering.
Further, the step S3 includes the following sub-steps:
S31, extracting the color feature of the edge-enhanced image through the following formula:

p(l) = N(l) / N    (7)

wherein p(l), the probability of occurrence of the l-th gray level, is the color feature of the edge-enhanced image, N(l) is the number of pixels at the l-th gray level, N is the total number of pixels contained in the edge-enhanced image, the gray level l is an integer taking values in the left-closed right-open interval [0, L), and L is the highest gray level;

S32, extracting the shape feature of the edge-enhanced image through the following equations, wherein F is the two-dimensional image matrix of the edge-enhanced image, Ix is the shape-detection gray value in the horizontal direction, Iy is the shape-detection gray value in the vertical direction, × is the matrix cross-product operator, and I is the shape feature matrix of the edge-enhanced image;

S33, calculating the texture energy, moment of inertia, entropy and row-column similarity of the edge-enhanced image through the following equations, and collecting them as the texture feature:

F1 = Σ_{i,j} p1(i, j)²    (11)
F2 = Σ_{i,j} (i − j)² · p1(i, j)²    (12)
F3 = −Σ_{i,j} p1(i, j) · log2 p1(i, j)    (13)
F4 = Σ_{i,j} (i − μx)(j − μy) · p1(i, j) / (σx · σy)    (14)

wherein F1 is the texture energy, F2 is the moment of inertia, F3 is the entropy, F4 is the row-column similarity, p1(i, j) is the probability that gray level i and gray level j appear at unit distance in the edge-enhanced image, μx is the row mean, μy is the column mean, σx is the row standard deviation, and σy is the column standard deviation.
The beneficial effects of the above further scheme are: color and shape features are the image features best known to the public, while texture features describe the internal structure and the spatial distribution of pixels in the image: texture energy reflects the intensity of the pixel distribution, the moment of inertia reflects the coarseness of the texture, and entropy reflects the complexity of the image. Extracting these physical quantities yields the internal information of the image without operating on the whole image matrix as a sample, which greatly reduces the amount of computation and removes redundancy while guaranteeing accuracy.
Further, the step S4 includes the following sub-steps:
S41, primarily screening and classifying the edge-enhanced image according to its color, shape and texture features to obtain text-free images and text-containing images;
S42, retrieving sensitive information from the text-free image through an image color feature algorithm to obtain the sensitive information of the text-free image;
S43, identifying the text area of the text-containing image according to its texture and color features;
S44, converting the text area of the text-containing image into a normalized feature vector, and detecting the normalized feature vector with a convolutional neural network to obtain the sensitive information of the text-containing image.
Further, the step S42 includes the following sub-steps:
A1, spatially transforming the text-free image into HSV space;
A2, calculating the first, second and third color moments of the HSV-space text-free image to obtain a 3×3 matrix;
A3, transforming the 3×3 matrix into an unnormalized feature column vector of dimension 9;
A4, normalizing the unnormalized feature column vector to obtain a normalized feature vector;
A5, detecting the normalized feature vector with a convolutional neural network to obtain the sensitive information of the text-free image.
Further, the method of step S43 is: regions of the text-containing image having a stable texture width and uniform color characteristics are identified as text regions of the text-containing image.
Further, the method for retrieving sensitive features by the convolutional neural network in steps S44 and A5 comprises the following sub-steps:
B1, establishing a batch of normalized feature vectors of known sensitive information to form a training set;
B2, training the weight parameter and bias parameter of the convolutional neural network on the training set through the following equations, to obtain the trained convolutional neural network as the sensitive information detector:

w ← w − α · ∂J(w, b)/∂w    (15)
b ← b − α · ∂J(w, b)/∂b    (16)

wherein w is the weight parameter of the convolutional neural network, b is the bias parameter of the convolutional neural network, J(w, b) is the loss function of the convolutional neural network, and α is the learning rate;
B3, detecting the normalized feature vector through the sensitive information detector to obtain the detection feature;
B4, calculating the similarity between the detection feature and the sensitive information sample set;
B5, judging whether the similarity is greater than or equal to 0.5; if so, jumping to step B6, otherwise jumping to step B7;
B6, taking the normalized feature vector as the sensitive information;
B7, taking the sensitive information as the empty set.
Further, the step S5 includes the following sub-steps:
S51, connecting M support vector machines in series to obtain a binary-tree multi-classifier;
S52, training the k-th support vector machine on the k-th class sensitive information sample set to obtain the trained binary-tree multi-classifier, wherein k is a positive integer in the closed interval [1, M];
S53, identifying and classifying the sensitive information through the trained binary-tree multi-classifier.
The beneficial effects of the above further scheme are: a support vector machine accurately performs two-class, is-or-is-not classification in a dimensionality-reducing manner, and the essence of sensitive information identification is exactly such accurate classification. The support vector machines are therefore connected in series: each stage detects whether the input belongs to the class it is responsible for and hands all non-matching inputs to the following stage. The cascade completes a binary-tree output in a pipeline working mode and realizes fast processing, with higher precision and speed than identification methods based on residual neural networks, convolutional neural networks or clustering algorithms.
Drawings
FIG. 1 is a flow chart of a method for identifying image sensitive information;
Fig. 2 is a graph comparing experimental effects.
Detailed Description
The following description of the embodiments is provided to help those skilled in the art understand the invention, but the invention is not limited to the scope of these embodiments. Various changes that are obvious to those skilled in the art and remain within the spirit and scope of the invention as defined by the appended claims are protected.
As shown in fig. 1, in one embodiment of the present invention, an image sensitive information recognition method includes the following steps:
S1, denoising the original image through a filtering algorithm to obtain a denoised image, comprising the following steps:
S11, denoising the original image through a mean filtering algorithm to obtain a primary denoised image, wherein the mathematical expression of the mean filtering algorithm is:

q1(x0, y0) = (1/K) · Σ_{(x, y) ∈ Ω} r(x, y)    (1)

wherein r(x, y) is the pixel value of the pixel at coordinate (x, y) of the original image, q1(x0, y0) is the pixel value of the pixel at coordinate (x0, y0) of the primary denoised image, Ω is the neighborhood of coordinate (x0, y0), and K is the number of pixels in Ω, i.e. the pixel itself plus its neighborhood pixels;

S12, further denoising the primary denoised image through a median filtering algorithm to obtain the denoised image, wherein the mathematical expression of the median filtering algorithm is:

q(x0, y0) = mid[ q1(x, y) | (x, y) ∈ Ω ]    (2)

wherein q1(x, y) is the pixel value of the pixel at coordinate (x, y) of the primary denoised image, q(x0, y0) is the pixel value of the pixel at coordinate (x0, y0) of the denoised image, and mid[·] denotes taking the median of the data inside the brackets.
Mean filtering removes Gaussian noise and median filtering removes salt-and-pepper noise; combining the two filtering means effectively removes the two main types of noise present in an image and lays the foundation for accurate identification.
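The two-stage filtering just described (mean filtering against Gaussian noise, then median filtering against salt-and-pepper noise) can be sketched as follows. This is a minimal illustration, not the patented implementation: the 3×3 window size and edge padding are assumptions, and the function names are invented for the example.

```python
import numpy as np

def mean_filter(img, k=3):
    """Primary denoising: each output pixel is the mean of its k x k neighborhood."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.empty(img.shape, dtype=float)
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            out[x, y] = padded[x:x + k, y:y + k].mean()
    return out

def median_filter(img, k=3):
    """Further denoising: each output pixel is the median of its k x k neighborhood."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.empty(img.shape, dtype=float)
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            out[x, y] = np.median(padded[x:x + k, y:y + k])
    return out

# A single salt spike (255) in a flat patch is removed by the median stage.
noisy = np.array([[10, 10, 10], [10, 255, 10], [10, 10, 10]], dtype=float)
denoised = median_filter(mean_filter(noisy))
```

In practice the two loops would be replaced by a library call, but the explicit form shows exactly what equations (1) and (2) compute.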
S2, sharpening the denoised image to obtain an edge-enhanced image, comprising the following steps:
S21, calculating the gradient vector of the denoised image by the difference method, whose mathematical expression comprises the following equations:

Gx = ∂q/∂x ≈ q(x + 1, y) − q(x, y)    (3)
Gy = ∂q/∂y ≈ q(x, y + 1) − q(x, y)    (4)
G = (Gx, Gy)ᵀ    (5)

wherein Gx is the partial derivative of the denoised image at the pixel of coordinate (x, y) along the x-axis direction, Gy is the partial derivative along the y-axis direction, and G is the gradient vector of the denoised image;

S22, obtaining the edge-enhanced image from the gradient vector of the denoised image by the following formula:

f(x, y) = q(x, y) + ‖G‖₂ if ‖G‖₂ ≥ T, and f(x, y) = q(x, y) otherwise    (6)

wherein ‖G‖₂ is the two-norm of the gradient vector G, T is the gradient threshold, and f(x, y) is the pixel value of the pixel at coordinate (x, y) of the edge-enhanced image.
Although the filtering algorithm removes noise, it also weakens the detail information of the image, especially the edge information. The image gradient reflects this edge information, so the gradient is given a magnitude by the two-norm operation and used to enhance the edges, recovering the detail information weakened by filtering.
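A minimal sketch of the sharpening step: forward differences approximate the gradient, and where its two-norm reaches the threshold T the magnitude is added back to the pixel. The additive enhancement rule and the threshold value used in the demo are assumptions for illustration.

```python
import numpy as np

def edge_enhance(q, T=20.0):
    """Sharpen a denoised image q: forward differences give the gradient (Gx, Gy);
    where the gradient two-norm meets the threshold T, the magnitude is added back
    to the pixel, restoring edge detail weakened by filtering."""
    q = q.astype(float)
    Gx = np.zeros_like(q)
    Gy = np.zeros_like(q)
    Gx[:-1, :] = q[1:, :] - q[:-1, :]   # difference along the x axis (rows)
    Gy[:, :-1] = q[:, 1:] - q[:, :-1]   # difference along the y axis (columns)
    mag = np.sqrt(Gx**2 + Gy**2)        # per-pixel ||G||_2
    return np.where(mag >= T, q + mag, q)

# A vertical step edge: flat regions stay untouched, the edge is boosted.
step = np.zeros((4, 4))
step[:, 2:] = 100.0
enhanced = edge_enhance(step, T=20.0)
```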
S3, extracting color features, shape features and texture features of the edge enhanced image, comprising the following sub-steps:
S31, extracting the color feature of the edge-enhanced image through the following formula:

p(l) = N(l) / N    (7)

wherein p(l), the probability of occurrence of the l-th gray level, is the color feature of the edge-enhanced image, N(l) is the number of pixels at the l-th gray level, N is the total number of pixels contained in the edge-enhanced image, the gray level l is an integer taking values in the left-closed right-open interval [0, L), and L is the highest gray level;

S32, extracting the shape feature of the edge-enhanced image through the following equations, wherein F is the two-dimensional image matrix of the edge-enhanced image, Ix is the shape-detection gray value in the horizontal direction, Iy is the shape-detection gray value in the vertical direction, × is the matrix cross-product operator, and I is the shape feature matrix of the edge-enhanced image;

S33, calculating the texture energy, moment of inertia, entropy and row-column similarity of the edge-enhanced image through the following equations, and collecting them as the texture feature:

F1 = Σ_{i,j} p1(i, j)²    (11)
F2 = Σ_{i,j} (i − j)² · p1(i, j)²    (12)
F3 = −Σ_{i,j} p1(i, j) · log2 p1(i, j)    (13)
F4 = Σ_{i,j} (i − μx)(j − μy) · p1(i, j) / (σx · σy)    (14)

wherein F1 is the texture energy, F2 is the moment of inertia, F3 is the entropy, F4 is the row-column similarity, p1(i, j) is the probability that gray level i and gray level j appear at unit distance in the edge-enhanced image, μx is the row mean, μy is the column mean, σx is the row standard deviation, and σy is the column standard deviation.
Color and shape features are the image features best known to the public, while texture features describe the internal structure and the spatial distribution of pixels in the image: texture energy reflects the intensity of the pixel distribution, the moment of inertia reflects the coarseness of the texture, and entropy reflects the complexity of the image. Extracting these physical quantities yields the internal information of the image without operating on the whole image matrix as a sample, which greatly reduces the amount of computation and removes redundancy while guaranteeing accuracy.
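The four texture statistics can be illustrated over a gray-level co-occurrence probability matrix p1(i, j). This sketch makes several assumptions beyond the text: co-occurrence is counted for horizontally adjacent pixels at unit distance, F2 uses the common GLCM contrast form Σ(i − j)² p1(i, j), F4 is the standard correlation, and all function names are invented for the example.

```python
import numpy as np

def glcm_prob(img, levels=8):
    """Co-occurrence probability p1(i, j): gray level i horizontally adjacent
    (unit distance) to gray level j, normalized so the matrix sums to 1."""
    counts = np.zeros((levels, levels))
    for x in range(img.shape[0]):
        for y in range(img.shape[1] - 1):
            counts[img[x, y], img[x, y + 1]] += 1
    return counts / counts.sum()

def texture_features(p1):
    """Return (energy, inertia, entropy, row-column similarity) of p1."""
    i, j = np.indices(p1.shape)
    energy = (p1**2).sum()                         # F1: texture energy
    inertia = ((i - j)**2 * p1).sum()              # F2: moment of inertia (contrast)
    nz = p1[p1 > 0]
    entropy = -(nz * np.log2(nz)).sum()            # F3: entropy
    mu_x = (i * p1).sum()
    mu_y = (j * p1).sum()
    sd_x = np.sqrt((((i - mu_x)**2) * p1).sum())   # row standard deviation
    sd_y = np.sqrt((((j - mu_y)**2) * p1).sum())   # column standard deviation
    corr = (((i - mu_x) * (j - mu_y) * p1).sum() / (sd_x * sd_y)
            if sd_x > 0 and sd_y > 0 else 0.0)     # F4: row-column similarity
    return energy, inertia, entropy, corr
```

For a perfectly uniform patch the energy is 1 and the inertia and entropy are 0, matching the intuition that such a texture is maximally regular.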
S4, according to the color feature, the shape feature and the texture feature of the edge enhanced image, sensitive information retrieval is carried out on the edge enhanced image to obtain sensitive information, and the method comprises the following steps:
S41, primarily screening and classifying the edge-enhanced image according to its color, shape and texture features to obtain text-free images and text-containing images;
S42, retrieving sensitive information from the text-free image through an image color feature algorithm to obtain the sensitive information of the text-free image;
S43, identifying the text area of the text-containing image according to its texture and color features;
S44, converting the text area of the text-containing image into a normalized feature vector, and detecting the normalized feature vector with a convolutional neural network to obtain the sensitive information of the text-containing image.
Wherein, step S42 includes the following substeps:
A1, spatially transforming the text-free image into HSV space;
A2, calculating the first, second and third color moments of the HSV-space text-free image to obtain a 3×3 matrix;
A3, transforming the 3×3 matrix into an unnormalized feature column vector of dimension 9;
A4, normalizing the unnormalized feature column vector to obtain a normalized feature vector;
A5, detecting the normalized feature vector with a convolutional neural network to obtain the sensitive information of the text-free image.
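Steps A1-A4 can be sketched as follows: per-channel color moments in HSV space form a 3×3 matrix, which is flattened into a 9-dimensional vector and normalized. Using Python's colorsys for the HSV transform and L2 normalization in step A4 are assumptions for illustration; the function name is invented.

```python
import colorsys
import numpy as np

def color_moment_vector(rgb):
    """First three color moments (mean, standard deviation, cube root of the
    third central moment) per HSV channel -> 3 x 3 matrix -> flattened,
    L2-normalized 9-dimensional feature vector (steps A1-A4)."""
    hsv = np.array([colorsys.rgb_to_hsv(*px)
                    for px in rgb.reshape(-1, 3) / 255.0])   # A1: RGB -> HSV
    moments = []
    for c in range(3):                                       # H, S, V channels
        ch = hsv[:, c]
        m1 = ch.mean()                                       # first moment
        m2 = np.sqrt(((ch - m1)**2).mean())                  # second moment
        m3 = np.cbrt(((ch - m1)**3).mean())                  # third moment
        moments.append([m1, m2, m3])
    v = np.array(moments).reshape(9)                         # A2-A3: 3x3 -> 9-dim
    n = np.linalg.norm(v)
    return v / n if n > 0 else v                             # A4: normalization

sample = np.array([[[255, 0, 0], [0, 255, 0]],
                   [[0, 0, 255], [255, 255, 0]]], dtype=float)
feature = color_moment_vector(sample)
```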
The method for retrieving sensitive features by the convolutional neural network in steps S44 and A5 comprises the following sub-steps:
B1, establishing a batch of normalized feature vectors of known sensitive information to form a training set;
B2, training the weight parameter and bias parameter of the convolutional neural network on the training set through the following equations, to obtain the trained convolutional neural network as the sensitive information detector:

w ← w − α · ∂J(w, b)/∂w    (15)
b ← b − α · ∂J(w, b)/∂b    (16)

wherein w is the weight parameter of the convolutional neural network, b is the bias parameter of the convolutional neural network, J(w, b) is the loss function of the convolutional neural network, and α is the learning rate;
B3, detecting the normalized feature vector through the sensitive information detector to obtain the detection feature;
B4, calculating the similarity between the detection feature and the sensitive information sample set;
B5, judging whether the similarity is greater than or equal to 0.5; if so, jumping to step B6, otherwise jumping to step B7;
B6, taking the normalized feature vector as the sensitive information;
B7, taking the sensitive information as the empty set.
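Steps B3-B7 (detection, similarity comparison and the 0.5-threshold decision) can be sketched as below. The detector argument stands in for the trained convolutional neural network (an identity function is used in the usage example), and cosine similarity is an assumed choice of similarity measure; none of these names come from the patent text.

```python
import numpy as np

def detect_sensitive(feature, detector, sample_set):
    """B3: run the detector on a normalized feature vector; B4: compare with
    the sensitive-sample set by cosine similarity; B5-B7: keep the vector only
    if the best similarity is >= 0.5, otherwise return the empty set."""
    detected = detector(feature)                                   # B3
    sims = [np.dot(detected, s) /
            (np.linalg.norm(detected) * np.linalg.norm(s))
            for s in sample_set]                                   # B4
    if max(sims) >= 0.5:                                           # B5
        return {tuple(feature)}                                    # B6: sensitive
    return set()                                                   # B7: empty set

identity = lambda x: x                     # toy stand-in for the trained CNN
samples = [np.array([1.0, 0.0, 0.0])]
hit = detect_sensitive(np.array([1.0, 0.0, 0.0]), identity, samples)
miss = detect_sensitive(np.array([0.0, 1.0, 0.0]), identity, samples)
```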
S5, processing the sensitive information through a support vector machine, and realizing the identification and classification of the sensitive information, wherein the method comprises the following steps:
S51, connecting M support vector machines in series to obtain a binary-tree multi-classifier;
S52, training the k-th support vector machine on the k-th class sensitive information sample set to obtain the trained binary-tree multi-classifier, wherein k is a positive integer in the closed interval [1, M];
S53, identifying and classifying the sensitive information through the trained binary-tree multi-classifier.
A support vector machine accurately performs two-class, is-or-is-not classification in a dimensionality-reducing manner, and the essence of sensitive information identification is exactly such accurate classification. The support vector machines are therefore connected in series: each stage detects whether the input belongs to the class it is responsible for and hands all non-matching inputs to the following stage. The cascade completes a binary-tree output in a pipeline working mode and realizes fast processing, with higher precision and speed than identification methods based on residual neural networks, convolutional neural networks or clustering algorithms.
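The cascaded binary-tree classifier of steps S51-S53 can be sketched as follows. A simple linear decision rule stands in for each trained support vector machine stage, and the stage weights in the usage example are invented toy values, not parameters from the patent.

```python
import numpy as np

class CascadedClassifier:
    """Binary-tree multi-classifier: M two-class stages in series. Stage k
    answers "class k or not"; non-matching inputs fall through to the next
    stage (S51-S53). Each stage is a linear decision rule standing in for a
    trained SVM."""

    def __init__(self, stages):
        self.stages = stages               # list of (weights, bias), one per class

    def classify(self, x):
        for k, (w, b) in enumerate(self.stages, start=1):
            if np.dot(w, x) + b >= 0:      # stage k accepts -> class k
                return k
        return 0                           # no stage matched: not sensitive

clf = CascadedClassifier([
    (np.array([1.0, 0.0]), -0.5),          # stage 1: fires when x[0] > 0.5
    (np.array([0.0, 1.0]), -0.5),          # stage 2: fires when x[1] > 0.5
])
```

Because each stage only answers one yes/no question, the stages can run as a pipeline, which is the fast-processing property the text describes.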
In order to verify the effect of the invention, a comparison experiment was designed, comparing the proposed method with two common prior-art methods: one based on principal component analysis and one based on template matching.
The experimental data are described in Table 1 and the experimental results in Table 2, from which it can be seen that the proposed method achieves higher accuracy when identifying sensitive information in the different experimental data sets. Over the whole calculation, the average accuracy of the proposed method is 85.83%, versus 61.5% for the principal-component-analysis method and 63.7% for the template-matching method.
Table 1 experimental data information
TABLE 2 accuracy of sensitive information under different identification methods (%)
On the basis of the above experiment, the identification times of the different methods were compared; as shown in Fig. 2, the identification time of the method of the invention is far shorter than that of the other two techniques.
In conclusion, to improve detection precision, noise is suppressed with a filtering algorithm; because filtering weakens image detail, particularly edge information, the detail is restored by sharpening; to avoid redundant detection, the color, shape and texture features of the image are extracted as the detection objects for retrieving sensitive information, keeping the retrieval process simple and accurate; and the precise classification capability of the support vector machine is fully exploited to accurately identify different types of sensitive information.
Claims (9)
1. An image sensitive information recognition method, comprising the steps of:
s1, denoising the original image through a filtering algorithm to obtain a denoised image;
s2, carrying out sharpening processing on the denoised image to obtain an edge enhanced image;
s3, extracting color features, shape features and texture features of the edge enhancement image;
s4, according to the color feature, the shape feature and the texture feature of the edge enhancement image, carrying out sensitive information retrieval on the edge enhancement image to obtain sensitive information;
and S5, processing the sensitive information through a support vector machine, and realizing the identification and classification of the sensitive information.
2. The image sensitive information recognition method of claim 1, wherein the step S1 includes the following sub-steps:
s11, denoising the original image through a mean filtering algorithm to obtain a primary denoised image, wherein the mathematical expression of the mean filtering algorithm is as follows:
q1(x0, y0) = (1/K) Σ(x,y)∈Ω r(x, y) (1)
wherein r(x, y) is the pixel value of the pixel at coordinate (x, y) of the original image, q1(x0, y0) is the pixel value of the pixel at coordinate (x0, y0) of the primary denoised image, Ω is the neighborhood of coordinate (x0, y0), and K is the total number of pixels in the neighborhood, the pixel at (x0, y0) itself included;
s12, carrying out further denoising processing on the primary denoised image through a median filtering algorithm to obtain a denoised image, wherein the mathematical expression of the median filtering algorithm is as follows:
q(x0, y0) = mid[ q1(x, y) | (x, y) ∈ Ω ] (2)
wherein q1(x, y) is the pixel value of the pixel at coordinate (x, y) of the primary denoised image, q(x0, y0) is the pixel value of the pixel at coordinate (x0, y0) of the denoised image, and the function mid[ ] returns the median of the data within the brackets.
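Claim 2's two-stage denoising can be sketched in NumPy as below. The 3 × 3 window size and the edge-replicated border handling are assumptions for illustration; the patent does not fix either.

```python
import numpy as np

def mean_filter(img, k=3):
    """Eq. (1): each output pixel is the mean over its k x k neighborhood."""
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode="edge")  # replicate borders (assumption)
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):                          # accumulate the K shifted windows
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)                             # divide by K = k*k

def median_filter(img, k=3):
    """Eq. (2): each output pixel is the median over its k x k neighborhood."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    windows = [p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
               for dy in range(k) for dx in range(k)]
    return np.median(np.stack(windows), axis=0)

noisy = np.array([[10, 10, 10],
                  [10, 255, 10],   # a single salt-noise pixel
                  [10, 10, 10]], dtype=float)
print(median_filter(noisy)[1, 1])  # 10.0 -- the outlier is removed
```

The mean filter only dilutes the outlier (its center value becomes 335/9 ≈ 37.2), while the median filter removes it entirely, which is why the method applies both in sequence.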
3. The image sensitive information recognition method of claim 2, wherein the step S2 includes the following sub-steps:
s21, calculating the gradient vector of the denoised image by a difference method, wherein the mathematical expressions of the difference method comprise the following equations:
∂q/∂x = q(x+1, y) − q(x, y) (3)
∂q/∂y = q(x, y+1) − q(x, y) (4)
G = (∂q/∂x, ∂q/∂y)T (5)
wherein ∂q/∂x is the partial derivative of the pixel at coordinate (x, y) of the denoised image along the x-axis direction, ∂q/∂y is the partial derivative of that pixel along the y-axis direction, and G is the gradient vector of the denoised image;
s22, obtaining the edge enhanced image from the gradient vector of the denoised image through the following formula:
f(x, y) = ‖G‖2 if ‖G‖2 ≥ T, otherwise f(x, y) = q(x, y) (6)
wherein ‖G‖2 is the two-norm of the gradient vector G, T is the gradient threshold, and f(x, y) is the pixel value of the pixel at coordinate (x, y) of the edge enhanced image.
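The gradient-thresholding sharpening of claim 3 can be sketched as follows; forward differences and the piecewise form (gradient magnitude on strong edges, pass-through elsewhere) are taken from the claim's symbols, so treat this as an illustrative reading rather than the exact patented formula.

```python
import numpy as np

def sharpen(img, T):
    """Forward-difference gradient, then threshold its 2-norm (eqs. (3)-(6))."""
    gx = np.diff(img, axis=1, append=img[:, -1:])  # x-direction forward difference
    gy = np.diff(img, axis=0, append=img[-1:, :])  # y-direction forward difference
    g = np.sqrt(gx ** 2 + gy ** 2)                 # ||G||_2 at every pixel
    return np.where(g >= T, g, img)                # edge pixels -> gradient magnitude

step = np.array([[0., 0., 9., 9.],
                 [0., 0., 9., 9.]])                # a vertical intensity step
print(sharpen(step, T=5.0))                        # the step edge is emphasized
```

Only the column where the intensity jumps exceeds the threshold T, so the flat regions are passed through unchanged, which is how the method restores edge detail lost to the filtering stage.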
4. The image sensitive information recognition method according to claim 3, wherein the step S3 includes the following sub-steps:
s31, extracting the color feature of the edge enhanced image through the following formula:
p(l) = N(l)/N (7)
wherein p(l), the probability that the l-th gray level appears, is the color feature of the edge enhanced image, N(l) is the number of pixels at the l-th gray level, N is the total number of pixels contained in the edge enhanced image, the gray level l is an integer taking values in the left-closed, right-open interval [0, L), and L is the highest gray-level value;
s32, extracting the shape feature of the edge enhanced image through the following equation:
wherein F is the two-dimensional image matrix of the edge enhanced image, Ix is the horizontal shape-detection gray value, Iy is the vertical shape-detection gray value, × is the matrix cross-product operator, and I is the shape feature matrix of the edge enhanced image;
s33, calculating the texture energy, moment of inertia, entropy, and row-column similarity of the edge enhanced image through the following equations, the texture feature being the collection of these four quantities:
F1 = Σi,j p1(i, j)² (11)
F2 = Σi,j (i − j)² p1(i, j)² (12)
F3 = Σi,j p1(i, j)·log2 p1(i, j) (13)
F4 = Σi,j (i − μx)(j − μy) p1(i, j)/(σx σy) (14)
wherein F1 is the texture energy, F2 is the moment of inertia, F3 is the entropy, F4 is the row-column similarity, p1(i, j) is the probability that gray levels i and j appear within unit distance in the edge enhanced image, μx is the row mean, μy is the column mean, σx is the row variance, and σy is the column variance.
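The color and texture features of claim 4 can be computed as below on toy data. The row-column similarity F4 is reconstructed as the standard gray-level co-occurrence correlation, since the claim names its ingredients (μx, μy, σx, σy) but the formula itself was lost in translation; F2 follows the claim's p1² form as written.

```python
import numpy as np

# Color feature (step S31): p(l) = N(l) / N over the gray levels.
def gray_probability(img, levels):
    counts = np.bincount(img.ravel(), minlength=levels)
    return counts / img.size

img = np.array([[0, 0, 1],
                [1, 1, 2]])
p = gray_probability(img, levels=4)             # [2/6, 3/6, 1/6, 0]

# Texture features (step S33), from a toy 2x2 co-occurrence probability matrix.
p1 = np.array([[0.2, 0.1],
               [0.1, 0.6]])
i, j = np.indices(p1.shape)
F1 = np.sum(p1 ** 2)                            # texture energy, eq. (11)
F2 = np.sum((i - j) ** 2 * p1 ** 2)             # moment of inertia as written in eq. (12)
nz = p1 > 0
F3 = np.sum(p1[nz] * np.log2(p1[nz]))           # entropy term as written in eq. (13)
mu_x, mu_y = np.sum(i * p1), np.sum(j * p1)     # row / column means
sig_x = np.sqrt(np.sum((i - mu_x) ** 2 * p1))
sig_y = np.sqrt(np.sum((j - mu_y) ** 2 * p1))
F4 = np.sum((i - mu_x) * (j - mu_y) * p1) / (sig_x * sig_y)  # row-column similarity
print(round(F1, 3), round(F4, 3))
```

A concentrated co-occurrence matrix gives high energy F1 and a positive F4, signalling a regular texture; the four scalars together form the texture feature vector.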
5. The image sensitive information recognition method of claim 4, wherein the step S4 includes the following sub-steps:
s41, carrying out primary screening and classification on the edge enhanced image according to the color feature, the shape feature and the texture feature of the edge enhanced image to obtain a text-free image and a text-containing image;
s42, carrying out sensitive information retrieval on the text-free image through an image color feature algorithm to obtain sensitive information of the text-free image;
s43, identifying a text area of the text-containing image according to the texture feature and the color feature of the text-containing image;
and S44, converting the text area containing the text image into a normalized feature vector, and detecting the normalized feature vector by adopting a convolutional neural network to obtain sensitive information containing the text image.
6. The image sensitive information recognition method of claim 5, wherein the step S42 includes the following sub-steps:
a1, carrying out space transformation on the text-free image, and converting the text-free image into an HSV space;
a2, calculating the first moment, second moment, and third moment of the color features of the HSV-space text-free image to obtain a 3 × 3 matrix;
a3, performing matrix transformation on the 3 × 3 matrix to obtain an unnormalized feature column vector of dimension 9;
a4, normalizing the unnormalized feature column vector to obtain a normalized feature vector;
and A5, detecting the normalized feature vectors by adopting a convolutional neural network to obtain sensitive information of the text-free image.
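Steps A1–A4 above can be sketched as below using only the standard library and NumPy. The patent does not define the three color moments exactly, so mean, standard deviation, and the cube root of the third central moment per HSV channel are assumed here (a common color-moments convention), and L2 normalization is assumed for step A4.

```python
import colorsys
import numpy as np

def color_moment_vector(rgb):
    """rgb: (H, W, 3) floats in [0, 1]. Returns the normalized 9-dim feature
    (steps A1-A4); the moment definitions are an illustrative assumption."""
    # A1: convert each pixel to HSV space.
    hsv = np.array([[colorsys.rgb_to_hsv(*px) for px in row] for row in rgb])
    feats = []
    for c in range(3):                        # one row of the 3x3 matrix per channel
        ch = hsv[..., c].ravel()
        m1 = ch.mean()                        # first moment
        m2 = ch.std()                         # second moment
        m3 = np.cbrt(((ch - m1) ** 3).mean())  # third moment (cube-rooted skew)
        feats += [m1, m2, m3]                 # A2: 3x3 moment matrix, row by row
    v = np.array(feats)                       # A3: flatten to a 9-dim column vector
    n = np.linalg.norm(v)
    return v / n if n > 0 else v              # A4: normalize

img = np.random.default_rng(0).random((4, 4, 3))
v = color_moment_vector(img)
print(v.shape)  # (9,)
```

The resulting unit-length 9-dimensional vector is what step A5 would pass to the convolutional-neural-network detector.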
7. The method for identifying image sensitive information according to claim 5, wherein the method of step S43 is: regions of the text-containing image having a stable texture width and uniform color characteristics are identified as text regions of the text-containing image.
8. The image sensitive information identification method according to claim 5, wherein the method of retrieving the sensitive feature by the convolutional neural network in steps S44 and A5 comprises the following sub-steps:
b1, establishing a batch of normalized feature vectors of known sensitive information to form a training set;
b2, training the weight parameter and the bias parameter of the convolutional neural network on the training set through the following equations, to obtain the trained convolutional neural network as the sensitive information detector:
w = w − α·∂J(w, b)/∂w (15)
b = b − α·∂J(w, b)/∂b (16)
wherein w is the weight parameter of the convolutional neural network, b is the bias parameter of the convolutional neural network, J(w, b) is the loss function of the convolutional neural network, and α is the learning rate;
b3, detecting the normalized feature vector through a sensitive information detector to obtain detection features;
b4, calculating the similarity between the detection characteristics and the sensitive information sample set;
b5, judging whether the similarity is greater than or equal to 0.5; if so, jumping to step B6, and if not, jumping to step B7;
b6, taking the normalized feature vector as sensitive information;
and B7, the sensitive information is an empty set.
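The gradient-descent update of step B2 can be illustrated on a deliberately tiny stand-in: a single linear unit with squared loss replaces the convolutional network's full loss J(w, b), which is not reproduced here. Only the update rule w ← w − α·∂J/∂w, b ← b − α·∂J/∂b is demonstrated.

```python
def step(w, b, x, y, alpha):
    """One update of w and b for J(w, b) = (w*x + b - y)^2."""
    pred = w * x + b
    dw = 2 * (pred - y) * x      # dJ/dw
    db = 2 * (pred - y)          # dJ/db
    return w - alpha * dw, b - alpha * db

w, b = 0.0, 0.0
for _ in range(200):             # repeated updates drive the loss toward zero
    w, b = step(w, b, x=2.0, y=5.0, alpha=0.05)
print(round(w * 2.0 + b, 3))     # converges toward the target 5.0
```

With learning rate 0.05 each update halves the prediction error here, so after a few hundred iterations the parameters fit the single training point; the real detector applies the same rule over a batch of labeled feature vectors.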
9. The image sensitive information recognition method of claim 5, wherein the step S5 includes the following sub-steps:
s51, connecting M support vector machines in series to obtain a binary tree type multi-classifier;
s52, training a kth support vector machine through a kth class sensitive information sample set to obtain a trained binary tree type multi-classifier, wherein k is a positive integer in a closed interval [1, M ];
and S53, identifying and classifying the sensitive information through the trained binary tree type multi-classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010607308.8A CN111783789A (en) | 2020-06-30 | 2020-06-30 | Image sensitive information identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111783789A true CN111783789A (en) | 2020-10-16 |
Family
ID=72761155
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010607308.8A Pending CN111783789A (en) | 2020-06-30 | 2020-06-30 | Image sensitive information identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111783789A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101281521A (en) * | 2007-04-05 | 2008-10-08 | 中国科学院自动化研究所 | Method and system for filtering sensitive web page based on multiple classifier amalgamation |
CN101470897A (en) * | 2007-12-26 | 2009-07-01 | 中国科学院自动化研究所 | Sensitive film detection method based on audio/video amalgamation policy |
CN101996314A (en) * | 2009-08-26 | 2011-03-30 | 厦门市美亚柏科信息股份有限公司 | Content-based human body upper part sensitive image identification method and device |
CN103559511A (en) * | 2013-11-20 | 2014-02-05 | 天津农学院 | Automatic identification method of foliar disease image of greenhouse vegetable |
CN103996046A (en) * | 2014-06-11 | 2014-08-20 | 北京邮电大学 | Personnel recognition method based on multi-visual-feature fusion |
CN105509659A (en) * | 2015-11-25 | 2016-04-20 | 淮安市计量测试所 | Image-processing-based flatness detection system |
Non-Patent Citations (2)
Title |
---|
崔鹏飞等: "面向网络内容安全的图像识别技术研究", 《信息网络安全》 * |
袁杰等: "基于中值滤波和梯度锐化的边缘检测", 《计算机与现代化》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112991308A (en) * | 2021-03-25 | 2021-06-18 | 北京百度网讯科技有限公司 | Image quality determination method and device, electronic equipment and medium |
CN112991308B (en) * | 2021-03-25 | 2023-11-24 | 北京百度网讯科技有限公司 | Image quality determining method and device, electronic equipment and medium |
CN117237478A (en) * | 2023-11-09 | 2023-12-15 | 北京航空航天大学 | Sketch-to-color image generation method, sketch-to-color image generation system, storage medium and processing terminal |
CN117237478B (en) * | 2023-11-09 | 2024-02-09 | 北京航空航天大学 | Sketch-to-color image generation method, sketch-to-color image generation system, storage medium and processing terminal |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111401372B (en) | Method for extracting and identifying image-text information of scanned document | |
Gao et al. | Automatic change detection in synthetic aperture radar images based on PCANet | |
CN102081731B (en) | Method and device for extracting text from image | |
Saba et al. | Annotated comparisons of proposed preprocessing techniques for script recognition | |
Ghadekar et al. | Handwritten digit and letter recognition using hybrid dwt-dct with knn and svm classifier | |
US9558403B2 (en) | Chemical structure recognition tool | |
Chu et al. | Strip steel surface defect recognition based on novel feature extraction and enhanced least squares twin support vector machine | |
Dhinesh et al. | Detection of leaf disease using principal component analysis and linear support vector machine | |
CN111783789A (en) | Image sensitive information identification method | |
CN110163182A (en) | A kind of hand back vein identification method based on KAZE feature | |
Jia et al. | Document image binarization using structural symmetry of strokes | |
Dhar et al. | Paper currency detection system based on combined SURF and LBP features | |
CN113989196B (en) | Visual-sense-based method for detecting appearance defects of earphone silica gel gasket | |
George et al. | Leaf identification using Harris corner detection, SURF feature and FLANN matcher | |
Gui et al. | A fast caption detection method for low quality video images | |
Dai et al. | Scene text detection based on enhanced multi-channels MSER and a fast text grouping process | |
Ali et al. | Different handwritten character recognition methods: a review | |
CN105512682B (en) | A kind of security level identification recognition methods based on Krawtchouk square and KNN-SMO classifier | |
Fernandez et al. | Classifying suspicious content in Tor Darknet | |
Xu et al. | Coin recognition method based on SIFT algorithm | |
Abdoli et al. | Offline signature verification using geodesic derivative pattern | |
Araújo et al. | Segmenting and recognizing license plate characters | |
Myint et al. | Handwritten signature verification system using Sobel operator and KNN classifier | |
Lin et al. | Coin recognition based on texture classification on ring and fan areas of the coin image | |
Romero et al. | Wavelet-based feature extraction for handwritten numerals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||