
CN111783789A - Image sensitive information identification method - Google Patents

Image sensitive information identification method

Info

Publication number
CN111783789A
CN111783789A
Authority
CN
China
Prior art keywords
image
sensitive information
feature
text
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010607308.8A
Other languages
Chinese (zh)
Inventor
李凯勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qinghai Nationalities University
Original Assignee
Qinghai Nationalities University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qinghai Nationalities University filed Critical Qinghai Nationalities University
Priority to CN202010607308.8A priority Critical patent/CN111783789A/en
Publication of CN111783789A publication Critical patent/CN111783789A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image sensitive information identification method. To improve detection precision, the influence of noise is reduced by a filtering algorithm; because filtering weakens image detail, particularly edge information, the detail information is enhanced by sharpening; to avoid redundant detection, the color feature, shape feature and texture feature of the image are extracted as the objects of detection for retrieving sensitive information, which keeps the retrieval process simple and accurate; and the accurate classification capability of the support vector machine is fully exploited to identify different types of sensitive information accurately.

Description

Image sensitive information identification method
Technical Field
The invention relates to the field of image signal processing, in particular to an image sensitive information identification method.
Background
With the rapid development of the internet and of mobile communication technology, acquiring information has become more convenient and more diversified. Mixed into this information is a large amount of sensitive content, such as pornography, violence, contraband, cults and subversive material, which poses great challenges to the information security review performed by network information supervision departments. Vast numbers of images containing sensitive information spread widely across networks and negatively impact social development. Research into image sensitive information identification methods is therefore increasingly important.
Traditional image sensitive information identification methods mainly target the text information in an image and do not consider non-text sensitive factors, so a large amount of sensitive information goes unidentified. To improve the accuracy of sensitive information identification, various methods have been proposed in industry, but their effect is poor. Among them, the most promising are detection and identification methods based on machine learning; however, because research there is still shallow, these methods remain deficient in accuracy.
Disclosure of Invention
To address these defects in the prior art, the image sensitive information identification method provided by the invention improves detection precision and solves the problems of low accuracy and missed detection.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that: an image sensitive information identification method, comprising the steps of:
s1, denoising the original image through a filtering algorithm to obtain a denoised image;
s2, carrying out sharpening processing on the denoised image to obtain an edge enhanced image;
s3, extracting color features, shape features and texture features of the edge enhancement image;
s4, according to the color feature, the shape feature and the texture feature of the edge enhancement image, carrying out sensitive information retrieval on the edge enhancement image to obtain sensitive information;
and S5, processing the sensitive information through a support vector machine, and realizing the identification and classification of the sensitive information.
The invention has the following beneficial effects: to improve detection precision, the influence of noise is reduced by a filtering algorithm; because filtering weakens image detail, particularly edge information, the detail information is enhanced by sharpening; to avoid redundant detection, the color feature, shape feature and texture feature of the image are extracted as the objects of detection for retrieving sensitive information, which keeps the retrieval process simple and accurate; and the accurate classification capability of the support vector machine is fully exploited to identify different types of sensitive information accurately.
Further, the step S1 includes the following sub-steps:
s11, denoising the original image through a mean value filtering algorithm to obtain a primary denoised image, wherein the mathematical expression of the mean value filtering algorithm is as follows:
q1(x0, y0) = (1/K) ∑_{(x, y) ∈ Ω} r(x, y)    (1)
wherein r(x, y) is the pixel value of pixel (x, y) in the original image, q1(x0, y0) is the pixel value of pixel (x0, y0) in the primary denoised image, Ω is the neighborhood of coordinate (x0, y0), and K is the total number of pixels in Ω, i.e. the pixel (x0, y0) itself together with its neighborhood pixels;
s12, carrying out further denoising processing on the primary denoised image through a median filtering algorithm to obtain a denoised image, wherein the mathematical expression of the median filtering algorithm is as follows:
q(x0, y0) = mid[ q1(x, y) | (x, y) ∈ Ω ]    (2)
wherein q1(x, y) is the pixel value of pixel (x, y) in the primary denoised image, q(x0, y0) is the pixel value of pixel (x0, y0) in the denoised image, and mid[ ] denotes taking the median of the data inside the brackets.
The beneficial effects of the above further scheme are: Gaussian noise is suppressed by the mean filter and salt-and-pepper noise by the median filter; combining the two filtering means effectively removes the two main types of noise present in images and lays the foundation for accurate identification.
Further, the step S2 includes the following sub-steps:
s21, calculating to obtain a gradient vector of the de-noised image by adopting a difference method, wherein a mathematical expression of the difference method comprises the following equation:
∂q/∂x = q(x + 1, y) − q(x, y)    (3)
∂q/∂y = q(x, y + 1) − q(x, y)    (4)
G = (∂q/∂x, ∂q/∂y)^T    (5)
wherein ∂q/∂x and ∂q/∂y are the partial derivatives of the denoised image at pixel (x, y) in the x-axis and y-axis directions respectively, and G is the gradient vector of the denoised image;
s22, obtaining an edge enhancement image according to the gradient vector of the denoised image by the following formula:
f(x, y) = ||G||2 if ||G||2 ≥ T, and f(x, y) = q(x, y) otherwise    (6)
wherein ||G||2 is the two-norm of the gradient vector G, T is the gradient threshold, and f(x, y) is the pixel value of pixel (x, y) in the edge-enhanced image.
The beneficial effects of the above further scheme are: although the filtering algorithm removes noise, it also weakens the detail information of the image, especially the edge information. The gradient of the image carries this edge information, so taking the gradient's magnitude with the two-norm operation enhances the edges and restores the detail information weakened by the filtering process.
Further, the step S3 includes the following sub-steps:
s31, extracting the color characteristics of the edge enhanced image through the following formula:
p(l) = N(l)/N    (7)
wherein p(l), the color feature of the edge-enhanced image, is the probability that gray level l appears, N(l) is the number of pixels at gray level l, N is the total number of pixels contained in the edge-enhanced image, the gray level l is an integer taking values in the left-closed right-open interval [0, L), and L is the highest gray level;
s32, extracting the shape feature of the edge enhanced image through the following equation:
(Equations (8), (9) and (10), reproduced only as images in the original publication, compute Ix and Iy from F and combine them into I.)
wherein F is the two-dimensional image matrix of the edge-enhanced image, Ix is the shape-detection gray value in the horizontal direction, Iy is the shape-detection gray value in the vertical direction, × is the matrix cross-product operator, and I is the shape feature matrix of the edge-enhanced image;
s33, calculating the texture energy, moment of inertia, entropy and row-column similarity of the edge-enhanced image through the following equations, the four quantities together forming the texture feature:
F1 = ∑_{i, j} p1(i, j)^2    (11)
F2 = ∑_{i, j} (i − j)^2 · p1(i, j)^2    (12)
F3 = ∑_{i, j} p1(i, j) · log2 p1(i, j)    (13)
F4 = ∑_{i, j} (i − μx)(j − μy) · p1(i, j) / (σx · σy)    (14)
wherein F1 is the texture energy, F2 is the moment of inertia, F3 is the entropy, F4 is the row-column similarity, p1(i, j) is the probability that gray levels i and j appear within unit distance of each other in the edge-enhanced image, μx is the row mean, μy is the column mean, σx is the row variance, and σy is the column variance.
The beneficial effects of the above further scheme are: the color feature and the shape feature are widely recognized as important feature information of an image, while the texture feature describes the internal structure of the image and the spatial distribution of its pixels: the texture energy reflects the intensity of the pixel distribution, the moment of inertia reflects the coarseness of the texture, and the entropy reflects the complexity of the image. Extracting these physical quantities yields the internal information of the image without operating on the whole image as a single matrix-form sample, which greatly reduces the amount of computation and removes redundancy while preserving accuracy.
Further, the step S4 includes the following sub-steps:
s41, carrying out primary screening and classification on the edge enhanced image according to the color feature, the shape feature and the texture feature of the edge enhanced image to obtain a text-free image and a text-containing image;
s42, carrying out sensitive information retrieval on the text-free image through an image color feature algorithm to obtain sensitive information of the text-free image;
s43, identifying a text area of the text-containing image according to the texture feature and the color feature of the text-containing image;
and S44, converting the text area containing the text image into a normalized feature vector, and detecting the normalized feature vector by adopting a convolutional neural network to obtain sensitive information containing the text image.
Further, the step S42 includes the following sub-steps:
a1, carrying out space transformation on the text-free image, and converting the text-free image into an HSV space;
a2, calculating the first moment, second moment and third moment of the color features of the HSV-space text-free image to obtain a 3 × 3 matrix;
a3, performing matrix transformation on the 3 x 3 matrix to obtain an unnormalized feature column vector with the dimension of 9;
a4, normalizing the unnormalized feature column vector to obtain a normalized feature vector;
and A5, detecting the normalized feature vectors by adopting a convolutional neural network to obtain sensitive information of the text-free image.
Further, the method of step S43 is: regions of the text-containing image having a stable texture width and uniform color characteristics are identified as text regions of the text-containing image.
Further, the method for retrieving the sensitive features by the convolutional neural network in steps S44 and A5 comprises the following sub-steps:
b1, establishing a batch of normalized feature vectors of known sensitive information to form a training set;
b2, training the weight parameters and the bias parameters of the convolutional neural network according to the training set by the following equations to obtain the trained convolutional neural network as the sensitive information detector:
w ← w − α · ∂J(w, b)/∂w    (15)
b ← b − α · ∂J(w, b)/∂b    (16)
wherein w is the weight parameter of the convolutional neural network, b is the bias parameter of the convolutional neural network, J(w, b) is the loss function of the convolutional neural network, and α is the learning rate;
b3, detecting the normalized feature vector through a sensitive information detector to obtain detection features;
b4, calculating the similarity between the detection characteristics and the sensitive information sample set;
b5, judging whether the similarity is more than or equal to 0.5, if so, jumping to the step B6, and if not, jumping to the step B7;
b6, taking the normalized feature vector as sensitive information;
and B7, the sensitive information is an empty set.
Further, the step S5 includes the following sub-steps:
s51, connecting M support vector machines in series to obtain a binary tree type multi-classifier;
s52, training a kth support vector machine through a kth class sensitive information sample set to obtain a trained binary tree type multi-classifier, wherein k is a positive integer in a closed interval [1, M ];
and S53, identifying and classifying the sensitive information through the trained binary tree type multi-classifier.
The beneficial effects of the above further scheme are: the support vector machine can, after dimensionality reduction, accurately perform yes-or-no classification, and sensitive information identification is in essence the accurate classification of sensitive information. The support vector machines are therefore connected in series: each stage detects whether the input belongs to the class that stage is responsible for and hands everything else to the following stage. The output is completed in binary-tree fashion under a pipelined mode of operation, achieving fast processing with higher precision and speed than identification methods based on residual neural networks, convolutional neural networks and clustering algorithms.
Drawings
FIG. 1 is a flow chart of a method for identifying image sensitive information;
FIG. 2 is a graph comparing experimental effects.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art, but it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes that remain within the spirit and scope of the invention as defined in the appended claims will be apparent, and everything produced using the inventive concept is protected.
As shown in fig. 1, in one embodiment of the present invention, an image sensitive information recognition method includes the following steps:
s1, denoising the original image through a filtering algorithm to obtain a denoised image, and the method comprises the following steps:
s11, denoising the original image through a mean value filtering algorithm to obtain a primary denoised image, wherein the mathematical expression of the mean value filtering algorithm is as follows:
q1(x0, y0) = (1/K) ∑_{(x, y) ∈ Ω} r(x, y)    (1)
wherein r(x, y) is the pixel value of pixel (x, y) in the original image, q1(x0, y0) is the pixel value of pixel (x0, y0) in the primary denoised image, Ω is the neighborhood of coordinate (x0, y0), and K is the total number of pixels in Ω, i.e. the pixel (x0, y0) itself together with its neighborhood pixels;
s12, carrying out further denoising processing on the primary denoised image through a median filtering algorithm to obtain a denoised image, wherein the mathematical expression of the median filtering algorithm is as follows:
q(x0, y0) = mid[ q1(x, y) | (x, y) ∈ Ω ]    (2)
wherein q1(x, y) is the pixel value of pixel (x, y) in the primary denoised image, q(x0, y0) is the pixel value of pixel (x0, y0) in the denoised image, and mid[ ] denotes taking the median of the data inside the brackets.
Gaussian noise is suppressed by the mean filter and salt-and-pepper noise by the median filter; combining the two filtering means effectively removes the two main types of noise present in images and lays the foundation for accurate identification.
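By way of illustration only, the following is a minimal Python sketch of the two-stage denoising of step S1, assuming a grayscale image held in a NumPy array; the 3 × 3 window used for Ω and the use of scipy.ndimage are choices of this sketch, not details fixed by the patent.

    import numpy as np
    from scipy.ndimage import uniform_filter, median_filter

    def denoise(original: np.ndarray, window: int = 3) -> np.ndarray:
        """Two-stage denoising: mean filter for Gaussian noise, then median
        filter for salt-and-pepper noise, following equations (1) and (2)."""
        # Equation (1): q1(x0, y0) = (1/K) * sum of r(x, y) over the neighborhood Omega.
        q1 = uniform_filter(original.astype(np.float64), size=window)
        # Equation (2): q(x0, y0) = median of q1(x, y) over Omega.
        return median_filter(q1, size=window)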
S2, carrying out sharpening processing on the denoised image to obtain an edge enhanced image, and comprising the following steps:
s21, calculating to obtain a gradient vector of the de-noised image by adopting a difference method, wherein a mathematical expression of the difference method comprises the following equation:
∂q/∂x = q(x + 1, y) − q(x, y)    (3)
∂q/∂y = q(x, y + 1) − q(x, y)    (4)
G = (∂q/∂x, ∂q/∂y)^T    (5)
wherein ∂q/∂x and ∂q/∂y are the partial derivatives of the denoised image at pixel (x, y) in the x-axis and y-axis directions respectively, and G is the gradient vector of the denoised image;
s22, obtaining an edge enhancement image according to the gradient vector of the denoised image by the following formula:
f(x, y) = ||G||2 if ||G||2 ≥ T, and f(x, y) = q(x, y) otherwise    (6)
wherein ||G||2 is the two-norm of the gradient vector G, T is the gradient threshold, and f(x, y) is the pixel value of pixel (x, y) in the edge-enhanced image.
Although the filtering algorithm removes noise, it also weakens the detail information of the image, especially the edge information. The gradient of the image carries this edge information, so taking the gradient's magnitude with the two-norm operation enhances the edges and restores the detail information weakened by the filtering process.
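A sketch of step S2 under the same assumptions, using the forward differences of equations (3) and (4) and the thresholding rule of equation (6); zero-padding the last row and column of the gradient is a border-handling choice of this sketch.

    import numpy as np

    def sharpen(q: np.ndarray, T: float) -> np.ndarray:
        """Edge enhancement: gradient by forward differences, two-norm, threshold T."""
        gx = np.zeros_like(q, dtype=np.float64)
        gy = np.zeros_like(q, dtype=np.float64)
        gx[:-1, :] = q[1:, :] - q[:-1, :]        # equation (3): difference along x
        gy[:, :-1] = q[:, 1:] - q[:, :-1]        # equation (4): difference along y
        g_norm = np.sqrt(gx ** 2 + gy ** 2)      # two-norm of the gradient vector G
        # Equation (6): keep the gradient magnitude on strong edges, the pixel value elsewhere.
        return np.where(g_norm >= T, g_norm, q)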
S3, extracting color features, shape features and texture features of the edge enhanced image, comprising the following sub-steps:
s31, extracting the color characteristics of the edge enhanced image through the following formula:
p(l) = N(l)/N    (7)
wherein p(l), the color feature of the edge-enhanced image, is the probability that gray level l appears, N(l) is the number of pixels at gray level l, N is the total number of pixels contained in the edge-enhanced image, the gray level l is an integer taking values in the left-closed right-open interval [0, L), and L is the highest gray level;
s32, extracting the shape feature of the edge enhanced image through the following equation:
(Equations (8), (9) and (10), reproduced only as images in the original publication, compute Ix and Iy from F and combine them into I.)
wherein F is the two-dimensional image matrix of the edge-enhanced image, Ix is the shape-detection gray value in the horizontal direction, Iy is the shape-detection gray value in the vertical direction, × is the matrix cross-product operator, and I is the shape feature matrix of the edge-enhanced image;
s33, calculating the texture energy, moment of inertia, entropy and row-column similarity of the edge-enhanced image through the following equations, the four quantities together forming the texture feature:
F1 = ∑_{i, j} p1(i, j)^2    (11)
F2 = ∑_{i, j} (i − j)^2 · p1(i, j)^2    (12)
F3 = ∑_{i, j} p1(i, j) · log2 p1(i, j)    (13)
F4 = ∑_{i, j} (i − μx)(j − μy) · p1(i, j) / (σx · σy)    (14)
wherein F1 is the texture energy, F2 is the moment of inertia, F3 is the entropy, F4 is the row-column similarity, p1(i, j) is the probability that gray levels i and j appear within unit distance of each other in the edge-enhanced image, μx is the row mean, μy is the column mean, σx is the row variance, and σy is the column variance.
The color feature and the shape feature are widely recognized as important feature information of an image, while the texture feature describes the internal structure of the image and the spatial distribution of its pixels: the texture energy reflects the intensity of the pixel distribution, the moment of inertia reflects the coarseness of the texture, and the entropy reflects the complexity of the image. Extracting these physical quantities yields the internal information of the image without operating on the whole image as a single matrix-form sample, which greatly reduces the amount of computation and removes redundancy while preserving accuracy.
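The color feature of equation (7) and the texture quantities of equations (11) to (14) can be sketched as follows; taking p1(i, j) from horizontally adjacent pixel pairs is an assumption of this sketch, since the patent specifies only "unit distance", and gray levels are assumed to lie in [0, levels).

    import numpy as np

    def color_feature(f: np.ndarray, levels: int = 256) -> np.ndarray:
        """Equation (7): p(l) = N(l) / N, the normalized gray-level histogram."""
        hist, _ = np.histogram(f, bins=levels, range=(0, levels))
        return hist / f.size

    def texture_features(f: np.ndarray, levels: int = 256):
        """Equations (11)-(14): energy F1, moment of inertia F2, entropy F3 and
        row-column similarity F4 from a unit-distance co-occurrence matrix p1."""
        img = f.astype(int)
        glcm = np.zeros((levels, levels))
        np.add.at(glcm, (img[:, :-1].ravel(), img[:, 1:].ravel()), 1)  # horizontal pairs
        p1 = glcm / glcm.sum()
        i, j = np.indices(p1.shape)
        F1 = np.sum(p1 ** 2)                                # texture energy, eq. (11)
        F2 = np.sum((i - j) ** 2 * p1 ** 2)                 # moment of inertia, eq. (12)
        nz = p1 > 0
        F3 = np.sum(p1[nz] * np.log2(p1[nz]))               # entropy, eq. (13)
        mu_x, mu_y = np.sum(i * p1), np.sum(j * p1)         # row and column means
        sd_x = np.sqrt(np.sum((i - mu_x) ** 2 * p1))        # row spread
        sd_y = np.sqrt(np.sum((j - mu_y) ** 2 * p1))        # column spread
        F4 = np.sum((i - mu_x) * (j - mu_y) * p1) / (sd_x * sd_y)  # similarity, eq. (14)
        return F1, F2, F3, F4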
S4, according to the color feature, the shape feature and the texture feature of the edge enhanced image, sensitive information retrieval is carried out on the edge enhanced image to obtain sensitive information, and the method comprises the following steps:
s41, carrying out primary screening and classification on the edge enhanced image according to the color feature, the shape feature and the texture feature of the edge enhanced image to obtain a text-free image and a text-containing image;
s42, carrying out sensitive information retrieval on the text-free image through an image color feature algorithm to obtain sensitive information of the text-free image;
s43, identifying a text area of the text-containing image according to the texture feature and the color feature of the text-containing image;
and S44, converting the text area containing the text image into a normalized feature vector, and detecting the normalized feature vector by adopting a convolutional neural network to obtain sensitive information containing the text image.
Wherein, step S42 includes the following substeps:
a1, carrying out space transformation on the text-free image, and converting the text-free image into an HSV space;
a2, calculating the first moment, second moment and third moment of the color features of the HSV-space text-free image to obtain a 3 × 3 matrix;
a3, performing matrix transformation on the 3 x 3 matrix to obtain an unnormalized feature column vector with the dimension of 9;
a4, normalizing the unnormalized feature column vector to obtain a normalized feature vector;
and A5, detecting the normalized feature vectors by adopting a convolutional neural network to obtain sensitive information of the text-free image.
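Steps A1 to A4 amount to the classic color-moment descriptor. A sketch follows, assuming an OpenCV BGR input; L2 normalization in step A4 is one possible reading, as the patent does not name the normalization.

    import numpy as np
    import cv2

    def color_moment_vector(bgr: np.ndarray) -> np.ndarray:
        """Steps A1-A4: HSV conversion, first/second/third color moments per
        channel, flattened to a normalized 9-dimensional feature vector."""
        hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float64)  # step A1
        moments = np.empty((3, 3))
        for c in range(3):                            # H, S and V channels
            ch = hsv[:, :, c].ravel()
            m1 = ch.mean()                            # first moment (mean)
            m2 = np.sqrt(np.mean((ch - m1) ** 2))     # second moment (standard deviation)
            m3 = np.cbrt(np.mean((ch - m1) ** 3))     # third moment (cube root of skew)
            moments[c] = (m1, m2, m3)                 # step A2: 3 x 3 matrix
        v = moments.reshape(9)                        # step A3: dimension-9 vector
        return v / np.linalg.norm(v)                  # step A4: normalization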
The method for retrieving the sensitive features by the convolutional neural network in steps S44 and A5 comprises the following sub-steps:
b1, establishing a batch of normalized feature vectors of known sensitive information to form a training set;
b2, training the weight parameters and the bias parameters of the convolutional neural network according to the training set by the following equations to obtain the trained convolutional neural network as the sensitive information detector:
w ← w − α · ∂J(w, b)/∂w    (15)
b ← b − α · ∂J(w, b)/∂b    (16)
wherein w is the weight parameter of the convolutional neural network, b is the bias parameter of the convolutional neural network, J(w, b) is the loss function of the convolutional neural network, and α is the learning rate;
b3, detecting the normalized feature vector through a sensitive information detector to obtain detection features;
b4, calculating the similarity between the detection characteristics and the sensitive information sample set;
b5, judging whether the similarity is more than or equal to 0.5, if so, jumping to the step B6, and if not, jumping to the step B7;
b6, taking the normalized feature vector as sensitive information;
and B7, the sensitive information is an empty set.
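A sketch of the parameter update of equations (15) and (16) and the similarity gate of steps B4 to B7; cosine similarity is an assumption of this sketch (the patent does not name the similarity measure), and the gradients are passed in abstractly because the convolutional network itself is not specified.

    import numpy as np

    def gradient_step(w, b, grad_w, grad_b, alpha=0.01):
        """Equations (15)-(16): one gradient-descent update of weight and bias."""
        return w - alpha * grad_w, b - alpha * grad_b

    def gate_sensitive(feature: np.ndarray, samples: np.ndarray):
        """Steps B4-B7: compare the detected feature against the sensitive sample
        set; keep it as sensitive information when the similarity reaches 0.5."""
        sims = samples @ feature / (
            np.linalg.norm(samples, axis=1) * np.linalg.norm(feature))
        return feature if sims.max() >= 0.5 else None   # B6 keep / B7 empty set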
S5, processing the sensitive information through a support vector machine, and realizing the identification and classification of the sensitive information, wherein the method comprises the following steps:
s51, connecting M support vector machines in series to obtain a binary tree type multi-classifier;
s52, training a kth support vector machine through a kth class sensitive information sample set to obtain a trained binary tree type multi-classifier, wherein k is a positive integer in a closed interval [1, M ];
and S53, identifying and classifying the sensitive information through the trained binary tree type multi-classifier.
The support vector machine can, after dimensionality reduction, accurately perform yes-or-no classification, and sensitive information identification is in essence the accurate classification of sensitive information. The support vector machines are therefore connected in series: each stage detects whether the input belongs to the class that stage is responsible for and hands everything else to the following stage. The output is completed in binary-tree fashion under a pipelined mode of operation, achieving fast processing with higher precision and speed than identification methods based on residual neural networks, convolutional neural networks and clustering algorithms.
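A sketch of the serial binary-tree classifier of steps S51 to S53, built on scikit-learn's SVC. Training stage k to separate class k from the classes not yet decided is an assumption of this sketch; the patent says only that the k-th support vector machine is trained on the k-th sensitive sample set.

    import numpy as np
    from sklearn.svm import SVC

    class BinaryTreeSVM:
        """M support vector machines in series: stage k answers "class k or not"
        and hands everything else to the next stage (steps S51-S53)."""

        def __init__(self, M: int):
            self.stages = [SVC(kernel="rbf") for _ in range(M)]

        def fit(self, X: np.ndarray, y: np.ndarray):
            for k in range(1, len(self.stages) + 1):
                mask = y >= k                     # samples not claimed by earlier stages
                labels = (y[mask] == k).astype(int)
                if labels.min() == labels.max():  # only one class left: nothing to separate
                    self.stages[k - 1] = None
                    continue
                self.stages[k - 1].fit(X[mask], labels)

        def predict_one(self, x: np.ndarray):
            for k, svm in enumerate(self.stages, start=1):
                if svm is None or svm.predict(x.reshape(1, -1))[0] == 1:
                    return k                      # stage k claims the sample
            return None                           # no stage claimed the sample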
To verify the effect of the invention, a comparison experiment was designed in which the method provided by the invention is compared with two common prior-art approaches: a method based on principal component analysis and a method based on template matching.
The information on the experimental data is shown in Table 1 and the experimental results in Table 2, from which it can be seen that the method studied herein achieves higher accuracy when identifying sensitive information across the different experimental data sets. In the calculations, the average accuracies of the three identification methods are 61.5%, 63.7% and 85.83%, the highest, 85.83%, being that of the method proposed herein.
Table 1 experimental data information (the table is reproduced only as an image in the original publication)

TABLE 2 accuracy of sensitive information under different identification methods (%) (the table is reproduced only as an image in the original publication)
On the basis of the above experiment, the identification times of the different methods were compared; the result is shown in FIG. 2, from which it can be seen that the identification time of the method of the invention is far shorter than that of the other two techniques.
In conclusion, to improve detection precision the influence of noise is reduced by a filtering algorithm; because filtering weakens image detail, particularly edge information, the detail information is enhanced by sharpening; to avoid redundant detection, the color feature, shape feature and texture feature of the image are extracted as the objects of detection for retrieving sensitive information, which keeps the retrieval process simple and accurate; and the accurate classification capability of the support vector machine is fully exploited to identify different types of sensitive information accurately.

Claims (9)

1. An image sensitive information recognition method, comprising the steps of:
s1, denoising the original image through a filtering algorithm to obtain a denoised image;
s2, carrying out sharpening processing on the denoised image to obtain an edge enhanced image;
s3, extracting color features, shape features and texture features of the edge enhancement image;
s4, according to the color feature, the shape feature and the texture feature of the edge enhancement image, carrying out sensitive information retrieval on the edge enhancement image to obtain sensitive information;
and S5, processing the sensitive information through a support vector machine, and realizing the identification and classification of the sensitive information.
2. The image sensitive information recognition method of claim 1, wherein the step S1 includes the following sub-steps:
s11, denoising the original image through a mean value filtering algorithm to obtain a primary denoised image, wherein the mathematical expression of the mean value filtering algorithm is as follows:
q1(x0, y0) = (1/K) ∑_{(x, y) ∈ Ω} r(x, y)    (1)
wherein r(x, y) is the pixel value of pixel (x, y) in the original image, q1(x0, y0) is the pixel value of pixel (x0, y0) in the primary denoised image, Ω is the neighborhood of coordinate (x0, y0), and K is the total number of pixels in Ω, i.e. the pixel (x0, y0) itself together with its neighborhood pixels;
s12, carrying out further denoising processing on the primary denoised image through a median filtering algorithm to obtain a denoised image, wherein the mathematical expression of the median filtering algorithm is as follows:
q(x0, y0) = mid[ q1(x, y) | (x, y) ∈ Ω ]    (2)
wherein q1(x, y) is the pixel value of pixel (x, y) in the primary denoised image, q(x0, y0) is the pixel value of pixel (x0, y0) in the denoised image, and mid[ ] denotes taking the median of the data inside the brackets.
3. The image sensitive information recognition method of claim 2, wherein the step S2 includes the following sub-steps:
s21, calculating to obtain a gradient vector of the de-noised image by adopting a difference method, wherein a mathematical expression of the difference method comprises the following equation:
∂q/∂x = q(x + 1, y) − q(x, y)    (3)
∂q/∂y = q(x, y + 1) − q(x, y)    (4)
G = (∂q/∂x, ∂q/∂y)^T    (5)
wherein ∂q/∂x and ∂q/∂y are the partial derivatives of the denoised image at pixel (x, y) in the x-axis and y-axis directions respectively, and G is the gradient vector of the denoised image;
s22, obtaining an edge enhancement image according to the gradient vector of the denoised image by the following formula:
f(x, y) = ||G||2 if ||G||2 ≥ T, and f(x, y) = q(x, y) otherwise    (6)
wherein ||G||2 is the two-norm of the gradient vector G, T is the gradient threshold, and f(x, y) is the pixel value of pixel (x, y) in the edge-enhanced image.
4. The image sensitive information recognition method according to claim 3, wherein the step S3 includes the following sub-steps:
s31, extracting the color characteristics of the edge enhanced image through the following formula:
p(l) = N(l)/N    (7)
wherein p(l), the color feature of the edge-enhanced image, is the probability that gray level l appears, N(l) is the number of pixels at gray level l, N is the total number of pixels contained in the edge-enhanced image, the gray level l is an integer taking values in the left-closed right-open interval [0, L), and L is the highest gray level;
s32, extracting the shape feature of the edge enhanced image through the following equation:
(Equations (8), (9) and (10), reproduced only as images in the original publication, compute Ix and Iy from F and combine them into I.)
wherein F is the two-dimensional image matrix of the edge-enhanced image, Ix is the shape-detection gray value in the horizontal direction, Iy is the shape-detection gray value in the vertical direction, × is the matrix cross-product operator, and I is the shape feature matrix of the edge-enhanced image;
s33, calculating the texture energy, moment of inertia, entropy and row-column similarity of the edge-enhanced image through the following equations, the four quantities together forming the texture feature:
F1 = ∑_{i, j} p1(i, j)^2    (11)
F2 = ∑_{i, j} (i − j)^2 · p1(i, j)^2    (12)
F3 = ∑_{i, j} p1(i, j) · log2 p1(i, j)    (13)
F4 = ∑_{i, j} (i − μx)(j − μy) · p1(i, j) / (σx · σy)    (14)
wherein F1 is the texture energy, F2 is the moment of inertia, F3 is the entropy, F4 is the row-column similarity, p1(i, j) is the probability that gray levels i and j appear within unit distance of each other in the edge-enhanced image, μx is the row mean, μy is the column mean, σx is the row variance, and σy is the column variance.
5. The image sensitive information recognition method of claim 4, wherein the step S4 includes the following sub-steps:
s41, carrying out primary screening and classification on the edge enhanced image according to the color feature, the shape feature and the texture feature of the edge enhanced image to obtain a text-free image and a text-containing image;
s42, carrying out sensitive information retrieval on the text-free image through an image color feature algorithm to obtain sensitive information of the text-free image;
s43, identifying a text area of the text-containing image according to the texture feature and the color feature of the text-containing image;
and S44, converting the text area containing the text image into a normalized feature vector, and detecting the normalized feature vector by adopting a convolutional neural network to obtain sensitive information containing the text image.
6. The image sensitive information recognition method of claim 5, wherein the step S42 includes the following sub-steps:
a1, carrying out space transformation on the text-free image, and converting the text-free image into an HSV space;
a2, calculating the first moment, second moment and third moment of the color features of the HSV-space text-free image to obtain a 3 × 3 matrix;
a3, performing matrix transformation on the 3 x 3 matrix to obtain an unnormalized feature column vector with the dimension of 9;
a4, normalizing the unnormalized feature column vector to obtain a normalized feature vector;
and A5, detecting the normalized feature vectors by adopting a convolutional neural network to obtain sensitive information of the text-free image.
7. The method for identifying image sensitive information according to claim 5, wherein the method of step S43 is: regions of the text-containing image having a stable texture width and uniform color characteristics are identified as text regions of the text-containing image.
8. The image sensitive information identification method according to claim 5, wherein the method of retrieving the sensitive feature by the convolutional neural network in steps S44 and A5 comprises the following sub-steps:
b1, establishing a batch of normalized feature vectors of known sensitive information to form a training set;
b2, training the weight parameters and the bias parameters of the convolutional neural network according to the training set by the following equations to obtain the trained convolutional neural network as the sensitive information detector:
w ← w − α · ∂J(w, b)/∂w    (15)
b ← b − α · ∂J(w, b)/∂b    (16)
wherein w is the weight parameter of the convolutional neural network, b is the bias parameter of the convolutional neural network, J(w, b) is the loss function of the convolutional neural network, and α is the learning rate;
b3, detecting the normalized feature vector through a sensitive information detector to obtain detection features;
b4, calculating the similarity between the detection characteristics and the sensitive information sample set;
b5, judging whether the similarity is more than or equal to 0.5, if so, jumping to the step B6, and if not, jumping to the step B7;
b6, taking the normalized feature vector as sensitive information;
and B7, the sensitive information is an empty set.
9. The image sensitive information recognition method of claim 5, wherein the step S5 includes the following sub-steps:
s51, connecting M support vector machines in series to obtain a binary tree type multi-classifier;
s52, training a kth support vector machine through a kth class sensitive information sample set to obtain a trained binary tree type multi-classifier, wherein k is a positive integer in a closed interval [1, M ];
and S53, identifying and classifying the sensitive information through the trained binary tree type multi-classifier.
CN202010607308.8A 2020-06-30 2020-06-30 Image sensitive information identification method Pending CN111783789A (en)

Priority Applications (1)

Application Number: CN202010607308.8A; Publication: CN111783789A (en); Priority Date: 2020-06-30; Filing Date: 2020-06-30; Title: Image sensitive information identification method

Applications Claiming Priority (1)

Application Number: CN202010607308.8A; Publication: CN111783789A (en); Priority Date: 2020-06-30; Filing Date: 2020-06-30; Title: Image sensitive information identification method

Publications (1)

Publication Number: CN111783789A; Publication Date: 2020-10-16

Family

ID=72761155

Family Applications (1)

Application Number: CN202010607308.8A; Publication: CN111783789A (en); Status: Pending; Priority Date: 2020-06-30; Filing Date: 2020-06-30; Title: Image sensitive information identification method

Country Status (1)

Country Link
CN (1) CN111783789A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991308A (en) * 2021-03-25 2021-06-18 北京百度网讯科技有限公司 Image quality determination method and device, electronic equipment and medium
CN117237478A (en) * 2023-11-09 2023-12-15 北京航空航天大学 Sketch-to-color image generation method, sketch-to-color image generation system, storage medium and processing terminal

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101281521A (en) * 2007-04-05 2008-10-08 中国科学院自动化研究所 Method and system for filtering sensitive web page based on multiple classifier amalgamation
CN101470897A (en) * 2007-12-26 2009-07-01 中国科学院自动化研究所 Sensitive film detection method based on audio/video amalgamation policy
CN101996314A (en) * 2009-08-26 2011-03-30 厦门市美亚柏科信息股份有限公司 Content-based human body upper part sensitive image identification method and device
CN103559511A (en) * 2013-11-20 2014-02-05 天津农学院 Automatic identification method of foliar disease image of greenhouse vegetable
CN103996046A (en) * 2014-06-11 2014-08-20 北京邮电大学 Personnel recognition method based on multi-visual-feature fusion
CN105509659A (en) * 2015-11-25 2016-04-20 淮安市计量测试所 Image-processing-based flatness detection system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101281521A (en) * 2007-04-05 2008-10-08 中国科学院自动化研究所 Method and system for filtering sensitive web page based on multiple classifier amalgamation
CN101470897A (en) * 2007-12-26 2009-07-01 中国科学院自动化研究所 Sensitive film detection method based on audio/video amalgamation policy
CN101996314A (en) * 2009-08-26 2011-03-30 厦门市美亚柏科信息股份有限公司 Content-based human body upper part sensitive image identification method and device
CN103559511A (en) * 2013-11-20 2014-02-05 天津农学院 Automatic identification method of foliar disease image of greenhouse vegetable
CN103996046A (en) * 2014-06-11 2014-08-20 北京邮电大学 Personnel recognition method based on multi-visual-feature fusion
CN105509659A (en) * 2015-11-25 2016-04-20 淮安市计量测试所 Image-processing-based flatness detection system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
崔鹏飞 (Cui Pengfei) et al., "Research on image recognition technology for network content security", 《信息网络安全》 (Information Network Security) *
袁杰 (Yuan Jie) et al., "Edge detection based on median filtering and gradient sharpening", 《计算机与现代化》 (Computer and Modernization) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991308A (en) * 2021-03-25 2021-06-18 北京百度网讯科技有限公司 Image quality determination method and device, electronic equipment and medium
CN112991308B (en) * 2021-03-25 2023-11-24 北京百度网讯科技有限公司 Image quality determining method and device, electronic equipment and medium
CN117237478A (en) * 2023-11-09 2023-12-15 北京航空航天大学 Sketch-to-color image generation method, sketch-to-color image generation system, storage medium and processing terminal
CN117237478B (en) * 2023-11-09 2024-02-09 北京航空航天大学 Sketch-to-color image generation method, sketch-to-color image generation system, storage medium and processing terminal

Similar Documents

Publication Publication Date Title
CN111401372B (en) Method for extracting and identifying image-text information of scanned document
Gao et al. Automatic change detection in synthetic aperture radar images based on PCANet
CN102081731B (en) Method and device for extracting text from image
Saba et al. Annotated comparisons of proposed preprocessing techniques for script recognition
Ghadekar et al. Handwritten digit and letter recognition using hybrid dwt-dct with knn and svm classifier
US9558403B2 (en) Chemical structure recognition tool
Chu et al. Strip steel surface defect recognition based on novel feature extraction and enhanced least squares twin support vector machine
Dhinesh et al. Detection of leaf disease using principal component analysis and linear support vector machine
CN111783789A (en) Image sensitive information identification method
CN110163182A (en) A kind of hand back vein identification method based on KAZE feature
Jia et al. Document image binarization using structural symmetry of strokes
Dhar et al. Paper currency detection system based on combined SURF and LBP features
CN113989196B (en) Visual-sense-based method for detecting appearance defects of earphone silica gel gasket
George et al. Leaf identification using Harris corner detection, SURF feature and FLANN matcher
Gui et al. A fast caption detection method for low quality video images
Dai et al. Scene text detection based on enhanced multi-channels MSER and a fast text grouping process
Ali et al. Different handwritten character recognition methods: a review
CN105512682B (en) A kind of security level identification recognition methods based on Krawtchouk square and KNN-SMO classifier
Fernandez et al. Classifying suspicious content in Tor Darknet
Xu et al. Coin recognition method based on SIFT algorithm
Abdoli et al. Offline signature verification using geodesic derivative pattern
Araújo et al. Segmenting and recognizing license plate characters
Myint et al. Handwritten signature verification system using Sobel operator and KNN classifier
Lin et al. Coin recognition based on texture classification on ring and fan areas of the coin image
Romero et al. Wavelet-based feature extraction for handwritten numerals

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination