Disclosure of Invention
The invention provides a method and a device for detecting denim defects by multi-image fusion, which aim to solve the low accuracy and low speed of existing denim defect detection. A template image, a front light source image and a back light source image are fused, making the defect features richer and more prominent, and the recognition accuracy is improved by a deep learning network, thereby realizing quick and accurate detection of denim defects.
In order to achieve the above object, a first aspect of the present invention provides a method for detecting denim defects by multi-image fusion, the method comprising the following steps:
step 1: collecting denim image data through a cloth inspecting mechanism, constructing a data set of the denim image data, and manually marking defective image data in the denim image data through a marking mechanism;
step 2: defining the image obtained by the shooting mechanism under the action of the light source mechanism on the front side of the machine body as a front light source image;
the image obtained by the shooting mechanism under the action of the light source mechanism on the back side of the machine body is a back light source image;
the front light source images and the back light source images correspond to each other one by one, each pair forming a group of denim images;
step 3: preprocessing a plurality of groups of the denim images to expand the data set and obtain a balanced data set;
step 4: carrying out MinPooling enhancement and difference-combination three-channel processing on the balanced data set, and training a neural network with the balanced data set so processed;
step 5: processing the ROIs selected during neural network training with the OHEM algorithm;
step 6: detecting the denim with the trained neural network, and automatically marking the denim in which flaws are detected.
Further, the preprocessing of step 3 comprises:
step 3.1: carrying out weighted averaging on the denim images through Gaussian filtering to obtain a data set with Gaussian noise removed;
step 3.2: equalizing the histogram of the denim image to enhance the image contrast;
step 3.3: expanding the highlight or white regions in the denim image through morphological dilation;
step 3.4: carrying out edge padding on the denim picture, setting the pixel value at the image edge to 0 to fill a black border;
and randomly cropping the denim picture, and removing the image border of the denim picture after random cropping.
Further, the MinPooling enhancement of step 4 comprises:
setting the pixel values of the denim picture to negative values, performing maximum pooling on the denim picture, and then negating the pixel values again, so as to enhance the detail texture of the denim in the picture.
Further, the difference-combination three-channel processing in step 4 includes:
step 4.2.1: setting a denim template image;
step 4.2.2: setting a fused image, wherein the fused image has three channels;
the first channel is a denim picture to be detected;
the second channel is a denim template image;
the third channel is a difference value image obtained by performing weighted difference value operation on the pixel matrixes of the first channel and the second channel;
step 4.2.3: and inputting the fused image into a neural network.
Further, the neural network in step 4 comprises two parallel feature extraction network models based on Faster R-CNN;
the collected front light source image and back light source image are respectively taken as inputs, with ResNeXt50 and ResNet50 correspondingly serving as backbones; deformable convolution is added, and position encoding is then used to encode the positional information of the features;
two groups of features are extracted and correspondingly concatenated to form the output feature vector of the feature extraction network.
A second aspect of the present invention provides a device for detecting denim defects by multi-image fusion, which comprises a machine body, a supporting mechanism, a rotating mechanism, a motor and a transmission device, and is characterized in that the supporting mechanism is arranged on the machine body and comprises a supporting plate for placing the denim;
the motor is arranged in the machine body, is connected with the rotating mechanism through a transmission device and drives the rotating mechanism to rotate, and the rotating mechanism is fixed at the upper part of the supporting mechanism;
a cloth inspecting mechanism is further arranged on the upper portion of the machine body and comprises a shooting mechanism, light source mechanisms and a marking mechanism; the shooting mechanism is used for acquiring the image data, the light source mechanisms are respectively arranged on the front and the back of the machine body to provide sufficient illumination for the shooting mechanism, and the marking mechanism is used for marking the defective image data.
Through the technical scheme, the invention has the beneficial effects that:
the method comprises the steps of firstly, collecting denim layout image data through a cloth inspecting mechanism, constructing a data set of the denim layout image data, manually marking defective image data in the denim layout image data through a marking mechanism, enabling the data set of the denim layout image data to comprise a front light source image and a back light source image, enabling the front light source image and the back light source image to correspond to each other one by one to form a group of denim images, and then preprocessing a plurality of groups of denim images to expand the data set to obtain a balanced data set;
carrying out MinPooling strengthening and difference value combination three-channel processing on a balanced data set in order to strengthen the detail texture of the denim, training a neural network by adopting the balanced data set subjected to the MinPooling strengthening and difference value combination three-channel processing, processing a selected ROI (region of interest) trained by the neural network by using an OHEM (OHEM algorithm), finally detecting the denim by using the trained neural network, and automatically marking the denim with the detected flaw.
The method is based on a deep learning method, and combines a front light source image and a back light source image, so that a detection model obtains more multidimensional information input, the method has important significance for multi-dimensional identification of flaw characteristics, template information and flaw information are combined, the method has important significance for improving flaw detection precision, a Loss function is optimized, and identification of small flaws is greatly improved.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Example 1
As shown in fig. 1, a method for detecting defects of multi-image fused denim, the method comprising:
step 1: collecting denim image data through a cloth inspecting mechanism, constructing a data set of the denim image data, and manually marking defective image data in the denim image data through a marking mechanism;
step 2: defining the image obtained by the shooting mechanism under the action of the light source mechanism on the front side of the machine body as a front light source image;
the image obtained by the shooting mechanism under the action of the light source mechanism on the back side of the machine body is a back light source image;
the front light source images and the back light source images correspond to each other one by one, each pair forming a group of denim images;
step 3: preprocessing a plurality of groups of the denim images to expand the data set and obtain a balanced data set;
step 4: carrying out MinPooling enhancement and difference-combination three-channel processing on the balanced data set, and training a neural network with the balanced data set so processed;
step 5: processing the ROIs selected during neural network training with the OHEM algorithm;
step 6: detecting the denim with the trained neural network, and automatically marking the denim in which flaws are detected.
In the invention, the front light source image and the back light source image are introduced to increase the multi-dimensional information available for image recognition, and the neural network is trained with a balanced data set processed by MinPooling enhancement and difference-combination three-channel processing, so that the feature extraction capability of the model is improved and the denim flaw detection result is more accurate, with particularly excellent performance in revealing tiny flaw features.
Example 2
Since image acquisition of denim defects is costly and samples are difficult to obtain, the data set is augmented in this embodiment by preprocessing, based on embodiment 1 above, specifically:
step 3.1: carrying out weighted averaging on the denim image through Gaussian filtering to obtain a data set with Gaussian noise removed; Gaussian filtering performs a weighted average over the whole image, that is, the value of each pixel is replaced by a weighted average of its own value and the values of the other pixels in its neighborhood;
step 3.2: equalizing the histogram of the denim image to enhance the image contrast, so that the denim image is clearer than the original image, which also increases the amount of data available for training.
Step 3.3: expanding the highlight or white regions in the denim image through morphological dilation, so that the highlight area in the result image is larger than that in the original image.
Step 3.4: carrying out edge padding on the denim picture, setting the pixel value at the image edge to 0 to fill a black border;
and randomly cropping the denim picture, and removing the image border of the denim picture after random cropping.
Through the above steps, the accuracy of the model can be improved and its robustness enhanced without affecting the characteristics of the images; randomly rotating and flipping the images enhances the scale invariance and direction invariance of the model and strengthens its ability to recognize images.
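As an illustration only, the following is a minimal Python sketch of steps 3.1 to 3.4, assuming OpenCV and an 8-bit single-channel input image; the kernel sizes, the 16-pixel padding and the crop size are illustrative assumptions, not values fixed by this embodiment.
```python
import random

import cv2
import numpy as np

def preprocess(img: np.ndarray, crop_size: int = 512) -> np.ndarray:
    # Step 3.1: Gaussian filtering -- each pixel becomes a weighted
    # average of itself and its neighborhood, suppressing Gaussian noise.
    img = cv2.GaussianBlur(img, (5, 5), 0)
    # Step 3.2: histogram equalization to enhance contrast.
    img = cv2.equalizeHist(img)
    # Step 3.3: morphological dilation to expand highlight/white regions.
    img = cv2.dilate(img, np.ones((3, 3), np.uint8))
    # Step 3.4: pad the edges with zero-valued (black) pixels ...
    img = cv2.copyMakeBorder(img, 16, 16, 16, 16, cv2.BORDER_CONSTANT, value=0)
    # ... then take a random crop, which removes the border again
    # (the input is assumed to be larger than crop_size on both sides).
    h, w = img.shape[:2]
    y = random.randint(0, h - crop_size)
    x = random.randint(0, w - crop_size)
    return img[y:y + crop_size, x:x + crop_size]
```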
Example 3
Based on the above embodiment 1, in order to enhance the representation of small denim defect features so that small defects can be detected, step 4 is optimized in this embodiment, specifically:
step 4 the MinPooling enhancement comprises:
setting the pixel values of the denim picture to negative values, performing maximum pooling on the denim picture, and then negating the pixel values again, so as to enhance the detail texture of the denim in the picture.
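As an illustration, a minimal sketch of this MinPooling operation on a PyTorch tensor; the kernel size is an illustrative assumption.
```python
import torch
import torch.nn.functional as F

def min_pool2d(x: torch.Tensor, kernel_size: int = 2) -> torch.Tensor:
    # Max pooling over the negated image selects the minimum of each
    # window; negating the result restores the original sign convention.
    return -F.max_pool2d(-x, kernel_size)

# Usage on a (batch, channel, height, width) tensor.
img = torch.rand(1, 1, 8, 8)
enhanced = min_pool2d(img)  # shape (1, 1, 4, 4)
```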
As an implementation manner, the difference-combination three-channel processing includes:
step 4.2.1: setting a denim template image;
step 4.2.2: setting a fused image, wherein the fused image has three channels;
the first channel is a denim picture to be detected;
the second channel is a denim template image;
the third channel is a difference image obtained by performing weighted difference operation on the pixel matrix of the first channel and the pixel matrix of the second channel;
step 4.2.3: and inputting the fused image into a neural network.
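As an illustration, a minimal sketch of steps 4.2.1 to 4.2.3, assuming single-channel arrays of equal size; the weights alpha and beta of the weighted difference are illustrative assumptions.
```python
import numpy as np

def fuse(test_img: np.ndarray, template_img: np.ndarray,
         alpha: float = 1.0, beta: float = 1.0) -> np.ndarray:
    test = test_img.astype(np.float32)          # channel 1: picture under test
    template = template_img.astype(np.float32)  # channel 2: template image
    diff = alpha * test - beta * template       # channel 3: weighted difference
    return np.stack([test, template, diff], axis=-1)  # (H, W, 3) fused image
```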
Example 4
Based on the above embodiments, the construction of the neural network in step 4 and the OHEM processing in step 5 are optimized in this embodiment, specifically:
the neural network in step 4 comprises two parallel feature extraction network models based on Faster R-CNN;
the collected front light source image and back light source image are respectively taken as inputs, with ResNeXt50 and ResNet50 correspondingly serving as backbones; deformable convolution is added, and position encoding is then used to encode the positional information of the features;
two groups of features are extracted and correspondingly concatenated to form the output feature vector of the feature extraction network.
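As an illustration only, a minimal sketch of the two parallel backbones, assuming torchvision (version 0.13 or later) model constructors; the deformable convolution and position encoding stages described above are omitted for brevity, and the classification heads are dropped so the trunks act as feature extractors.
```python
import torch
import torchvision

class DualBackbone(torch.nn.Module):
    def __init__(self):
        super().__init__()
        resnext = torchvision.models.resnext50_32x4d(weights=None)
        resnet = torchvision.models.resnet50(weights=None)
        # Drop the avgpool/fc heads; keep the convolutional trunks.
        self.front_branch = torch.nn.Sequential(*list(resnext.children())[:-2])
        self.back_branch = torch.nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, front_img: torch.Tensor, back_img: torch.Tensor):
        f_front = self.front_branch(front_img)  # front light source image
        f_back = self.back_branch(back_img)     # back light source image
        # Concatenate the two feature groups along the channel axis.
        return torch.cat([f_front, f_back], dim=1)

# Usage: a corresponding pair of front/back images.
feats = DualBackbone()(torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224))
```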
In this embodiment, as shown in fig. 2, the candidate boxes and the picture being trained are taken as input, and two identical ROI networks are established for the ROI network layer: a read-only network performs only forward computation on all ROIs, while the read-write network performs both forward computation and back-propagation on the selected hard ROIs;
the network parameters are then updated using stochastic gradient descent.
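As an illustration, a minimal sketch of the OHEM hard-ROI selection, assuming per-ROI losses produced by the read-only forward pass; the number of hard ROIs kept is an illustrative assumption.
```python
import torch

def select_hard_rois(roi_losses: torch.Tensor, num_hard: int = 128) -> torch.Tensor:
    # The read-only network computes losses for all ROIs without gradients;
    # only the highest-loss ("hard") ROIs are kept for the backward pass.
    with torch.no_grad():
        _, hard_idx = roi_losses.topk(min(num_hard, roi_losses.numel()))
    return hard_idx

# Usage: losses for 2000 candidate ROIs; gradients then flow only through
# the hard subset in the second (read-write) ROI network.
hard = select_hard_rois(torch.rand(2000))
```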
The Faster R-CNN mainly comprises the following steps:
step S01: inputting the picture into a convolutional neural network (CNN) to obtain a feature map;
step S02: inputting the extracted convolutional features into a region proposal network (RPN) to obtain feature information of the candidate boxes;
step S03: performing RoI pooling on the feature information extracted within each candidate box to obtain a fixed-size output;
step S04: using a classifier to judge whether each candidate box belongs to a specific category, and further adjusting its position information with a regressor;
wherein the step S01 specifically includes:
step S01.1: scaling the picture to a fixed size of M × N, and then feeding the M × N image into the convolutional layers (Conv Layers);
step S01.2: obtaining the feature map, specifically:
features are extracted using ResNet50 and ResNeXt50, respectively.
ResNet50 comprises 5 stages, Conv1 to Conv5. The input first passes through the Conv1 stage (kernel_size = 7, padding = 3, stride = 2), then through a pooling layer, and finally through the Bottleneck stages Conv2 to Conv5, which output the feature map.
The Bottleneck introduced above is a network structure contained in all 4 later stages (Conv2 to Conv5) of ResNet50. Each Bottleneck is formed by 1 × 1, 3 × 3 and 1 × 1 convolutions in sequence; the number of Bottlenecks differs between stages, Conv2 to Conv5 containing 3, 4, 6 and 3 Bottlenecks respectively.
The ResNeXt50 network structure is similar to that of ResNet50, the difference being that the second convolution of each Bottleneck is replaced by a grouped convolution;
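As an illustration, a minimal sketch of such a Bottleneck in PyTorch; the residual shortcut is omitted for brevity and the channel counts are illustrative assumptions. Setting groups=32 in the middle convolution turns the ResNet50-style block into its ResNeXt50 counterpart.
```python
import torch

def bottleneck(in_ch: int, mid_ch: int, out_ch: int, groups: int = 1):
    # 1x1, 3x3, 1x1 convolutions in sequence, as described above.
    return torch.nn.Sequential(
        torch.nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),
        torch.nn.BatchNorm2d(mid_ch),
        torch.nn.ReLU(inplace=True),
        torch.nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1,
                        groups=groups, bias=False),  # grouped in ResNeXt
        torch.nn.BatchNorm2d(mid_ch),
        torch.nn.ReLU(inplace=True),
        torch.nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False),
        torch.nn.BatchNorm2d(out_ch),
    )

resnet_block = bottleneck(256, 64, 256)               # ResNet50 style
resnext_block = bottleneck(256, 128, 256, groups=32)  # ResNeXt50 style
```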
the step S02 specifically includes:
step S02.1: firstly, the feature map is convolved with a (3 × 3) kernel to obtain a (256 × M × M) feature tensor.
Step S02.2: performing two (1 × 1) convolutions on the obtained (256 × M × M) features to obtain an (18 × M × M) feature map and a (36 × M × M) feature map respectively, corresponding to (M × M × 9) results, wherein each result contains 2 scores (the foreground and background scores) and 4 coordinates (offsets relative to the original coordinates), corresponding to the 9 anchor boxes (anchors) at each point of the input feature map;
step S02.2 introduces the anchor boxes (anchors), which are a set of generated rectangular boxes; each anchor box has 4 values (x1, y1, x2, y2), representing the coordinates of the top-left and bottom-right corners of the rectangular box, and the boxes fall into three types with aspect ratios of 1:1, 1:2 and 2:1; the anchor boxes are set so that it can be judged which of them are positive anchors and which are negative anchors.
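As an illustration, a minimal sketch of anchor generation at one feature-map point, using the three aspect ratios named above; the three scales are illustrative assumptions that yield the 9 anchors per point.
```python
import numpy as np

def anchors_at(cx: float, cy: float,
               scales=(128, 256, 512),
               ratios=(1.0, 0.5, 2.0)) -> np.ndarray:
    boxes = []
    for s in scales:
        for r in ratios:
            # Width/height chosen so the aspect ratio w:h equals r
            # while the box area stays roughly s * s.
            w, h = s * np.sqrt(r), s / np.sqrt(r)
            # (x1, y1, x2, y2): top-left and bottom-right corners.
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)  # shape (9, 4)
```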
Step S02.3: inputting the obtained (18 × M × M) feature map into a reshape layer, then into a softmax layer to obtain the classification, and finally into another reshape layer to obtain the candidate regions (proposals);
Given an anchor S = (S_x, S_y, S_w, S_h), translation and scaling operations (d_x(S), d_y(S), d_w(S), d_h(S)) are applied to make the regression window G' closer to the real window G.
Translation:
G'_x = S_w · d_x(S) + S_x
G'_y = S_h · d_y(S) + S_y
Scaling:
G'_w = S_w · exp(d_w(S))
G'_h = S_h · exp(d_h(S))
d_x(S), d_y(S), d_w(S) and d_h(S) are obtained by linear regression, with the objective function shown in formula (1):
W_* = argmin_W Σ_i (t_*^i − W^T · φ(S^i))² + λ‖W‖²    (1)
wherein d_*(S) = W_*^T · φ(S) is the predicted value, W_* is the learned parameter, and φ(S) is the feature vector composed of the feature maps of the anchors;
the smooth L1 loss function is shown in equation (2):
smooth_L1(x) = 0.5 · x², if |x| < 1; |x| − 0.5, otherwise    (2)
step S02.6: generating the anchor boxes (anchors), and performing bounding box regression on all of them;
step S02.7: sorting the anchor boxes from large to small by their input positive softmax scores, and extracting the positive anchors after position correction;
step S02.8: clipping positive anchors that exceed the boundary to the image boundary, and removing positive anchors of very small size;
step S02.9: performing non-maximum suppression (NMS) on the remaining positive anchors (a sketch is given after this step list);
step S02.10: outputting the corresponding classification results as the candidate regions (proposals);
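As an illustration, a minimal sketch of the non-maximum suppression of step S02.9, assuming (x1, y1, x2, y2) boxes as float arrays; the IoU threshold is an illustrative assumption.
```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.7):
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the kept box with the remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                  (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + area_r - inter)
        # Suppress boxes that overlap the kept box too strongly.
        order = order[1:][iou <= iou_thr]
    return keep
```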
the step S03 specifically includes:
step S03.1: mapping the M × N candidate regions (proposals) back to (M/16) × (N/16) size;
step S03.2: dividing the feature map region corresponding to each candidate region (proposal) into a grid of pooled_w × pooled_h cells;
step S03.3: performing max pooling on each cell of the grid, so that the output has a fixed pooled_w × pooled_h size;
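As an illustration, a minimal sketch of steps S03.1 to S03.3 using the torchvision roi_pool operator; the channel count, feature-map size, example box and pooled_w = pooled_h = 7 are illustrative assumptions, and the 1/16 spatial scale matches the (M/16) × (N/16) mapping above.
```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.rand(1, 256, 50, 50)              # (batch, C, M/16, N/16)
rois = torch.tensor([[0, 64.0, 64.0, 320.0, 320.0]])  # (batch_idx, x1, y1, x2, y2)
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
# Each candidate region is divided into a 7x7 grid and max-pooled per cell,
# giving a fixed-size output regardless of the region's original size.
```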
the step S04 is specifically embodied as:
step S04.1: inputting candidate region Feature maps (proxy features maps) into the network, and classifying the selected regions (proxies) through full connection and softmax;
step S04.2: performing border regression/boundary box regression (bounding box regression) on the candidate regions (nanopouls) to obtain a high-precision target detection box.
Example 5
The embodiment of the invention provides a device for detecting denim defects by multi-image fusion which, as shown in figure 3, comprises a machine body, a supporting mechanism, a rotating mechanism, a motor and a transmission device, and is characterized in that the supporting mechanism is arranged on the machine body and comprises a supporting plate for placing the denim;
the motor is arranged in the machine body, is connected with the rotating mechanism through a transmission device and drives the rotating mechanism to rotate, and the rotating mechanism is fixed at the upper part of the supporting mechanism;
the cloth inspecting mechanism is arranged on the upper portion of the machine body and comprises a shooting mechanism, light source mechanisms and a marking mechanism; the shooting mechanism is used for acquiring the image data, the light source mechanisms are respectively arranged on the front and the back of the machine body to provide sufficient illumination for the shooting mechanism, and the marking mechanism is used for marking the defective image data.
The above-described embodiments are merely preferred embodiments of the present invention, and not intended to limit the scope of the invention, so that equivalent changes or modifications in the structure, features and principles described in the present invention should be included in the claims of the present invention.