CN113705359B - Multi-scale clothes detection system and method based on drum images of washing machine - Google Patents
- Publication number: CN113705359B
- Application number: CN202110883847.9A
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; no legal analysis has been performed)
Classifications
- G06F18/24: Pattern recognition; analysing; classification techniques
- G06N3/045: Neural networks; architecture; combinations of networks
- G06N3/047: Neural networks; probabilistic or stochastic networks
- G06N3/08: Neural networks; learning methods
Abstract
The invention discloses a multi-scale clothes detection system and method based on drum images of a washing machine, belonging to the technical field of 2D-image target detection. The system comprises an improved ResNet network module, a feature enhancement module (SRM), a dynamic receptive field (DRF) module and a dynamic deformable convolution (DDH) module. During clothes detection, the improved ResNet network module and the SRM module first produce high-quality shallow features, and a regression operation on these shallow features preserves the positioning information of the clothes targets to the greatest extent. The DRF module then constructs a pyramid structure with stronger semantic information, so that clothes targets are classified and further positioned and calibrated while the features of every scale are comprehensively utilized. Finally, the offset effect of the DDH module on the detection frames enriches the diversity of prediction scales. The invention effectively improves the recognition and classification of clothes in a drum washing machine, raises the detection precision, and can be applied to complex clothes detection scenes inside a washing machine.
Description
Technical Field
The invention relates to a multi-scale clothes detection system and method based on a drum image of a washing machine, and belongs to the technical field of target detection of 2D images.
Background
A traditional washing machine has no intelligent clothes-recognition function: the user must set the washing mode manually, according to the known clothes type and personal experience. NXP Semiconductors demonstrated a smart washing machine model using RFID and NFC technology at the global embedded systems exhibition held in Nuremberg, Germany; the machine can read information such as fabric fibre type and colour from buttons with built-in RFID tags and thereby optimise the washing programme, but the technology requires the clothes themselves to be modified. Another approach places a high-definition camera inside the washing machine and acquires an image of the clothes to be washed, converting the problem into one of image segmentation and texture-image classification; an image segmentation algorithm and a texture-image classification algorithm based on convolutional neural networks then yield the quantity and type of clothes in the machine. However, this scheme needs to design two deep convolutional neural networks (an image segmentation network and an image classification network), so its computational complexity is high. Moreover, the clothes in that scheme are laid out manually and regularly, not in the natural state of the washing machine interior where various garments occlude one another, so it is unsuited to actual washing scenes.
The advent of deep-learning object detection technology makes it possible to learn the image characteristics of laundry directly with only one network and to find the laundry targets in the image from those characteristics. The technology is widely applied in fields such as pedestrian detection, vehicle detection, face detection and retail commodity detection, serves as a front-end technology for tracking and other high-level vision applications, and has huge market demand and application value.
Current target detection techniques fall into two main categories:
(1) Two-stage algorithms, mainly R-CNN and its variants, rely on generated candidate-region proposals: an RPN network produces prior boxes that may contain targets, and a subsequent detection network predicts the category and adjusts the position coordinates of each candidate box. The two-stage structure makes the generation of positive and negative samples more balanced and achieves excellent detection precision through secondary correction, but it suffers from low speed.
(2) Single-stage algorithms divide the picture into small cells, each configured with fixed preset prior boxes (anchors); the objects in the picture are assigned to different cells and then classified, so a single CNN network directly predicts the categories and positions of the different objects. Execution speed is excellent, but precision is comparatively low.
However, no detection network has been adapted specifically to washing machine laundry images, so the most common general-purpose target detection networks are used, such as the two-stage Faster R-CNN and the single-stage YOLO series (Chen Yaya, Meng Chaohui. FashionAI garment attribute identification based on target detection algorithm [J]. Computer Systems and Applications, 2019). Because clothing carries rich detail information, the similarity between attributes is high, and external interference factors such as illumination seriously affect recognition and classification accuracy, some detail designs of a general target detection framework directly affect the effect of clothing-attribute recognition and classification. In addition, a general target detection model cannot match well the multi-scale targets produced by the scale variability of irregularly placed clothes in a drum environment, which easily causes inaccurate positioning.
Disclosure of Invention
In order to solve the problems of weak positioning and classifying capability and low recognition accuracy of the existing clothes detection method of the washing machine, the invention firstly provides a multi-scale clothes detection system based on a drum image of the washing machine, which comprises the following components:
the improved ResNet network module, the feature enhancement module SRM, the dynamic receptive field DRF module and the dynamic deformable convolution DDH module;
The improved ResNet network module is connected with the feature enhancement module SRM, a four-layer multi-scale pyramid structure is constructed on the basis of the output features of the feature enhancement module SRM, and the dynamic receptive field DRF module is used for connecting all feature layers of the four-layer multi-scale pyramid; the dynamic deformable convolution DDH module is connected with the dynamic receptive field DRF module;
The DRF module comprises multi-branch convolutions of different sizes.
Optionally, the improved ResNet network module includes:
A 2D convolution layer with a 7×7 kernel and stride 1, a maximum pooling layer with a 3×3 kernel and stride 2, and 4 convolution layers connected in series; each of the 4 convolution layers is formed by stacking residual blocks, with 3, 4, 23 and 3 blocks respectively, and the output features are taken from the third and fourth of the 4 convolution layers.
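As an illustrative sketch (an assumption based on the standard ResNet-101 stage layout, not patent text), the 8× and 16× downsampling rates of the third and fourth output layers can be traced through the modified stem, where only the first 7×7 convolution's stride is changed to 1:

```python
# Hypothetical stride layout: standard ResNet-101 stages, with the first
# 7x7 convolution's stride reduced from 2 to 1 as the patent describes.
stages = [
    ("conv 7x7 (stride 1, modified)", 1),
    ("maxpool 3x3 (stride 2)",        2),
    ("layer1 (3 residual blocks)",    1),  # assumed stride-1 first stage
    ("layer2 (4 residual blocks)",    2),
    ("layer3 (23 residual blocks)",   2),  # -> 8x feature, fed to the SRM
    ("layer4 (3 residual blocks)",    2),  # -> 16x feature, fed to the SRM
]

def cumulative_rates(stages):
    """Cumulative downsampling rate after each stage."""
    rate, out = 1, []
    for name, stride in stages:
        rate *= stride
        out.append((name, rate))
    return out

rates = cumulative_rates(stages)
assert rates[-2][1] == 8 and rates[-1][1] == 16
```

Under these assumed strides, the two retained layers come out at exactly the 8× and 16× rates quoted in the method's step two.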
Optionally, the method detects laundry in a washing machine by using the multi-scale laundry detection system based on drum images of the washing machine according to any one of claims 1-2, the method comprising:
step one: preprocessing an input washing machine drum image;
Step two: performing feature extraction on the drum image of the washing machine after the pretreatment in the step one by using an improved ResNet network module, and outputting feature layers with 8 times of downsampling rate and 16 times of downsampling rate;
Step three: sending the feature layer extracted in the second step into a feature enhancement module SRM to aggregate information so as to obtain shallow features with stronger characterization capability;
step four: inputting the shallow features obtained in the step three into a four-layer multi-scale pyramid structure, wherein the shallow features pass through a DRF module among layers of the four-layer multi-scale pyramid, and finally output features of feature layers of the pyramid are obtained;
Step five: carrying out multi-scale regression operation on the shallow features obtained in the step three, and carrying out coarse positioning on clothes by utilizing shallow feature information to obtain a prediction frame;
Step six: utilizing a dynamic deformable convolution DDH module to offset the output characteristics of each characteristic layer of the pyramid in the fourth step;
Step seven: taking the prediction frame obtained in the fifth step as a default frame of each feature layer of the four-layer multi-scale pyramid, and adjusting the default frame by using the offset generated by the DDH module in the sixth step;
step eight: performing secondary regression and classification by using the DDH module;
Step nine: and step five and step eight, the regression loss functions are synthesized and trained together, and finally, the classification and accurate positioning information of clothes are output.
Optionally, the step three of aggregating information includes:
y = C(S3, U(S4 ⊕ f1×1(S3)))

where S3 is the output feature of the third layer of the improved ResNet101 network at 8× downsampling rate, S4 is the output feature of the fourth layer of the improved ResNet101 network at 16× downsampling rate, fk×k(·) is a k×k convolution operation (the 1×1-convolved S3 being brought to S4's resolution before the addition), ⊕ is element-wise addition, C(·) is the channel stack, U(·) is the upsampling operation, and y is the output feature aggregating the two layers at 8× downsampling rate.
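A minimal single-channel sketch of this aggregation, with all convolutions omitted; stride-2 subsampling stands in for resizing S3 to S4's resolution, nearest-neighbour upsampling stands in for U(·), and the helper names are illustrative assumptions:

```python
# Toy SRM aggregation: complement S4 with S3 information, upsample, stack.
def downsample2(a):                      # stride-2 subsampling, stands in for
    return [row[::2] for row in a[::2]]  # resizing S3 to S4's resolution

def upsample2(a):                        # nearest-neighbour upsampling U(.)
    out = []
    for row in a:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def add(a, b):                           # element-wise addition
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]

s3 = [[1, 2, 3, 4]] * 4                  # toy 8x-rate feature map (4x4)
s4 = [[10, 20], [30, 40]]                # toy 16x-rate feature map (2x2)
fused = add(s4, downsample2(s3))         # complement S4 with S3 information
y = (s3, upsample2(fused))               # channel stack C(S3, U(fused))
assert len(y[1]) == 4 and len(y[1][0]) == 4
```

The stacked output keeps the 8× resolution of S3, matching the text's description of the aggregated shallow feature.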
Optionally, the calculating of the DRF module in the fourth step includes:
U = C(i=0..n)( W1[i] · W2[i] · f(ri)ki×ki(x) )

where x is the upper-layer output feature of each layer in the pyramid structure, f(r)k×k(·) is a k×k convolution with dilation rate r, i denotes the i-th branch of the DRF module, W1[i] and W2[i] are weight parameters self-learned by the network on the i-th branch, C(i=0..n)(·) denotes the stack of the n+1 feature maps, and U is the output feature of the DRF module.
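A hedged scalar sketch of the branch weighting: softmax-normalized self-learned logits stand in for W1[i] and W2[i], and plain numbers stand in for the branch feature maps (all values illustrative):

```python
import math

def softmax(v):
    """Numerically stable softmax over a list of logits."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

branch_feats = [0.2, 1.5, 0.7]       # stand-ins for the 1x1 / 3x3 / 5x5 branches
w_scale  = softmax([0.1, 2.0, 0.5])  # W1: local soft attention over kernel sizes
w_dilate = softmax([1.0, 0.3, 0.3])  # W2: global selection over dilation rates

weighted = [f * a * b for f, a, b in zip(branch_feats, w_scale, w_dilate)]
u = sum(weighted)                    # stands in for stacking the n+1 maps
assert abs(sum(w_scale) - 1.0) < 1e-9
```

The softmax keeps each weight vector normalized, which is the property the soft attention mechanism in the text relies on when distributing weight across scales.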
Optionally, the multi-scale regression algorithm of the fifth step includes:
S1: carrying out maximum pooling operation for 4 times on the output characteristic y of the third step to obtain four scales consistent with the four-layer pyramid characteristics in the fourth step;
Dk=f3×3(Mk(y)),k=0,1,2,3
wherein Mk(·) denotes k successive maximum-pooling operations, so that the downsampling rate of Dk is 2^(3+k) and Dk is the output feature; its channel number is Nbox × 4, representing, for each pixel of the output feature Dk, the 4 offsets of centre, width and height for each of the Nbox configured default boxes;
S2: splicing the predicted results of each D k to obtain an integrated vector l of the predicted results;
S3: the smooth L1 function is applied to l as the regression loss:

Lloc1 = (1/N) Σi Σm∈{cx,cy,w,h} smoothL1( li^m − ĝi^m )
where cx, cy, w, h are the centre and width-height coordinates of the default box, N is the total number of default boxes, l is the integrated vector of all Dk predictions, representing the 4 prediction offsets for all N default boxes, and ĝ is the vector of 4 offsets of the corresponding known real boxes relative to the default boxes;
S4: during training the network back-propagates through the loss function of S3, reducing the difference between l and ĝ, and finally obtains a more accurate integrated prediction vector l.
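The smooth L1 regression loss of S3 can be sketched in plain Python; the standard SSD-style formulation is assumed, and the box values below are illustrative:

```python
def smooth_l1(x):
    # Smooth L1 (Huber with delta = 1); the standard SSD-style form is assumed.
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def loc_loss(pred, target, n):
    """Sum of smooth-L1 terms over the (cx, cy, w, h) offsets, divided by n."""
    total = 0.0
    for p, g in zip(pred, target):
        total += sum(smooth_l1(pi - gi) for pi, gi in zip(p, g))
    return total / n

pred = [(0.1, -0.2, 0.5, 2.0)]   # illustrative predicted offsets l
gt   = [(0.0,  0.0, 0.0, 0.0)]   # illustrative ground-truth offsets g-hat
loss = loc_loss(pred, gt, 1)     # 0.005 + 0.02 + 0.125 + 1.5
assert abs(loss - 1.65) < 1e-9
```

The quadratic branch keeps gradients small near zero error while the linear branch (the 2.0 term above) limits the influence of large errors, which is why this loss is the usual choice for box regression.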
Optionally, the calculating of the DDH module in the step six includes:
O(p0) = Σpn∈R w(pn) · I(p0 + pn + Δpn)

wherein R defines the region and relative positions of the receptive field, centred on the (0,0) coordinate, R = {(-1,-1), (-1,0), ..., (0,1), (1,1)}; pn is an enumeration of the positions listed in R, w(·) is the weight value of the corresponding position in the convolution kernel, I(·) is the input feature value of the corresponding position, and O(·) is the output feature value of the corresponding position; the offset Δpn is obtained by performing a 3×3 convolution on the Dk obtained in step five S1, with k×k×2 output channels, representing one offset parameter per position of a k×k convolution kernel.
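A simplified sketch of this deformable sampling at a single output position p0, with bilinear interpolation handling fractional sampling positions; the image, weights and offsets are all illustrative, not from the patent:

```python
# The 3x3 grid R is displaced per position by learned offsets (dy, dx).
R = [(-1, -1), (-1, 0), (-1, 1),
     (0, -1),  (0, 0),  (0, 1),
     (1, -1),  (1, 0),  (1, 1)]

def bilinear(img, y, x):
    """Bilinearly interpolated value of img at fractional position (y, x)."""
    h, w = len(img), len(img[0])
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    return ((1 - fy) * (1 - fx) * img[y0][x0] + (1 - fy) * fx * img[y0][x1]
            + fy * (1 - fx) * img[y1][x0] + fy * fx * img[y1][x1])

def deform_conv_at(img, weights, p0, offsets):
    """O(p0) = sum over pn in R of w(pn) * I(p0 + pn + delta_pn)."""
    y0, x0 = p0
    out = 0.0
    for w_n, (dy, dx), (ody, odx) in zip(weights, R, offsets):
        out += w_n * bilinear(img, y0 + dy + ody, x0 + dx + odx)
    return out

img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
w = [1.0 / 9] * 9                 # averaging kernel
zero = [(0.0, 0.0)] * 9           # zero offsets -> ordinary 3x3 convolution
assert abs(deform_conv_at(img, w, (1, 1), zero) - 5.0) < 1e-9
```

With zero offsets the operation reduces to an ordinary 3×3 convolution; non-zero offsets shift the sampling grid freely, which is the property the DDH module exploits.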
Optionally, the formula for adjusting the default box in the step seven includes:
cx*=cx+Δp|x+lcx×w
cy*=cy+Δp|y+lcy×h
wherein cx* and cy* are the adjusted centre coordinates of the default box, w and h are its width and height, Δp|x and Δp|y are the components of the DDH-module offset Δp in the x and y directions, and lcx, lcy, lw, lh are the prediction biases of the default box's centre and width-height coordinates.
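A sketch of this adjustment: the centre updates follow the formulas above, while the width-height update shown here is the standard SSD-style exponential form and is an assumption, since only the centre formulas are listed explicitly:

```python
import math

def adjust_box(cx, cy, w, h, dpx, dpy, l_cx, l_cy, l_w, l_h):
    cx2 = cx + dpx + l_cx * w       # cx* = cx + dp|x + l_cx * w
    cy2 = cy + dpy + l_cy * h       # cy* = cy + dp|y + l_cy * h
    w2 = w * math.exp(l_w)          # assumed SSD-style width update
    h2 = h * math.exp(l_h)          # assumed SSD-style height update
    return cx2, cy2, w2, h2

# Illustrative values: a 64x64 default box, DDH offset (1.5, -2.0),
# centre biases (0.1, 0.05), zero width-height bias.
box = adjust_box(100.0, 100.0, 64.0, 64.0, 1.5, -2.0, 0.1, 0.05, 0.0, 0.0)
assert abs(box[0] - 107.9) < 1e-9 and abs(box[1] - 101.2) < 1e-9
assert box[2] == 64.0 and box[3] == 64.0
```

Note how the DDH offset and the shallow regression bias contribute to the centre shift additively, matching the text's description of the two-stage adjustment.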
Optionally, the secondary regression loss in step eight, Lloc2, takes the same smooth L1 form as the step-five loss, computed over the adjusted default boxes; the classification loss is:

Lconf = − Σz∈Pos xzj^t log(ĉz^t) − Σz∈Neg log(ĉz^0)

where xzj^t ∈ {0,1} indicates whether the z-th prediction box and the j-th real box match with respect to category t, ĉz^t is the softmax confidence of category t for the z-th box (ĉz^0 being the background confidence), and Npos and Nneg are the numbers of positive and negative samples respectively, a positive sample being a prediction box containing a laundry target and a negative sample a prediction box containing none;
Optionally, the integrated loss function in step nine combines the step-five regression loss with the step-eight secondary regression and classification losses, L = Lloc1 + Lloc2 + Lconf.
optionally, the default frame sizes of the 4 feature layers in the fifth step are 32×32, 64×64, 128×128, 256×256, respectively.
Optionally, in the first step, the input size of the drum image of the washing machine is uniformly scaled to 512×512.
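Combining the 512×512 input with the downsampling rates 2^(3+k) of the regression features Dk and the default-box sizes above gives the following grid layout (a derived illustration, not patent text):

```python
input_size = 512                       # step-one input resolution
layers = []
for k in range(4):
    rate = 2 ** (3 + k)                # downsampling rate of D_k
    layers.append({"k": k, "rate": rate,
                   "grid": input_size // rate,   # feature-map side length
                   "default_box": 32 * 2 ** k})  # 32, 64, 128, 256

assert [l["grid"] for l in layers] == [64, 32, 16, 8]
```

So each pyramid level halves the grid while doubling the default-box size, covering clothes targets from 32 to 256 pixels.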
Optionally, the feature enhancement module in the third step is a multi-connection structure, and the shallow features with richer detail information are generated by obtaining multi-granularity information through stacking between adjacent layers.
Optionally, in the fourth step, the number of output channels of each layer of the multi-scale pyramid is 256.
Optionally, in step six the DDH comprises a shortcut branch with spatial self-attention, implemented by a simple 3×3 convolution, so that the network can dynamically allocate weights to objects of each scale based on the distribution of the current features, making the final detection result more accurate.
The invention has the beneficial effects that:
Aiming at the weak positioning and classification capability and the low recognition precision of existing washing machine clothes detection methods, the invention provides a multi-scale clothes detection system based on drum images of a washing machine and, based on this system, a multi-scale clothes detection method. The improved ResNet network changes the first convolution layer into a large 7×7 convolution with stride 1 to prevent excessive loss of clothing detail information, and extracts the third and fourth layers to ensure sufficient clothing semantic information. The feature enhancement module obtains shallow features with stronger characterization capability through feature aggregation, integrating the detail and semantic information of the third and fourth layers so that the extracted clothing features are richer. The DRF module constructs a multi-scale pyramid structure with stronger semantic information, improving the classification of complex clothes by deepening the network and adaptively adjusting the receptive field. The offset effect of the DDH module on the positioning frames enriches the diversity of prediction scales, giving the detection system better adaptability to clothes of different sizes. Together, the system and method effectively improve the recognition and classification of washing machine clothes and raise the detection precision.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a diagram of a modified resnet network architecture.
FIG. 2 is a schematic diagram of a feature enhancement module according to the present invention.
Fig. 3 is a schematic diagram of a DRF module according to the present invention.
Fig. 4 is a diagram of a DDH module according to the present invention.
Fig. 5 is a schematic diagram of the default frame offset effect.
Fig. 6 is a diagram of an overall network framework provided by the present invention.
Fig. 7 is a diagram showing a detection effect of the network on complex clothes.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
Embodiment one:
This embodiment provides a multi-scale clothes detection system and method based on drum images of a washing machine, used in the parameter recommendation of an intelligent washing machine. Based on a deep-learning framework and starting from a 2D RGB image, the system extracts features with an improved ResNet network, builds multi-scale information by enhancing the extracted features, and performs regression and classification in two stages; through this cascaded process the network's discrimination of complex clothes is strengthened, changes of clothes at every scale are accommodated, and detection performance is improved.
The 2D RGB image is obtained by shooting with a high-definition camera, and the resolution is 1920 x 1080.
The following describes the system setup procedure in terms of the system's modules, architecture, and network loss function, respectively:
(1) Module of system
As shown in fig. 1, the improved ResNet network module comprises a 2D convolution with a 7×7 kernel and stride 1, followed by a pooling layer and then 4 convolution layers connected in series; each convolution layer is formed by stacking the residual blocks of the ResNet network, with 3, 4, 23 and 3 blocks respectively, and the output features are taken from the third and fourth convolution layers.
A specific structure of the feature enhancement module (SRM module) is shown in fig. 2. The module is a lightweight multi-connection module for enhancing shallow feature representations, multiplexing up-sampling connections, down-sampling connections and constant-resolution connections. Its input layers are the shallow features extracted by the ResNet-101 network: the third layer at 8× downsampling rate and the fourth layer at 16× downsampling rate.
To alleviate the information dilution caused by the up-sampling operation, a cascade fusion mode is adopted: a 1×1 convolution is applied to the third layer and the result, resized to the fourth layer's resolution, is fused with the fourth layer by element-wise addition to complement the fourth layer's information. The complemented fourth-layer features are then up-sampled by bilinear interpolation to the 8× downsampling rate and stacked with the third-layer features. This operation integrates multi-granularity information from adjacent layers and yields high-quality final features.
The specific structure of the dynamic receptive field module (DRF module) is shown in fig. 3. Its design stems from the study of receptive fields in the shallow human retina, where population receptive-field size increases with retinal eccentricity. The module simulates eccentricity with the multi-branch convolutions of an Inception structure, while dilated convolution simulates the relation between perception scale and eccentricity. Information at different scales is first captured by multi-branch convolutions of sizes 1×1, 3×3 and 5×5; to reduce the parameter count, the 1×1 convolution performs channel dimension reduction and the 5×5 convolution is replaced by two 3×3 convolutions. Self-learning vectors are then introduced, and a soft attention mechanism distributes the weight of each scale to simulate local stimulation at different scales. Similarly, a self-learning vector weighted according to global stimulation selects among dilated convolutions of different dilation rates, so that the receptive field is adjusted adaptively to the stimulus. In this way a smaller convolution kernel assigns larger weight to positions closer to the convolution centre while a larger receptive field is obtained, capturing more context information and improving the model's generalization across scales.
The specific structure of the dynamic deformable convolution detector head module (DDH module) is shown in fig. 4. The module uses deformable convolution to overcome the fixed geometry of a convolutional network, which limits its ability to model geometric transformations. By further displacement adjustment of the spatial sampling positions, an offset variable is added to each sampling point of the convolution kernel, so that the sampling area can be adjusted freely rather than being confined to the regular lattice. A global spatial self-attention is then implemented through a simple 3×3 shortcut connection, enabling the network to dynamically assign appropriate weights to objects of each scale based on the distribution of the current features. This lets the network generate different offset values for shallow regression boxes of different scales and shift the corresponding feature pixels accordingly, so the default boxes arranged on those pixels are displaced as well; the network thus produces a different search range for each default box and performs further fine-tuning and matching with the target, improving the detection of laundry instances of variable scale.
(2) System architecture
The overall structure of the system is shown in fig. 6, and mainly comprises four parts:
The first part reserves a third layer which is 8 times downsampled and a fourth layer which is 16 times downsampled on the basis of ResNet networks, and shallow layer characteristics with stronger characterization capability are obtained by sending the third layer and the fourth layer into a designed characteristic enhancement module (SRM) for information aggregation.
The second part constructs from the enhanced features, through the designed dynamic receptive field module (DRF), multi-scale features at 8×, 16×, 32× and 64× downsampling rates. By adaptively combining information over different receptive fields, a dynamic multi-scale pyramid with rich semantic information is built.
The third part performs a multi-scale regression operation based on the enhanced feature information and takes the regression results as candidate boxes for the corresponding features of the dynamic multi-scale pyramid. The default boxes derived from the shallow regression results are classified and fine-tuned by the multi-scale pyramid features.
The fourth section introduces a dynamically deformable convolution detector head module (DDH module) as the output layer of the pyramid feature.
(3) Network loss function
After the network model is established, the following steps are executed to complete the clothes detection process:
A high-definition camera is adopted to shoot and obtain a 2D RGB image of clothes in a drum of the washing machine, and the resolution is 1920 x 1080;
Step one: data enhancement, namely scaling an input picture to 512 x 512, and carrying out random up-down left-right overturn, brightness change, fuzzy treatment and illumination change;
Step two: extracting features of the 2D input image after the enhancement in the step one by using a ResNet network modified as shown in fig. 1, and outputting feature layers with 8 times of downsampling rate and 16 times of downsampling rate;
step three: the feature layer extracted in the second step is sent to a feature enhancement module shown in fig. 2 to aggregate information, so as to obtain shallow features with stronger characterization capability;
step four: inputting the shallow features obtained in the step three into a four-layer multi-scale pyramid structure, wherein the shallow features pass through a DRF module among layers of the four-layer multi-scale pyramid, and finally output features of feature layers of the pyramid are obtained;
Step five: carrying out multi-scale regression operation on the step polymerization characteristics, and carrying out coarse positioning on clothes by utilizing shallow characteristic information to obtain a prediction frame; the multiscale regression algorithm is as follows:
Input: outputting a characteristic y;
And (3) outputting: integration of 4 prediction biases for a multi-scale total of N default boxes;
s1: carrying out maximum pooling operation on y for 4 times to obtain four scales consistent with the four-layer pyramid features in the fourth step;
D_k = f_{3×3}(M_k(y)), k = 0, 1, 2, 3
where M_k(·) denotes applying the max-pooling operation k times, giving an overall downsampling rate of 2^(3+k); D_k is the output feature, whose channel number is N_box × 4, representing, for each pixel of D_k, the 4 center and width–height offsets of each of the N_box configured default boxes;
S2: splicing the predicted results of each D k to obtain an integrated vector l of the predicted results;
S3: the smoothL1 function was used as regression loss for l:
Where cx, cy, w, h are the center and width-height coordinates of the default box, N is the total number of default boxes, l is the integration of all D k predictors, representing 4 prediction offsets for all N default boxes, 4 Offsets of the corresponding real frame relative to the default frame;
s4: the network performs reverse derivation according to the loss function of S3 in the training process, thereby reducing the l and the l The difference of the two is that the integration vector l of the more accurate prediction result is finally obtained;
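The regression loss of S1–S4 above can be sketched in pure Python. This is only an illustrative sketch: the function names (`smooth_l1`, `regression_loss`) and the per-box (cx, cy, w, h) tuple layout are assumptions for the example, not the patent's implementation.

```python
def smooth_l1(x):
    # smooth L1: 0.5 * x^2 when |x| < 1, |x| - 0.5 otherwise
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5

def regression_loss(l, g_hat):
    # l: predicted offsets, g_hat: ground-truth offsets; each is a list of
    # (cx, cy, w, h) offset 4-tuples, one per default box (N boxes total);
    # the loss averages smooth L1 over all N boxes, as described in S3
    n = len(l)
    return sum(
        smooth_l1(p - g)
        for pred, gt in zip(l, g_hat)
        for p, g in zip(pred, gt)
    ) / n
```

A real implementation would operate on batched tensors, but the per-element arithmetic is the same.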
Step six: the output features of each pyramid feature layer of step four are offset using the DDH module shown in Fig. 4.
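The offset sampling a deformable convolution performs in step six can be sketched in pure Python. This is a simplified sketch: it uses nearest-neighbor rounding where a real deformable convolution uses bilinear interpolation, and all names are assumptions for the example.

```python
def deformable_sample(feat, p0, weights, offsets):
    # feat: 2D feature map (list of rows); p0: (row, col) center position
    # weights: 9 kernel weights for a 3x3 grid; offsets: 9 learned (dy, dx)
    # displacements, one per kernel tap (the Δp_n of the DDH module)
    h, w = len(feat), len(feat[0])
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for (dy, dx), wt, (oy, ox) in zip(taps, weights, offsets):
        # shift each regular grid tap by its learned offset, then clamp
        y = min(max(int(round(p0[0] + dy + oy)), 0), h - 1)
        x = min(max(int(round(p0[1] + dx + ox)), 0), w - 1)
        out += wt * feat[y][x]
    return out
```

With all offsets zero this reduces to an ordinary 3×3 convolution tap; nonzero offsets let the sampling region deform away from the regular grid.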
Step seven: and taking the prediction frame obtained in the fifth step as a default frame of each feature layer of the fourth pyramid, and adjusting the default frame by using the offset generated by the sixth DDH module, wherein the effect of adjusting the default frame is shown in figure 5. The default frame center and width and height adjustment formula is as follows:
cx* = cx + Δp|_x + l_cx × w
cy* = cy + Δp|_y + l_cy × h
where cx* and cy* are the adjusted default-box center coordinates, cx, cy, w, h are the center and width–height coordinates of the default box, Δp|_x and Δp|_y are the components of the DDH module offset Δp in the x and y directions, and l_cx, l_cy, l_w, l_h are the prediction offsets of the default-box center and width–height coordinates.
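The center-adjustment formulas above can be sketched in pure Python. Note one assumption: the text reproduces only the center formulas, so the width–height update below uses an SSD-style exponential scaling by the predicted biases, which is an assumption rather than the patent's stated formula; the function name is also our own.

```python
import math

def adjust_default_box(cx, cy, w, h, dp_x, dp_y, l_cx, l_cy, l_w, l_h):
    # center update per the formulas of step seven:
    #   cx* = cx + Δp|x + l_cx * w
    #   cy* = cy + Δp|y + l_cy * h
    cx_star = cx + dp_x + l_cx * w
    cy_star = cy + dp_y + l_cy * h
    # width-height update is not reproduced in the text; an SSD-style
    # exponential scaling by the predicted biases is assumed here
    w_star = w * math.exp(l_w)
    h_star = h * math.exp(l_h)
    return cx_star, cy_star, w_star, h_star
```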
Step eight: and D, using the DDH module in the step six as a detection head at the same time, and carrying out secondary regression and classification. The secondary regression loss is as follows:
The classification loss is as follows:
In the middle of Indicating whether the z-th prediction box and the j-th real box match with respect to the category t,/>For the softmax penalty of class confidence, N pos and N neg are the number of positive and negative samples, respectively; the positive sample is a prediction frame containing a clothes target, and the negative sample is a prediction frame not containing the clothes target;
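The softmax classification loss over positive and negative samples can be sketched in pure Python. This is an illustrative sketch under two assumptions not confirmed by the text: the softmax loss is the standard cross-entropy over class logits, and the loss is normalized by N_pos + N_neg; the function names are our own.

```python
import math

def softmax_cross_entropy(logits, label):
    # softmax loss of the class confidences for the given label,
    # computed with the max-subtraction trick for numerical stability
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]

def classification_loss(pos_logits, pos_labels, neg_logits, background=0):
    # positives: prediction boxes matched to a laundry class (N_pos of them)
    # negatives: prediction boxes matched to background (N_neg of them)
    loss = sum(softmax_cross_entropy(lg, y)
               for lg, y in zip(pos_logits, pos_labels))
    loss += sum(softmax_cross_entropy(lg, background) for lg in neg_logits)
    n = len(pos_logits) + len(neg_logits)
    return loss / max(n, 1)
```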
Step nine: and step five and step eight, the loss functions are integrated and trained together, and finally, the classification and accurate positioning information of clothes are output.
In order to highlight the advantages of the invention relative to other prior art, a series of simulation experiments are carried out, and the simulation results are as follows:
Table 1 shows the accuracy and model parameters of the method of the present application compared with FASTER RCNN, YOLOv networks in laundry detection, the detected pictures are 1000 barrels of samples provided by companies, 10 barrels of samples per barrel, and 10000 pictures total.
Table 1 comparison of accuracy and model parameters of the inventive network with other methods in laundry detection
Method | Faster RCNN | YOLOv5m | The proposed method | The proposed method (after compression) |
---|---|---|---|---|
Input size | 800×1000 | 640×640 | 512×512 | 512×512 |
Backbone network | ResNet101 | CSPDarknet | ResNet101 | ResNet101 |
Precision | 85.2% | 50% | 89.7% | 86.7% |
Model parameters | 137M | 21.4M | 48M | 26.4M |
As the comparison in the table shows, the proposed detection system and method reduce the model parameter count relative to Faster RCNN while maintaining high precision; relative to the YOLOv5m network, the invention greatly improves detection precision, and after pruning with a compression algorithm it also achieves a low parameter count.
In summary, compared with existing washing machine laundry detection methods, the present method reduces the system parameter count while guaranteeing detection precision; as can be seen from Fig. 7, it adapts well to scenes with large size variation in laundry detection and achieves recognition and classification of laundry in the washing machine drum.
Some steps in the embodiments of the present invention may be implemented by using software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.
Claims (8)
1. A multi-scale laundry detection system based on a drum image of a washing machine, the system comprising:
the improved ResNet network module, the feature enhancement module SRM, the dynamic receptive field DRF module and the dynamic deformable convolution DDH module;
The improved ResNet network module is connected with the feature enhancement module SRM, a four-layer multi-scale pyramid structure is constructed on the basis of the output features of the feature enhancement module SRM, and the dynamic receptive field DRF module is used for connecting all feature layers of the four-layer multi-scale pyramid; the dynamic deformable convolution DDH module is connected with the dynamic receptive field DRF module;
the improved ResNet network module includes:
a 2D convolution layer with a 7×7 kernel and a stride of 1, a max-pooling layer with a 3×3 kernel and a stride of 2, and 4 serially connected convolution stages; each of the 4 stages is formed by stacking residual blocks, with 3, 4, 23, and 3 blocks respectively, and output features are taken from the third and fourth convolution stages;
the feature enhancement module SRM is a lightweight multi-connection module for enhancing the shallow feature representation, comprising multiplexed up-sampling, down-sampling, and constant-resolution connections; its selected input layers come from the shallow features extracted by the ResNet network, namely the third layer at an 8× downsampling rate and the fourth layer at a 16× downsampling rate;
the dynamic receptive field DRF module simulates eccentricity through the multi-branch convolutions of an Inception structure, using dilated convolutions to model the relation between perception scale and eccentricity; it first captures information at different scales through multi-branch 1×1, 3×3, and 5×5 convolutions, where the 1×1 convolutions reduce the channel dimension and the 5×5 convolution is replaced by two 3×3 convolutions, and then introduces a self-learned vector, using a soft-attention mechanism to assign a weight to each scale so as to simulate local stimuli at different scales;
the dynamic deformable convolution DDH module uses deformable convolution to overcome the limitation that the fixed geometric structure of a convolutional network cannot model geometric transformations; by further adjusting the positions of spatial sampling within the module, an offset variable is added to the position of each sampling point in the convolution kernel, so that the sampling region can be adjusted freely rather than being restricted to the previous regular grid points.
2. A method for detecting laundry in a washing machine based on a drum image of the washing machine, the method using the multi-scale laundry detection system based on a drum image of the washing machine of claim 1, the method comprising:
step one: preprocessing an input washing machine drum image;
Step two: performing feature extraction on the drum image of the washing machine after the pretreatment in the step one by using an improved ResNet network module, and outputting feature layers with 8 times of downsampling rate and 16 times of downsampling rate;
Step three: sending the feature layer extracted in the second step into a feature enhancement module SRM to aggregate information so as to obtain shallow features with stronger characterization capability;
step four: inputting the shallow features obtained in the step three into a four-layer multi-scale pyramid structure, wherein the shallow features pass through a DRF module among layers of the four-layer multi-scale pyramid, and finally output features of feature layers of the pyramid are obtained;
Step five: carrying out multi-scale regression operation on the shallow features obtained in the step three, and carrying out coarse positioning on clothes by utilizing shallow feature information to obtain a prediction frame;
Step six: utilizing a dynamic deformable convolution DDH module to offset the output characteristics of each characteristic layer of the pyramid in the fourth step;
Step seven: taking the prediction frame obtained in the fifth step as a default frame of each feature layer of the four-layer multi-scale pyramid, and adjusting the default frame by using the offset generated by the DDH module in the sixth step;
step eight: performing secondary regression and classification by using the DDH module;
Step nine: and step five and step eight, the regression loss functions are synthesized and trained together, and finally, the classification and accurate positioning information of clothes are output.
3. The method of claim 2, wherein the step three of aggregating information comprises:
where S_3 is the output feature of the third layer of the modified ResNet101 network at an 8× downsampling rate, S_4 is the output feature of the fourth layer of the modified ResNet101 network at a 16× downsampling rate, f_{k×k}(·) is a k×k convolution operation, ⊕ denotes element-wise addition, C(·) is channel concatenation, U(·) is the upsampling operation, and y is the output feature aggregating the two layers at an 8× downsampling rate.
4. The method of claim 3, wherein the computing of the DRF module of step four comprises:
where x is the upper-layer output feature of each layer in the pyramid structure, f_{k×k}^{r}(·) is a k×k convolution with dilation rate r, i denotes the i-th branch of the DRF module, W_1[i] and W_2[i] are weight parameters self-learned by the network on the i-th branch, C(·) denotes the stacking of n+1 feature maps, and U is the output feature of the DRF module.
5. The method according to claim 2, wherein the multi-scale regression algorithm of step five comprises:
S1: carrying out maximum pooling operation for 4 times on the output characteristic y of the third step to obtain four scales consistent with the four-layer pyramid characteristics in the fourth step;
D_k = f_{3×3}(M_k(y)), k = 0, 1, 2, 3
where M_k(·) denotes applying the max-pooling operation k times, giving a downsampling rate of 2^(3+k); D_k is the output feature, whose channel number is N_box × 4, representing, for each pixel of the output feature D_k, the 4 center and width–height offsets of each of the N_box configured default boxes;
S2: splicing the predicted results of each D k to obtain an integrated vector l of the predicted results;
s3: the smooth L1 function was used for l as regression loss:
Where cx, cy, w, h are the center and width-height coordinates of the default box, N is the total number of default boxes, l is the integrated vector of all D k predictors, representing 4 prediction offsets for all N default boxes, 4 Offsets for the corresponding known real box relative to the default box;
s4: the network performs reverse derivation according to the loss function of S3 in the training process, thereby reducing the l and the l The difference of the two is that the integration vector l of the more accurate prediction result is finally obtained.
6. The method of claim 5, wherein the calculating of the DDH module in step six comprises:
where R denotes the receptive-field region of relative positions, centered at the (0, 0) coordinate; p_n is an enumeration of the positions listed in R; w is the weight at the corresponding position in the convolution kernel; I is the input feature value at the corresponding position; and O is the output feature value at the corresponding position; the offset Δp_n is obtained by applying a 3×3 convolution to the D_k obtained in S1 of step five, with k×k×2 output channels, representing an offset parameter for each position in the k×k convolution kernel.
7. The method of claim 6, wherein the step seven formula for adjusting the default box comprises:
cx* = cx + Δp|_x + l_cx × w
cy* = cy + Δp|_y + l_cy × h
where cx* and cy* are the adjusted default-box center coordinates, cx, cy, w, h are the center and width–height coordinates of the default box, Δp|_x and Δp|_y are the components of the DDH module offset Δp in the x and y directions, and l_cx, l_cy, l_w, l_h are the prediction offsets of the default-box center and width–height coordinates.
8. The method of claim 7, wherein the quadratic regression loss of step eight is:
the classification loss is:
where x_{zj}^t indicates whether the z-th prediction box and the j-th ground-truth box match with respect to category t, L_conf is the softmax loss over the class confidences, and N_pos and N_neg are the numbers of positive and negative samples, respectively; a positive sample is a prediction box containing a laundry target and a negative sample is a prediction box containing no laundry target;
the comprehensive loss function in the step nine is as follows:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110883847.9A CN113705359B (en) | 2021-08-03 | 2021-08-03 | Multi-scale clothes detection system and method based on drum images of washing machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113705359A CN113705359A (en) | 2021-11-26 |
CN113705359B true CN113705359B (en) | 2024-05-03 |
Family
ID=78651305
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110883847.9A Active CN113705359B (en) | 2021-08-03 | 2021-08-03 | Multi-scale clothes detection system and method based on drum images of washing machine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113705359B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115100681B (en) * | 2022-06-24 | 2024-10-15 | 暨南大学 | Clothes identification method, system, medium and equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109344821A (en) * | 2018-08-30 | 2019-02-15 | 西安电子科技大学 | Small target detecting method based on Fusion Features and deep learning |
CN110796037A (en) * | 2019-10-15 | 2020-02-14 | 武汉大学 | Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid |
CN110895707A (en) * | 2019-11-28 | 2020-03-20 | 江南大学 | Depth discrimination method for underwear types of washing machine under strong shielding condition |
CN110991311A (en) * | 2019-11-28 | 2020-04-10 | 江南大学 | Target detection method based on dense connection deep network |
CN112801183A (en) * | 2021-01-28 | 2021-05-14 | 哈尔滨理工大学 | Multi-scale target detection method based on YOLO v3 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108830205B (en) * | 2018-06-04 | 2019-06-14 | 江南大学 | Based on the multiple dimensioned perception pedestrian detection method for improving full convolutional network |
Non-Patent Citations (2)
Title |
---|
FPN multi-scale object detection algorithm based on dense connections; Zhang Kuan, Teng Guowei, Fan Tao, Li Cong; Computer Applications and Software; 2020-01-12 (Issue 01); full text *
MSSD object detection method based on deep learning; Zhao Qingbei, Yuan Chang'an; Enterprise Science and Technology & Development; 2018-05-10 (Issue 05); full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||