
CN113705359B - Multi-scale clothes detection system and method based on drum images of washing machine - Google Patents


Info

Publication number
CN113705359B
Authority
CN
China
Prior art keywords
module
convolution
layer
feature
clothes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110883847.9A
Other languages
Chinese (zh)
Other versions
CN113705359A (en)
Inventor
陈莹
郑棨元
化春键
胡蒙
裴佩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN202110883847.9A
Publication of CN113705359A
Application granted
Publication of CN113705359B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-scale clothes detection system and method based on drum images of a washing machine, and belongs to the technical field of object detection in 2D images. The system comprises an improved ResNet network module, a feature enhancement module SRM, a dynamic receptive field (DRF) module and a dynamic deformable convolution (DDH) module. Detection proceeds as follows. First, the improved ResNet network module and the SRM module produce high-quality shallow features, and a regression operation on these shallow features preserves as much positioning information about the clothes targets as possible. Next, the DRF module builds a pyramid with stronger semantic information, which classifies the clothes targets and further locates and calibrates them while making full use of the features at every scale. Finally, the offsets that the DDH module applies to the detection boxes enrich the diversity of prediction scales. The invention effectively improves the ability to identify and classify clothes in a drum washing machine and raises clothes detection precision. It can be applied to the detection of complex clothes inside a washing machine.

Description

Multi-scale clothes detection system and method based on drum images of washing machine
Technical Field
The invention relates to a multi-scale clothes detection system and method based on drum images of a washing machine, and belongs to the technical field of object detection in 2D images.
Background
A traditional washing machine has no means of recognizing the clothes loaded into it, so the user must set the washing mode manually, based on the known clothes type and personal experience. At an embedded-systems exhibition in Nuremberg, Germany, NXP Semiconductors demonstrated an intelligent washing machine model built on RFID and NFC technology. That machine reads information such as fabric fiber type and color from buttons with built-in RFID tags and optimizes the washing program accordingly, but the technology requires the clothes to be modified. Another approach places a high-definition camera inside the washing machine to capture images of the clothes to be washed. It recasts the task as image segmentation and texture image classification, and obtains the amount and type of clothes in the machine by designing a segmentation algorithm and a texture classification algorithm based on convolutional neural networks. However, that scheme requires two deep convolutional neural networks, a segmentation network and a classification network, so its computational complexity is high. Moreover, its clothes were arranged manually in a regular layout rather than in the natural state in which different garments occlude one another inside the drum, so it does not suit actual washing scenes.
The advent of deep-learning object detection makes it possible to learn the image features of laundry directly with a single network and to locate laundry objects in the image from those features. The technology is widely used in common fields such as pedestrian, vehicle, face and retail product detection. It is also a prerequisite for tracking and other high-level vision applications, and it has great market demand and application value.
Current target detection techniques fall into two main categories:
(1) Two-stage algorithms, mainly R-CNN and its variants, rely on generated region proposals. An RPN network generates prior boxes that may contain targets, and a subsequent detection network then predicts the category of each candidate box and refines its position coordinates. The two-stage structure balances the generation of positive and negative samples, and the two-step refinement gives excellent detection precision, but these algorithms are slow.
(2) Single-stage algorithms divide the image into a grid of small cells, each with fixed preset prior boxes (anchors). Objects in the image are assigned to different cells and then classified, so the categories and positions of different objects can be predicted directly with a single CNN. Execution speed is excellent, but precision is lower.
However, no detection network is specifically adapted to laundry images from washing machines, so the most commonly used general-purpose object detection networks are applied instead, such as the two-stage network Faster R-CNN and the single-stage YOLO series (Chen Yaya, Meng Chaohui. FashionAI garment attribute identification based on target detection algorithm [J]. Computer Systems & Applications, 2019.). Clothing carries rich detail, different attributes are highly similar, and external interference such as illumination seriously degrades recognition and classification accuracy. As a result, certain detail designs of general object detection frameworks directly affect how well clothing attributes are recognized and classified. In addition, clothes placed irregularly in the drum vary greatly in scale. General object detection models cannot match such multi-scale targets well, which easily leads to inaccurate positioning.
Disclosure of Invention
To solve the problems of weak positioning and classification capability and low recognition accuracy in existing clothes detection methods for washing machines, the invention first provides a multi-scale clothes detection system based on drum images of a washing machine, which comprises:
an improved ResNet network module, a feature enhancement module SRM, a dynamic receptive field DRF module and a dynamic deformable convolution DDH module;
the improved ResNet network module is connected to the feature enhancement module SRM; a four-layer multi-scale pyramid is built on the output features of the SRM module, and the DRF module connects the successive feature layers of the pyramid; the DDH module is connected to the DRF module;
the DRF module comprises multi-branch convolutions of different sizes.
Optionally, the improved ResNet network module includes:
a 2D convolution layer with a 7×7 kernel and stride 1, a max-pooling layer with a 3×3 window and stride 2, and 4 convolutional layers connected in series. The 4 convolutional layers are stacks of 3, 4, 23 and 3 residual blocks, respectively, and the output features are taken from the third and fourth of these layers.
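Tracking the stride through each stage shows why the modified stem yields feature layers at 8× and 16× downsampling. The following is a minimal sketch, not the patent's implementation. It assumes torchvision-style stage strides (1, 2, 2, 2) and "same" padding, so that every stride simply divides the spatial size. The helper name `feature_sizes` is hypothetical.

```python
import math

BLOCKS_PER_STAGE = (3, 4, 23, 3)   # residual blocks per stage (ResNet-101 layout)
STAGE_STRIDES = (1, 2, 2, 2)       # assumed stride of the first block in each stage

def feature_sizes(input_size=512, stem_stride=1):
    """Return (stage, blocks, spatial size, downsampling rate) for each stage.

    Hypothetical helper; assumes 'same' padding so size = ceil(size / stride).
    """
    size = math.ceil(input_size / stem_stride)   # 7x7 stem convolution
    rate = stem_stride
    size, rate = math.ceil(size / 2), rate * 2   # 3x3 max pool, stride 2
    out = []
    for stage, (blocks, stride) in enumerate(zip(BLOCKS_PER_STAGE, STAGE_STRIDES), 1):
        size, rate = math.ceil(size / stride), rate * stride
        out.append((stage, blocks, size, rate))
    return out

for row in feature_sizes(512):
    print(row)
```

With the standard stride-2 stem (`stem_stride=2`), the third and fourth stages would fall at 16× and 32× instead. Keeping the 7×7 convolution at stride 1 is what places them at 8× and 16× (64×64 and 32×32 maps for a 512×512 input).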
Optionally, a method detects laundry in a washing machine using the multi-scale laundry detection system based on drum images of a washing machine according to any one of claims 1-2; the method comprises:
Step one: preprocess the input washing machine drum image;
Step two: extract features from the drum image preprocessed in step one with the improved ResNet network module, and output feature layers at 8× and 16× downsampling rates;
Step three: feed the feature layers extracted in step two into the feature enhancement module SRM, which aggregates their information into shallow features with stronger representational capability;
Step four: input the shallow features from step three into a four-layer multi-scale pyramid whose successive layers are connected through DRF modules, and obtain the output features of each pyramid feature layer;
Step five: perform multi-scale regression on the shallow features from step three, using the shallow feature information to locate the clothes coarsely and obtain prediction boxes;
Step six: use the dynamic deformable convolution DDH module to offset the output features of each pyramid feature layer from step four;
Step seven: use the prediction boxes from step five as the default boxes of each feature layer of the four-layer multi-scale pyramid, and adjust them with the offsets generated by the DDH module in step six;
Step eight: perform secondary regression and classification with the DDH module;
Step nine: combine the regression losses of step five and step eight and train them jointly; finally, output the classification and precise positioning information of the clothes.
Optionally, the information aggregation in step three is computed as:
where $S_3$ is the output feature of the third layer of the modified ResNet101 network at 8× downsampling, $S_4$ is the output feature of its fourth layer at 16× downsampling, $f_{k\times k}(\cdot)$ is a k×k convolution, ⊕ denotes element-wise addition, $C(\cdot)$ is channel concatenation, $U(\cdot)$ is upsampling, and $y$ is the output feature at 8× downsampling that aggregates the two layers.
Optionally, the DRF module in step four is computed as:
where $x$ is the output feature of the preceding layer for each layer of the pyramid; the dilated-convolution term denotes a k×k convolution with dilation rate r; i indexes the i-th branch of the DRF module; $W_1[i]$ and $W_2[i]$ are weight parameters that the network learns for the i-th branch; the stacking operator concatenates n+1 feature maps; and $U$ is the output feature of the DRF module.
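The DRF formula itself is not reproduced here, but the soft-attention idea behind the self-learned weights $W_1[i]$ and $W_2[i]$ can be illustrated: learned logits are normalized with a softmax and used to weight same-shaped branch outputs. This is a hedged sketch that assumes the weighting is a softmax-normalized sum. It does not show how the patent combines this weighting with the concatenation of n+1 maps, and `soft_attention_fuse` is a hypothetical name. Feature maps are flattened to plain lists for brevity.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention_fuse(branch_maps, logits):
    """Weight same-shaped branch outputs by softmax(logits) and sum them.

    branch_maps: one flattened feature map per branch (hypothetical layout).
    logits: one learnable scalar per branch, standing in for W[i].
    """
    weights = softmax(logits)
    fused = [0.0] * len(branch_maps[0])
    for w, fmap in zip(weights, branch_maps):
        for j, v in enumerate(fmap):
            fused[j] += w * v
    return fused, weights

fused, weights = soft_attention_fuse([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
print(fused, weights)
```

Because the weights come from a softmax, they always sum to 1, so training can shift emphasis between scales without changing the overall magnitude of the fused feature.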
Optionally, the multi-scale regression algorithm of step five comprises:
S1: apply successive max-pooling operations to the output feature y of step three to obtain four scales that match the four pyramid feature layers of step four:
$$D_k = f_{3\times 3}\big(M_k(y)\big),\quad k = 0, 1, 2, 3$$
where $M_k(\cdot)$ denotes k successive max-pooling operations, so $D_k$ has a downsampling rate of $2^{3+k}$. $D_k$ is the output feature, with $N_{box}\times 4$ channels: for each pixel of $D_k$, they hold the 4 offsets (center x, center y, width, height) of each of the $N_{box}$ default boxes configured at that pixel;
S2: concatenate the predictions of all $D_k$ into an integrated prediction vector l;
S3: use the smooth L1 function as the regression loss on l:
where cx, cy, w, h are the center and width/height coordinates of the default box, N is the total number of default boxes, l is the integrated vector of all $D_k$ predictions (the 4 predicted offsets of all N default boxes), and the regression target holds the 4 offsets of the corresponding known ground-truth box relative to each default box;
S4: during training, the network back-propagates the loss from S3 to reduce the difference between l and the ground-truth offsets, and finally obtains a more accurate integrated prediction vector l.
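Step S3 can be sketched as follows. The smooth L1 function itself is standard. The encoding of the ground-truth offsets is an assumption, since the text does not give it: the sketch borrows the SSD-style encoding, which scales the center offsets by the default box size and uses a log ratio for width and height. The function names are hypothetical, and the loss is averaged over the N default boxes, following the definition of N above.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear beyond |x| = 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def encode(gt, default):
    """Assumed SSD-style offsets of a ground-truth box relative to a default box.

    Both boxes are (cx, cy, w, h).
    """
    gcx, gcy, gw, gh = gt
    dcx, dcy, dw, dh = default
    return ((gcx - dcx) / dw, (gcy - dcy) / dh, math.log(gw / dw), math.log(gh / dh))

def regression_loss(pred_offsets, gts, defaults):
    """Average over N default boxes of the smooth L1 summed over (cx, cy, w, h)."""
    total = 0.0
    for l, g, d in zip(pred_offsets, gts, defaults):
        target = encode(g, d)
        total += sum(smooth_l1(p - t) for p, t in zip(l, target))
    return total / len(pred_offsets)

print(regression_loss([(0.5, 0.0, 0.0, 0.0)], [(10, 10, 20, 20)], [(10, 10, 20, 20)]))
```

The quadratic region keeps gradients small for nearly correct boxes, while the linear region limits the influence of badly mismatched ones. This robustness is the usual reason for choosing smooth L1 over plain L2 for box regression.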
Optionally, the DDH module in step six is computed as:
where R defines the region and relative positions of the receptive field, centered at (0, 0): R = {(-1, -1), (-1, 0), …, (0, 1), (1, 1)}; $p_n$ enumerates the positions in R; w(·) is the kernel weight at the corresponding position, I(·) is the input feature value at the corresponding position, and O(·) is the output feature value at the corresponding position. The offsets $\Delta p_n$ are obtained by applying a 3×3 convolution to the $D_k$ from step five S1. The convolution has k×k×2 output channels, that is, an x offset and a y offset for each position of the k×k convolution kernel.
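The definitions above describe a standard deformable convolution: each output value is the sum of the kernel weights times the input sampled at $p_0 + p_n + \Delta p_n$. Below is a minimal single-channel sketch for one output position. It assumes bilinear interpolation at fractional sampling positions and zero padding outside the feature map. These are common choices for deformable convolution, but the text does not state them. The helper names are hypothetical.

```python
import math

# 3x3 sampling grid R centred at (0, 0), written as (dy, dx) pairs.
R = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def bilinear(img, y, x):
    """Sample a 2D list at fractional (y, x); pixels outside the map count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                value += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy][xx]
    return value

def deform_conv_at(img, kernel, p0, offsets):
    """O(p0) = sum_n w(p_n) * I(p0 + p_n + dp_n) for one channel and one position.

    kernel: 3x3 weights indexed [dy + 1][dx + 1];
    offsets: nine (dy, dx) learned offsets, one per p_n in R.
    """
    y0, x0 = p0
    out = 0.0
    for (dy, dx), (oy, ox) in zip(R, offsets):
        out += kernel[dy + 1][dx + 1] * bilinear(img, y0 + dy + oy, x0 + dx + ox)
    return out

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(deform_conv_at(img, [[1, 1, 1]] * 3, (1, 1), [(0, 0)] * 9))
```

With all offsets set to zero, the result equals an ordinary 3×3 convolution at $p_0$, which makes a convenient sanity check. Non-zero offsets move each tap off the regular grid.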
Optionally, the default boxes in step seven are adjusted by:
$$cx^{*} = cx + \Delta p|_{x} + l_{cx}\times w$$
$$cy^{*} = cy + \Delta p|_{y} + l_{cy}\times h$$
where $cx^{*}$ and $cy^{*}$ are the center coordinates of the adjusted default box; cx, cy, w, h are the center and width/height coordinates of the default box; $\Delta p|_x$ and $\Delta p|_y$ are the x and y components of the DDH module offset $\Delta p$; and $l_{cx}$, $l_{cy}$, $l_w$, $l_h$ are the predicted offsets of the default box's center and width/height coordinates.
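The two center-update formulas translate directly into code. The hypothetical helper below applies only the $cx^*$ and $cy^*$ updates. The corresponding width and height updates that use $l_w$ and $l_h$ are not given in this text, so the sketch leaves w and h untouched.

```python
def adjust_center(default_box, ddh_offset, pred_bias):
    """Apply cx* = cx + dp_x + l_cx * w and cy* = cy + dp_y + l_cy * h.

    default_box: (cx, cy, w, h); ddh_offset: (dp_x, dp_y) from the DDH module;
    pred_bias: (l_cx, l_cy) predicted centre offsets. Hypothetical helper.
    """
    cx, cy, w, h = default_box
    dpx, dpy = ddh_offset
    l_cx, l_cy = pred_bias
    return cx + dpx + l_cx * w, cy + dpy + l_cy * h

print(adjust_center((100, 50, 40, 20), (2, -1), (0.25, 0.5)))
```

The DDH offset shifts the center in absolute pixels, while the predicted bias is scaled by the box size. A large default box can therefore move further than a small one for the same predicted value.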
Optionally, the secondary regression loss in step eight is:
the classification loss is:
where the matching indicator shows whether the z-th prediction box matches the j-th ground-truth box for category t, the confidence term is the softmax loss on class confidence, and $N_{pos}$ and $N_{neg}$ are the numbers of positive and negative samples, respectively. A positive sample is a prediction box that contains a laundry target, and a negative sample is a prediction box that does not;
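The softmax confidence loss over positive and negative samples can be sketched as below. Several assumptions are made here: class 0 is treated as the background ("no laundry") class, and the sum is normalized by $N_{pos}$, as in SSD. The patent's exact normalization with $N_{pos}$ and $N_{neg}$ is not reproduced here. The function names are hypothetical.

```python
import math

BACKGROUND = 0  # assumed index of the "no laundry" class

def softmax_ce(logits, target):
    """-log softmax(logits)[target], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]

def confidence_loss(pos, neg):
    """pos: list of (logits, class_id) for boxes matched to laundry targets;
    neg: list of logits for unmatched boxes, whose target is the background.
    Normalized by the number of positives (an SSD-style assumption)."""
    n_pos = max(len(pos), 1)
    loss = sum(softmax_ce(lg, c) for lg, c in pos)
    loss += sum(softmax_ce(lg, BACKGROUND) for lg in neg)
    return loss / n_pos

print(confidence_loss([([0.0, 2.0], 1)], [[3.0, 0.0]]))
```

In practice, negatives far outnumber positives, which is why detectors in this family usually also limit the ratio of negatives to positives. The text does not say whether such a limit is used here.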
Optionally, the combined loss function in step nine is:
Optionally, the default box sizes of the 4 feature layers in step five are 32×32, 64×64, 128×128 and 256×256, respectively.
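These sizes can be combined with the downsampling rates $2^{3+k}$ of step five S1 to lay out the default boxes for a 512×512 input. The sketch below places one square box per feature-map cell, centered on the cell. That layout is an assumption, since the text gives only the four sizes and does not state $N_{box}$. The helper name is hypothetical.

```python
INPUT_SIZE = 512
STRIDES = (8, 16, 32, 64)          # downsampling rate 2^(3+k), k = 0..3
BOX_SIZES = (32, 64, 128, 256)     # default box side per level (from the text)

def default_boxes(input_size=INPUT_SIZE, strides=STRIDES, box_sizes=BOX_SIZES):
    """Return (cx, cy, w, h) default boxes in input pixels, level by level.

    Hypothetical layout: one square box per cell, centred on the cell.
    """
    boxes = []
    for stride, side in zip(strides, box_sizes):
        fsize = input_size // stride                 # 64, 32, 16, 8 for a 512 input
        for i in range(fsize):                       # row index -> cy
            for j in range(fsize):                   # column index -> cx
                boxes.append(((j + 0.5) * stride, (i + 0.5) * stride, side, side))
    return boxes

boxes = default_boxes()
print(len(boxes), boxes[0], boxes[-1])
```

With one box per cell, the layout gives N = 64² + 32² + 16² + 8² = 5440 default boxes. Each $D_k$ then needs $N_{box}\times 4 = 4$ channels. Note that the box side is 4× the stride at every level, so the box-to-cell ratio stays constant across the pyramid.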
Optionally, in step one, the input washing machine drum image is uniformly scaled to 512×512.
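Scaling the 1920×1080 camera image to 512×512, and the random flips used for augmentation in the embodiment, also require any box labels to be transformed. The helpers below are a hypothetical sketch. They assume (x1, y1, x2, y2) pixel boxes and a direct resize without letterboxing. Photometric augmentations such as brightness, blur and illumination changes leave the boxes unchanged.

```python
def resize_box(box, src_wh=(1920, 1080), dst_wh=(512, 512)):
    """Rescale a box when the image is resized directly (no letterboxing)."""
    (sw, sh), (dw, dh) = src_wh, dst_wh
    x1, y1, x2, y2 = box
    return (x1 * dw / sw, y1 * dh / sh, x2 * dw / sw, y2 * dh / sh)

def hflip_box(box, width=512):
    """Mirror a box left-right; x1 and x2 swap roles."""
    x1, y1, x2, y2 = box
    return (width - x2, y1, width - x1, y2)

def vflip_box(box, height=512):
    """Mirror a box top-bottom; y1 and y2 swap roles."""
    x1, y1, x2, y2 = box
    return (x1, height - y2, x2, height - y1)

box = resize_box((960, 540, 1920, 1080))
print(box, hflip_box(box), vflip_box(box))
```

Because 1920×1080 is not square, a direct resize stretches garments vertically by different factors in x and y. Letterboxing would avoid this, at the cost of padded borders.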
Optionally, the feature enhancement module in step three is a multi-connection structure that gathers multi-granularity information by stacking adjacent layers, producing shallow features with richer detail.
Optionally, in step four, each layer of the multi-scale pyramid has 256 output channels.
Optionally, in step six, the DDH module includes a shortcut branch with spatial self-attention, implemented with a simple 3×3 convolution. This branch lets the network dynamically assign weights to objects of each scale based on the distribution of the current features, which makes the final detection result more accurate.
The invention has the beneficial effects that:
To address the weak positioning and classification capability and low recognition precision of existing clothes detection methods for washing machines, the invention provides a multi-scale clothes detection system based on drum images of a washing machine and, based on that system, a corresponding multi-scale clothes detection method. The improved ResNet network replaces the first-layer convolution with a large 7×7 convolution of stride 1 to avoid excessive loss of clothing detail, and takes its outputs from the third and fourth layers to ensure sufficient clothing semantic information. The feature enhancement module aggregates features into shallow features with stronger representational capability, integrating the detail and semantic information of the third and fourth layers so that the extracted clothing features are richer. The DRF module builds a multi-scale pyramid with stronger semantic information; by deepening the network and adaptively adjusting the receptive field, it improves the system's ability to classify complex clothes. The offsets that the DDH module applies to the positioning boxes enrich the diversity of prediction scales, so the system adapts better to clothes of different scales. The multi-scale clothes detection system and method for washing machine drum images thus effectively improve the identification and classification of clothes in washing machines and raise clothes detection precision.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a diagram of the modified ResNet network architecture.
Fig. 2 is a schematic diagram of the feature enhancement module according to the present invention.
Fig. 3 is a schematic diagram of the DRF module according to the present invention.
Fig. 4 is a diagram of the DDH module according to the present invention.
Fig. 5 is a schematic diagram of the default box offset effect.
Fig. 6 is a diagram of the overall network framework provided by the present invention.
Fig. 7 shows the detection results of the network on complex clothes.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
Embodiment one:
This embodiment provides a multi-scale clothes detection system and method based on drum images of a washing machine, for use in parameter recommendation for an intelligent washing machine. Built on a deep learning framework, the system and method start from a 2D RGB image, extract features with an improved ResNet network, and construct multi-scale information by enhancing the extracted features. Regression and classification are performed in two stages, and the cascaded transfer between the stages strengthens the network's ability to discriminate complex clothes, adapts it to clothes of every scale and improves detection performance.
The 2D RGB image is captured with a high-definition camera at a resolution of 1920×1080.
The system setup is described below in terms of the system's modules, architecture and network loss function:
(1) Module of system
As shown in Fig. 1, the improved ResNet network module comprises a 2D convolution with a 7×7 kernel and stride 1, followed by a pooling layer and then 4 convolutional layers connected in series. The convolutional layers stack 3, 4, 23 and 3 residual blocks of the ResNet network, respectively, and the output features are taken from the convolution blocks of the third and fourth layers.
The specific structure of the feature enhancement module (SRM module) is shown in Fig. 2. It is a lightweight multi-connection module for enhancing shallow feature representations, and it reuses up-sampling connections, down-sampling connections and constant-resolution connections. Its inputs are shallow features extracted by the ResNet-101 network: the third layer at 8× downsampling and the fourth layer at 16× downsampling.
To alleviate the information dilution caused by up-sampling, a cascade fusion scheme is adopted. A 1×1 convolution is applied to the third layer, which is brought to the same size as the fourth layer and added to it element-wise to complement the fourth layer's information. The complemented fourth-layer features are then upsampled by bilinear interpolation to the 8× downsampling rate and stacked with the third-layer features. This operation integrates multi-granularity information from adjacent layers and yields high-quality final features.
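The aggregation just described can be illustrated on toy nested lists (channels × rows × columns): a 1×1 convolution on $S_3$, element-wise addition to $S_4$, bilinear upsampling, and channel stacking with $S_3$. This is a hedged sketch, not the patent's code. The text does not say how the third layer is brought to the fourth layer's size, so the sketch assumes the 1×1 convolution uses stride 2. It also assumes half-pixel bilinear interpolation with edge clamping. All function names are hypothetical.

```python
def conv1x1(fmap, weights, stride=1):
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(wc * fmap[c][y][x] for c, wc in enumerate(w_row))
          for x in range(0, w, stride)]
         for y in range(0, h, stride)]
        for w_row in weights
    ]

def add(a, b):
    """Element-wise addition of two maps with identical shapes."""
    return [[[va + vb for va, vb in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

def upsample2x(fmap):
    """2x bilinear upsampling (half-pixel centres, clamped at the borders)."""
    out = []
    for ch in fmap:
        h, w = len(ch), len(ch[0])
        new = [[0.0] * (2 * w) for _ in range(2 * h)]
        for y in range(2 * h):
            sy = max((y + 0.5) / 2 - 0.5, 0.0)
            y0 = min(int(sy), h - 1)
            y1, fy = min(y0 + 1, h - 1), sy - y0
            for x in range(2 * w):
                sx = max((x + 0.5) / 2 - 0.5, 0.0)
                x0 = min(int(sx), w - 1)
                x1, fx = min(x0 + 1, w - 1), sx - x0
                top = ch[y0][x0] * (1 - fx) + ch[y0][x1] * fx
                bottom = ch[y1][x0] * (1 - fx) + ch[y1][x1] * fx
                new[y][x] = top * (1 - fy) + bottom * fy
        out.append(new)
    return out

def srm(s3, s4, lateral_weights):
    """Complement S4 with a 1x1 projection of S3, upsample, then stack with S3."""
    lateral = conv1x1(s3, lateral_weights, stride=2)  # S3 brought to S4's size (assumed stride 2)
    fused = add(s4, lateral)                          # element-wise complement of S4
    return s3 + upsample2x(fused)                     # channel concatenation C(S3, U(...))

s3 = [[[1.0] * 4 for _ in range(4)]]   # 1 channel, 4x4 (8x level)
s4 = [[[2.0] * 2 for _ in range(2)]]   # 1 channel, 2x2 (16x level)
y = srm(s3, s4, [[1.0]])
print(len(y), y[1][0])
```

The output keeps the 8× resolution of $S_3$ and gains the upsampled, complemented $S_4$ channels. In a real network, the lateral weights would be learned and the channel counts would be much larger.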
The specific structure of the dynamic receptive field module (DRF module) is shown in Fig. 3. Its design is inspired by studies of the human superficial retina, in which population receptive field size increases with retinal eccentricity. Eccentricity is simulated by Inception-style multi-branch convolutions, and dilated (atrous) convolutions model the relationship between receptive field scale and eccentricity. Multi-branch convolutions of 1×1, 3×3 and 5×5 first capture information at different scales. To reduce the number of parameters, 1×1 convolutions reduce the channel dimension, and each 5×5 convolution is replaced by two 3×3 convolutions. Self-learned vectors are then introduced, and a soft attention mechanism assigns a weight to each scale to simulate local stimuli at different scales. Likewise, self-learned vectors weighted by the global stimulus select among dilated convolutions with different dilation rates, so the receptive field adapts to the stimulus. In this way, smaller kernels give larger weights to positions closer to the convolution center, a larger receptive field is obtained and more context is captured, which improves the model's generalization across scales.
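Receptive-field arithmetic shows the effect of replacing a 5×5 convolution with two 3×3 convolutions, and of adding dilation. A k×k kernel with dilation rate r spans r(k-1)+1 pixels, and each stacked stride-1 layer adds its span minus one. The branch layouts and dilation rates (1, 3, 5) below are hypothetical and chosen only for illustration; the patent text does not list them.

```python
def effective_kernel(k, r):
    """Span of a k x k kernel with dilation rate r."""
    return r * (k - 1) + 1

def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions given (kernel, dilation)."""
    rf = 1
    for k, r in layers:
        rf += effective_kernel(k, r) - 1
    return rf

# Hypothetical DRF-style branches: 1x1 channel reduction, then 3x3 convs,
# ending in a dilated 3x3 whose rate grows with the branch's kernel size.
BRANCHES = {
    "1x1 -> 3x3 (r=1)": [(1, 1), (3, 1)],
    "1x1 -> 3x3 -> 3x3 (r=3)": [(1, 1), (3, 1), (3, 3)],
    "1x1 -> 3x3 -> 3x3 -> 3x3 (r=5)": [(1, 1), (3, 1), (3, 1), (3, 5)],
}

for name, layers in BRANCHES.items():
    print(name, receptive_field(layers))
```

Two stacked 3×3 layers give the same 5×5 receptive field as one 5×5 layer, with 18 weights per channel pair instead of 25. The dilated layer then widens the field without adding any parameters.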
The specific structure of the dynamic deformable convolution detector head module (DDH module) is shown in Fig. 4. The module uses deformable convolution to overcome the fixed geometric structure of ordinary convolutional networks, which limits their ability to model geometric transformations. It further adjusts the spatial sampling positions by adding an offset to the position of each sampling point in the convolution kernel, so the sampling region can be adjusted freely instead of being restricted to a regular grid. In addition, global spatial self-attention is implemented through a simple 3×3 shortcut connection, which lets the network dynamically assign appropriate weights to objects of each scale based on the distribution of the current features. As a result, the network generates different offsets for shallow regression boxes of different scales and shifts the feature pixels accordingly, so the default boxes placed on those pixels shift as well. The network thus obtains a different search range for each default box, then fine-tunes the box and matches it to the target, which improves detection of clothes instances whose scale varies widely.
(2) System architecture
The overall structure of the system is shown in Fig. 6 and consists of four parts:
The first part keeps the third layer (8× downsampling) and the fourth layer (16× downsampling) of the ResNet network and feeds them into the designed feature enhancement module (SRM) for information aggregation, obtaining shallow features with stronger representational capability.
The second part uses the designed dynamic receptive field module (DRF) to build the enhanced features into multi-scale features at 8×, 16×, 32× and 64× downsampling rates. By adaptively combining information from different receptive fields, it constructs a dynamic multi-scale pyramid with rich semantic information.
The third part performs multi-scale regression on the enhanced feature information and uses the regression results as candidate boxes for the corresponding features of the dynamic multi-scale pyramid. The default boxes derived from the shallow regression results are then classified and fine-tuned using the multi-scale pyramid features.
The fourth part introduces the dynamic deformable convolution detector head module (DDH module) as the output layer of the pyramid features.
(3) Network loss function
After the network model is established, the following steps complete the clothes detection process:
A high-definition camera captures a 2D RGB image of the clothes in the washing machine drum at a resolution of 1920×1080;
Step one: data augmentation. Scale the input image to 512×512 and apply random vertical and horizontal flips, brightness changes, blurring and illumination changes;
Step two: extract features from the image augmented in step one with the ResNet network modified as shown in Fig. 1, and output feature layers at 8× and 16× downsampling rates;
Step three: feed the feature layers extracted in step two into the feature enhancement module shown in Fig. 2, which aggregates their information into shallow features with stronger representational capability;
Step four: input the shallow features from step three into a four-layer multi-scale pyramid whose successive layers are connected through DRF modules, and obtain the output features of each pyramid feature layer;
Step five: perform multi-scale regression on the features aggregated in step three, using the shallow feature information to locate the clothes coarsely and obtain prediction boxes. The multi-scale regression algorithm is as follows:
Input: the output feature y;
Output: the integrated 4 predicted offsets of all N default boxes across the scales;
S1: apply successive max-pooling operations to y to obtain four scales that match the four pyramid feature layers of step four:
$$D_k = f_{3\times 3}\big(M_k(y)\big),\quad k = 0, 1, 2, 3$$
where $M_k(\cdot)$ denotes k successive max-pooling operations, so $D_k$ has a downsampling rate of $2^{3+k}$. $D_k$ has $N_{box}\times 4$ channels: for each pixel of $D_k$, they hold the 4 offsets (center x, center y, width, height) of each of the $N_{box}$ default boxes configured at that pixel;
S2: concatenate the predictions of all $D_k$ into an integrated prediction vector l;
S3: use the smooth L1 function as the regression loss on l:
where cx, cy, w, h are the center and width/height coordinates of the default box, N is the total number of default boxes, l is the integration of all $D_k$ predictions (the 4 predicted offsets of all N default boxes), and the regression target holds the 4 offsets of the corresponding ground-truth box relative to each default box;
S4: during training, the network back-propagates the loss from S3 to reduce the difference between l and the ground-truth offsets, and finally obtains a more accurate integrated prediction vector l;
Step six: shift the output features of each pyramid feature layer from step four with the DDH module shown in Fig. 4.
Step seven: use the prediction boxes from step five as the default boxes of each feature layer of the pyramid from step four, and adjust them with the offsets generated by the DDH module in step six; Fig. 5 shows the effect of this adjustment. The default box center and width/height are adjusted as follows:
$$cx^{*} = cx + \Delta p|_{x} + l_{cx}\times w$$
$$cy^{*} = cy + \Delta p|_{y} + l_{cy}\times h$$
where $cx^{*}$ and $cy^{*}$ are the center coordinates of the adjusted default box; cx, cy, w, h are the center and width/height coordinates of the default box; $\Delta p|_x$ and $\Delta p|_y$ are the x and y components of the DDH module offset $\Delta p$; and $l_{cx}$, $l_{cy}$, $l_w$, $l_h$ are the predicted offsets of the default box's center and width/height coordinates.
Step eight: and D, using the DDH module in the step six as a detection head at the same time, and carrying out secondary regression and classification. The secondary regression loss is as follows:
The classification loss is as follows:
In the middle of Indicating whether the z-th prediction box and the j-th real box match with respect to the category t,/>For the softmax penalty of class confidence, N pos and N neg are the number of positive and negative samples, respectively; the positive sample is a prediction frame containing a clothes target, and the negative sample is a prediction frame not containing the clothes target;
Step nine: the loss functions of step five and step eight are combined and trained jointly, and the network finally outputs the category and precise location of the clothes.
To demonstrate the advantages of the invention over the prior art, a series of simulation experiments was carried out, with the following results:
Table 1 compares the accuracy and model parameter count of the method of the present application with those of the Faster RCNN and YOLOv5m networks in laundry detection. The test images are samples of 1000 drum loads provided by companies, with 10 images per load, for 10000 images in total.
Table 1. Comparison of accuracy and model parameter count between the network of the invention and other methods in laundry detection

Method            | Faster RCNN | YOLOv5m    | This method | This method (compressed)
Input size        | 800×1000    | 640×640    | 512×512     | 512×512
Backbone network  | ResNet101   | CSPDarknet | ResNet101   | ResNet101
Precision         | 85.2%       | 50%        | 89.7%       | 86.7%
Model parameters  | 137M        | 21.4M      | 48M         | 26.4M
As the table shows, compared with Faster RCNN, the detection system and method of the invention reduce the model parameter count (48M versus 137M) while maintaining high precision (89.7% versus 85.2%); compared with the YOLOv5m network, the invention greatly improves detection precision, and after pruning with a compression algorithm it also achieves a low parameter count (26.4M).
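The relative savings implied by Table 1 can be checked with a few lines of arithmetic. The figures below are copied from the table; the script only computes ratios from them, and the names `RESULTS` and `relative_reduction` are illustrative.

```python
# Parameter counts (millions) and precision (%) copied from Table 1.
RESULTS = {
    "Faster RCNN": (137.0, 85.2),
    "YOLOv5m": (21.4, 50.0),
    "This method": (48.0, 89.7),
    "This method (compressed)": (26.4, 86.7),
}

def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a parameter count shrinks going from `before` to `after`."""
    return 1.0 - after / before

params_ours, prec_ours = RESULTS["This method"]
params_small, prec_small = RESULTS["This method (compressed)"]

vs_faster = relative_reduction(RESULTS["Faster RCNN"][0], params_ours)
by_pruning = relative_reduction(params_ours, params_small)
print(f"vs. Faster RCNN: {vs_faster:.1%} fewer parameters")   # 65.0% fewer
print(f"pruning: {by_pruning:.1%} fewer parameters")           # 45.0% fewer
print(f"precision cost of pruning: {prec_ours - prec_small:.1f} points")  # 3.0
```

In other words, the uncompressed network uses roughly a third of Faster RCNN's parameters, and pruning removes a further 45% at a cost of 3.0 precision points.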
In summary, compared with existing laundry detection methods for washing machines, the method reduces the number of system parameters while maintaining detection precision; moreover, as Fig. 7 shows, it adapts well to the large variation in garment size encountered in laundry detection and identifies and classifies the clothes in the washing machine drum.
Some steps in the embodiments of the present invention may be implemented by using software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (8)

1. A multi-scale laundry detection system based on a drum image of a washing machine, the system comprising:
the improved ResNet network module, the feature enhancement module SRM, the dynamic receptive field DRF module and the dynamic deformable convolution DDH module;
The improved ResNet network module is connected with the feature enhancement module SRM, a four-layer multi-scale pyramid structure is constructed on the basis of the output features of the feature enhancement module SRM, and the dynamic receptive field DRF module is used for connecting all feature layers of the four-layer multi-scale pyramid; the dynamic deformable convolution DDH module is connected with the dynamic receptive field DRF module;
the improved ResNet network module includes:
A 2D convolution layer with a 7×7 kernel and stride 1, a max-pooling layer with a 3×3 kernel and stride 2, and 4 convolution layers connected in series; the 4 convolution layers are stacks of 3, 4, 23, and 3 residual blocks, respectively, and output features are taken from the third and fourth of these convolution layers;
The feature enhancement module SRM is a lightweight multi-connection module for enhancing the shallow feature representation; it reuses up-sampling, down-sampling, and constant-resolution connections, and its inputs are the shallow features extracted by the ResNet network, namely the third layer at an 8× downsampling rate and the fourth layer at a 16× downsampling rate;
The dynamic receptive field DRF module simulates eccentricity with multi-branch Inception-style convolutions and uses dilated convolution to model the relation between receptive-field scale and eccentricity: information at different scales is first captured by multi-branch 1×1, 3×3, and 5×5 convolutions, where the 1×1 convolution reduces the channel dimension and each 5×5 convolution is replaced by two 3×3 convolutions; a self-learned vector is then introduced so that a soft attention mechanism distributes weights across the scales, simulating local stimuli at different scales;
The dynamic deformable convolution DDH module uses deformable convolution to overcome the limited ability of a convolutional network's fixed geometric structure to model geometric transformations: by further shifting the spatial sampling positions within the module, an offset variable is added to the position of each sampling point in the convolution kernel, so that the sampling region can be adjusted freely instead of being confined to a regular grid.
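Claim 1 fixes the stem strides (7×7 convolution with stride 1, then 3×3 max pooling with stride 2) and the residual block counts 3, 4, 23, 3. Assuming the usual ResNet convention that the first stage keeps its resolution and each later stage halves it, and that each residual block is a 3-convolution bottleneck, the arithmetic below shows why the third and fourth layers land at the 8× and 16× downsampling rates named in claim 1 and why the block counts correspond to ResNet101. The stage strides and bottleneck depth are assumptions, not stated in the claim.

```python
from typing import List

BLOCKS = [3, 4, 23, 3]         # residual blocks per convolution layer (claim 1)
STAGE_STRIDES = [1, 2, 2, 2]   # assumption: standard ResNet stage strides
STEM_CONV_STRIDE = 1           # 7x7 convolution, stride 1 (claim 1)
STEM_POOL_STRIDE = 2           # 3x3 max pooling, stride 2 (claim 1)

def stage_downsampling_rates() -> List[int]:
    """Cumulative downsampling rate at the output of each of the 4 layers."""
    rate = STEM_CONV_STRIDE * STEM_POOL_STRIDE
    rates = []
    for s in STAGE_STRIDES:
        rate *= s
        rates.append(rate)
    return rates

def feature_sizes(input_size: int) -> List[int]:
    """Spatial side length per layer, assuming padding that divides sizes exactly."""
    return [input_size // r for r in stage_downsampling_rates()]

print(stage_downsampling_rates())  # [2, 4, 8, 16]: layers 3 and 4 give 8x and 16x
print(feature_sizes(512))          # [256, 128, 64, 32]
# Weighted layers: 3 convs per bottleneck block, plus stem conv and final FC.
print(3 * sum(BLOCKS) + 2)         # 101
```

Because the stem stride is 1 rather than the usual 2, every layer keeps twice the resolution of a stock ResNet101, which is what places the third layer at 8× instead of 16×.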
2. A method for detecting laundry in a washing machine based on a drum image of the washing machine, the method using the multi-scale laundry detection system based on a drum image of the washing machine of claim 1, the method comprising:
Step one: preprocessing the input washing machine drum image;
Step two: extracting features from the drum image preprocessed in step one using the improved ResNet network module, and outputting feature layers at 8× and 16× downsampling rates;
Step three: feeding the feature layers extracted in step two into the feature enhancement module SRM to aggregate information, obtaining shallow features with stronger representational power;
Step four: inputting the shallow features obtained in step three into the four-layer multi-scale pyramid structure, in which they pass through a DRF module between adjacent layers, to obtain the output features of each pyramid feature layer;
Step five: performing a multi-scale regression operation on the shallow features obtained in step three, and coarsely locating the clothes with the shallow feature information to obtain prediction boxes;
Step six: using the dynamic deformable convolution DDH module to offset the output features of each pyramid feature layer from step four;
Step seven: taking the prediction boxes obtained in step five as the default boxes of each layer of the four-layer multi-scale pyramid, and adjusting them using the offsets generated by the DDH module in step six;
Step eight: performing secondary regression and classification using the DDH module;
Step nine: combining the regression loss functions of step five and step eight, training them jointly, and finally outputting the classification and precise location information of the clothes.
3. The method of claim 2, wherein the step three of aggregating information comprises:
Where S_3 is the output feature of the third layer of the modified ResNet101 network at an 8× downsampling rate, S_4 is the output feature of the fourth layer of the modified ResNet101 network at a 16× downsampling rate, f_{k×k}(·) is a k×k convolution, ⊕ denotes element-wise addition, C(·) is channel stacking, U(·) is the upsampling operation, and y is the output feature, at an 8× downsampling rate, that aggregates the two layers.
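Claim 3 defines the operators used to aggregate S_3 and S_4, but the aggregation formula itself is not reproduced. The sketch below implements only those building blocks on nested lists: U(·) as 2× nearest-neighbour upsampling (an assumption, since the claim does not name the interpolation), element-wise addition, and channel stacking C(·). It shows how U(S_4) is brought to the 8× resolution of S_3 so that the two can be added and stacked; the k×k convolutions are omitted and the final combination order is hypothetical.

```python
from typing import List, Tuple

Channel = List[List[float]]  # H x W
Feature = List[Channel]      # C x H x W

def shape(x: Feature) -> Tuple[int, int, int]:
    """(C, H, W) of a nested-list feature map."""
    return (len(x), len(x[0]), len(x[0][0]))

def upsample2x(x: Feature) -> Feature:
    """U(.): 2x nearest-neighbour upsampling (interpolation mode assumed)."""
    out = []
    for ch in x:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                     # repeat each row
        out.append(rows)
    return out

def add(a: Feature, b: Feature) -> Feature:
    """Element-wise addition; the two maps must have identical shapes."""
    if shape(a) != shape(b):
        raise ValueError(f"shape mismatch: {shape(a)} vs {shape(b)}")
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

def concat(*xs: Feature) -> Feature:
    """C(.): stack maps along the channel axis (caller ensures H and W agree)."""
    return [ch for x in xs for ch in x]

# Toy maps: S3 at 8x (4x4) and S4 at 16x (2x2), one channel each.
s3 = [[[1.0] * 4 for _ in range(4)]]
s4 = [[[1.0, 2.0], [3.0, 4.0]]]
up = upsample2x(s4)     # U(S4) now matches the 4x4 resolution of S3
fused = add(s3, up)     # S3 (+) U(S4)
y = concat(s3, fused)   # hypothetical final stacking step
print(shape(y))         # (2, 4, 4)
```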
4. The method of claim 3, wherein the computation of the DRF module in step four comprises:
where x is the output feature of the preceding layer for each layer of the pyramid structure, the dilated-convolution term denotes a k×k convolution with dilation rate r, i denotes the i-th branch of the DRF module, W_1[i] and W_2[i] are weight parameters learned by the network for the i-th branch, the stacking term denotes the concatenation of n+1 feature maps, and U is the output feature of the DRF module.
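Two ideas from the DRF description can be made concrete without the missing formula: the receptive-field arithmetic behind replacing a 5×5 convolution with two 3×3 convolutions and behind dilation, and the soft-attention weighting of branches by a self-learned vector. The sketch below assumes the branch weights are a softmax over a learnable logit vector; how the patent actually uses W_1[i] and W_2[i] is not recoverable from the text, so `fuse_branches` is a hypothetical stand-in.

```python
import math
from typing import List, Sequence

def effective_kernel(k: int, r: int) -> int:
    """Span of a k x k convolution with dilation rate r along one axis."""
    return k + (k - 1) * (r - 1)

def stacked_receptive_field(kernels: Sequence[int]) -> int:
    """Receptive field of stride-1 convolutions applied one after another."""
    rf = 1
    for k in kernels:
        rf += k - 1
    return rf

def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def fuse_branches(branches: Sequence[Sequence[float]],
                  logits: Sequence[float]) -> List[float]:
    """Weight each branch by softmax(logits) and sum element-wise.

    `logits` plays the role of the self-learned scale vector (hypothetical
    parameterization); each branch is a flattened feature of equal length.
    """
    weights = softmax(logits)
    n = len(branches[0])
    return [sum(w * b[j] for w, b in zip(weights, branches)) for j in range(n)]

print(stacked_receptive_field([3, 3]))  # 5: two 3x3 convs cover a 5x5 window
print(effective_kernel(3, 3))           # 7
print(fuse_branches([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))  # [2.0, 3.0]
```

The softmax keeps the branch weights positive and summing to 1, so training shifts emphasis between scales rather than scaling the whole output.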
5. The method according to claim 2, wherein the multi-scale regression algorithm of step five comprises:
S1: applying max-pooling operations to the output feature y of step three to obtain four scales matching the four-layer pyramid features of step four:
$$D_k = f_{3\times3}\big(M_k(y)\big), \quad k = 0, 1, 2, 3$$
where M_k(·) denotes k successive max-pooling operations, so that the downsampling rate is 2^{3+k}; D_k is the output feature, whose N_box × 4 channels represent, for each pixel of D_k, the 4 center and width/height offsets of the N_box default boxes configured there;
S2: concatenating the predictions of all D_k to obtain the integrated prediction vector l;
S3: the smooth L1 function is applied to l as the regression loss:
where cx, cy, w, h are the center coordinates, width, and height of the default box, N is the total number of default boxes, l is the integrated vector of all D_k predictions, holding the 4 predicted offsets of each of the N default boxes, and the corresponding target vector holds the 4 offsets of the matched known ground-truth box relative to each default box;
S4: during training, the network back-propagates the loss of S3, reducing the difference between l and its ground-truth target, so that a more accurate integrated prediction vector l is finally obtained.
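Taking the 512×512 input size from Table 1, the tensor shapes in S1 and the length of the spliced vector l in S2 follow directly from the 2^(3+k) downsampling rate. N_box is not specified in the text, so the value 6 used below is purely hypothetical, as are the helper names.

```python
from itertools import chain
from typing import List, Sequence, Tuple

def regression_head_shapes(input_size: int, n_box: int) -> List[Tuple[int, int, int]]:
    """(channels, H, W) of D_k for k = 0..3, using the rate 2**(3 + k) from S1."""
    shapes = []
    for k in range(4):
        side = input_size // 2 ** (3 + k)  # assumes input_size divisible by 64
        shapes.append((n_box * 4, side, side))
    return shapes

def total_default_boxes(shapes: Sequence[Tuple[int, int, int]], n_box: int) -> int:
    """N = n_box * (sum over k of H_k * W_k)."""
    return n_box * sum(h * w for _, h, w in shapes)

def splice(d_maps: Sequence[Sequence[Tuple[float, float, float, float]]]) -> List[float]:
    """S2: concatenate per-scale box offsets (l_cx, l_cy, l_w, l_h) into l."""
    return list(chain.from_iterable(chain.from_iterable(d) for d in d_maps))

shapes = regression_head_shapes(512, n_box=6)  # n_box = 6 is hypothetical
print(shapes)    # [(24, 64, 64), (24, 32, 32), (24, 16, 16), (24, 8, 8)]
n = total_default_boxes(shapes, n_box=6)
print(n, 4 * n)  # 32640 130560
```

The finest scale contributes about three quarters of all default boxes, which matches its role in the coarse localization of small garments.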
6. The method of claim 5, wherein the computation of the DDH module in step six comprises:
$$O(p_0) = \sum_{p_n \in R} w(p_n)\, I(p_0 + p_n + \Delta p_n)$$
where R defines the size and relative positions of the receptive field, centered at the (0, 0) coordinate (for a 3×3 kernel, R = {(-1, -1), (-1, 0), ..., (1, 1)}), p_0 is the output position, p_n enumerates the positions listed in R, w is the weight at the corresponding position of the convolution kernel, I is the input feature value at the corresponding position, and O is the output feature value at the corresponding position; the offset Δp_n is obtained by applying a 3×3 convolution to the D_k obtained in S1 of step five, with k×k×2 output channels, giving an offset parameter for each position of the k×k convolution kernel.
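The sampling rule described in claim 6 is that of a standard deformable convolution, O(p_0) = Σ w(p_n)·I(p_0 + p_n + Δp_n) over the kernel grid R. Because p_0 + p_n + Δp_n is generally fractional, implementations usually read I by bilinear interpolation. The sketch below does that on a single-channel grid for one output position, treats out-of-range samples as zero, and takes the offsets as given rather than predicting them from D_k with a 3×3 convolution. Bilinear sampling and zero padding are assumptions, and `deformable_conv_at` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple

Grid = List[List[float]]  # single-channel input feature map I, indexed [y][x]

def bilinear(I: Grid, y: float, x: float) -> float:
    """Sample I at fractional (y, x); neighbours outside the map count as 0."""
    h, w = len(I), len(I[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Each neighbour's weight shrinks linearly with its distance.
                val += (1.0 - abs(y - yy)) * (1.0 - abs(x - xx)) * I[yy][xx]
    return val

def deformable_conv_at(I: Grid, weights: Sequence[Sequence[float]],
                       p0: Tuple[int, int],
                       offsets: Dict[Tuple[int, int], Tuple[float, float]]) -> float:
    """O(p0) = sum over p_n in R of w(p_n) * I(p0 + p_n + dp_n), odd k x k kernel.

    R is the k x k grid of (dy, dx) centred on (0, 0); `offsets` maps p_n to its
    shift (dy, dx), and missing entries mean no shift (plain convolution).
    """
    r = len(weights) // 2
    out = 0.0
    for i, dy_n in enumerate(range(-r, r + 1)):
        for j, dx_n in enumerate(range(-r, r + 1)):
            ddy, ddx = offsets.get((dy_n, dx_n), (0.0, 0.0))
            out += weights[i][j] * bilinear(I, p0[0] + dy_n + ddy, p0[1] + dx_n + ddx)
    return out

I = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
center = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(deformable_conv_at(I, center, (1, 1), {}))                      # 5.0
print(deformable_conv_at(I, center, (1, 1), {(0, 0): (0.0, 0.5)}))    # 5.5
```

With all offsets zero the function reduces to an ordinary convolution; a half-pixel shift of the center tap samples midway between 5 and 6.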
7. The method of claim 6, wherein the formulas of step seven for adjusting the default box comprise:
$$cx^{*} = cx + \Delta p|_{x} + l_{cx} \times w$$
$$cy^{*} = cy + \Delta p|_{y} + l_{cy} \times h$$
where cx, cy, w, h are the center coordinates, width, and height of the default box, cx* and cy* are the adjusted center coordinates, Δp|_x and Δp|_y are the x and y components of the DDH module offset Δp, and l_cx, l_cy, l_w, l_h are the predicted offsets of the default box center, width, and height.
8. The method of claim 7, wherein the secondary regression loss of step eight is:
the classification loss is:
where an indicator denotes whether the z-th prediction box matches the j-th ground-truth box with respect to category t, the softmax loss is applied to the category confidences, and N_pos and N_neg are the numbers of positive and negative samples, respectively, a positive sample being a prediction box containing a laundry target and a negative sample being a prediction box containing none;
the combined loss function of step nine is:
CN202110883847.9A 2021-08-03 2021-08-03 Multi-scale clothes detection system and method based on drum images of washing machine Active CN113705359B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110883847.9A CN113705359B (en) 2021-08-03 2021-08-03 Multi-scale clothes detection system and method based on drum images of washing machine

Publications (2)

Publication Number Publication Date
CN113705359A CN113705359A (en) 2021-11-26
CN113705359B true CN113705359B (en) 2024-05-03

Family

ID=78651305

Country Status (1)

Country Link
CN (1) CN113705359B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100681B (en) * 2022-06-24 2024-10-15 Jinan University Clothes identification method, system, medium and equipment

Citations (5)

Publication number Priority date Publication date Assignee Title
CN109344821A (en) * 2018-08-30 2019-02-15 Xidian University Small target detecting method based on Fusion Features and deep learning
CN110796037A (en) * 2019-10-15 2020-02-14 Wuhan University Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid
CN110895707A (en) * 2019-11-28 2020-03-20 Jiangnan University Depth discrimination method for underwear types of washing machine under strong shielding condition
CN110991311A (en) * 2019-11-28 2020-04-10 Jiangnan University Target detection method based on dense connection deep network
CN112801183A (en) * 2021-01-28 2021-05-14 Harbin University of Science and Technology Multi-scale target detection method based on YOLO v3

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN108830205B (en) * 2018-06-04 2019-06-14 Jiangnan University Based on the multiple dimensioned perception pedestrian detection method for improving full convolutional network

Non-Patent Citations (2)

Title
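FPN multi-scale object detection algorithm based on dense connections; 张宽, 滕国伟, 范涛, 李聪; Computer Applications and Software; 2020-01-12 (No. 01); full text *
MSSD object detection method based on deep learning; 赵庆北, 元昌安; Enterprise Science and Technology & Development; 2018-05-10 (No. 05); full text *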

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant