WO2023273515A1 - Target detection method, apparatus, electronic device and storage medium - Google Patents
- Publication number
- WO2023273515A1 (application PCT/CN2022/086919)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- image
- channel
- detection
- frequency
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- the present disclosure relates to the technical field of image processing, and in particular to a target detection method, device, electronic equipment and storage medium.
- The target detection task is an important task in the field of computer vision; its purpose is to accurately locate a specific target object in an image through computer image processing. To achieve this by computer, on the one hand, it is necessary to determine the target, for example, to obtain the contour curve of the target, or to obtain parameters such as its shape and size; on the other hand, it is also necessary to locate the specific position of the target in the image. As the basis of subsequent segmentation, tracking, and recognition tasks, the target detection task is also an important part of image processing in the field of computer vision, providing a better data basis for those subsequent links.
- The video stream is usually input through a camera. The picture changes presented by the video stream are realized by the sequential transformation of multiple frames of digital images, and each frame contains multiple pixels arranged in a matrix. The resolution of the image is reflected by the number of pixels in two mutually perpendicular directions: the higher the resolution, the more data the image carries. The original video stream usually has a large amount of data, which is not conducive to processing, transmission and storage, and the encoding and decoding of video streams greatly limits the performance of algorithms.
- Embodiments of the present disclosure provide a target detection method, device, electronic device and storage medium, which can reduce an image without encoding and decoding without affecting the detection performance of the target in the reduced image.
- An aspect of the embodiments of the present disclosure provides a target detection method, which may include: performing color coding on an input original color image to obtain multiple images in the YUV color space; dividing a target image among the multiple images into pixel regions to obtain multiple pixel regions corresponding to the target image; performing a discrete cosine transform on each pixel region to obtain transformation features of the target image; selecting a target channel of a target region from the transformation features; and performing object detection according to frequency-domain feature information of the target channel in the target image.
- selecting the target channel of the target region from the transformation feature may include: according to the transformation feature, using a channel selection network for detection to obtain the target channel, the channel selection network is a network model obtained by training in advance according to the transformation feature of the sample image, The sample image and the target image are images of the same encoding format.
- The channel selection network may include: a pooling layer, a convolution processing layer, an activation function layer, and a sampling layer. Using the pre-trained channel selection network for detection according to the transformation features to obtain the target channel may include: using the pooling layer to perform global average pooling on the feature values of each channel in the transformation features to obtain pooled features; using the convolution processing layer to perform convolution processing on the pooled features to obtain convolution features; using the activation function layer to process the convolution features to obtain probability features; and using the sampling layer to sample the channels corresponding to the target image according to the probability features to obtain the target channel.
- the activation function layer may be a sigmoid function layer.
- The sampling layer may be a Gumbel-Softmax sampling layer.
- dividing the pixel area of the target image in the multiple images may include: each divided pixel area includes N*N pixel units, where N is a positive integer greater than 0.
- each pixel area may include 8*8 pixel units.
- performing color coding on the input original color image to obtain the multiple images in the YUV color space may further include: subtracting predetermined values from the pixel values of the multiple image pixels in the YUV color space.
- performing object detection according to the frequency-domain feature information of the target channel in the target image may include: inputting the frequency-domain feature information of the target channel into a preset downsampling layer in a pre-trained frequency-domain detection network for processing , to get information about the target object.
- Performing object detection according to the frequency-domain feature information of the target channel in the target image may include: splicing the frequency-domain feature information of the target channel into the original frequency-domain detection network at the four-times downsampling stage as input, so that frequency-domain detection uses as input an image enlarged to four times the original image.
- the multiple images may include: a Y component image, a U component image, and a V component image; the target image may include: a Y component image.
- Another aspect of the embodiments of the present disclosure provides an object detection device, which may include: an encoding module configured to color-encode an input original color image to obtain multiple images in the YUV color space; an area division module configured to divide a target image among the multiple images into pixel regions to obtain multiple pixel regions corresponding to the target image; a transformation module configured to perform a discrete cosine transform on each pixel region to obtain transformation features of the target image; a feature selection module configured to select a target channel of a target region from the transformation features; and a detection module configured to perform object detection according to frequency-domain feature information of the target channel in the target image.
- The feature selection module is configured to: use a channel selection network for detection according to the transformation features to obtain the target channel, where the channel selection network is a network model trained in advance according to the transformation features of a sample image, and the sample image and the target image are images of the same encoding format.
- an electronic device which may include: a memory and a processor, the memory stores a computer program executable by the processor, and when the processor executes the computer program, any one of the target detection methods described above is implemented .
- Another aspect of the embodiments of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is read and executed, any one of the object detection methods described above is implemented.
- An embodiment of the present disclosure provides a target detection method, device, electronic device, and storage medium.
- The target detection method may include: color coding the input original color image to obtain multiple images in the YUV color space; dividing the target image into pixel regions to obtain multiple pixel regions corresponding to the target image; performing a discrete cosine transform on each pixel region to obtain the transformation features of the target image; selecting the target channel of the target region from the transformation features; and performing object detection according to the frequency-domain feature information of the target channel in the target image.
- FIG. 1 is a flowchart of a target detection method provided by some embodiments of the present disclosure
- Fig. 2 is a schematic diagram of a target channel selection path for a target image within a pixel region in a target detection method according to some embodiments of the present disclosure
- FIG. 3 is a flowchart of an implementation of step S104 in a target detection method provided by some embodiments of the present disclosure
- Fig. 4 is a flow chart of an implementation of step S1041 in a target detection method provided by some embodiments of the present disclosure
- Fig. 5 is a flowchart of a target detection method provided by other embodiments of the present disclosure.
- FIG. 6 is a flow chart of another implementation of step S104 in a target detection method provided by some embodiments of the present disclosure.
- FIG. 7 is a flowchart of an implementation of step S105 in a target detection method provided by some embodiments of the present disclosure.
- FIG. 8 is a schematic diagram of an object detection device 100 provided by some embodiments of the present disclosure.
- Fig. 9 is a schematic diagram of an electronic device 200 provided by some embodiments of the present disclosure.
- Reference numerals: 100-target detection device; 110-encoding module; 120-area division module; 130-transformation module; 140-feature selection module; 150-detection module; 200-electronic equipment; 201-memory; 202-processor.
- The orientation or positional relationship indicated by the terms “upper”, “lower”, “inner”, “outer” etc. is based on the orientation or positional relationship shown in the drawings, or on the orientation or positional relationship in which the product of the application is customarily placed when used. It is only for the convenience of describing the present disclosure and simplifying the description, and does not indicate or imply that the referred device or element must have a specific orientation or be constructed and operated in a specific orientation; therefore, it should not be construed as a limitation of the present disclosure.
- the terms “first”, “second”, etc. are only used for distinguishing descriptions, and should not be construed as indicating or implying relative importance.
- Artificial Intelligence is an emerging science and technology that studies and develops theories, methods, technologies and application systems for simulating and extending human intelligence.
- the subject of artificial intelligence is a comprehensive subject that involves many technologies such as chips, big data, cloud computing, Internet of Things, distributed storage, deep learning, machine learning, and neural networks.
- computer vision is specifically to allow machines to recognize the world.
- Computer vision technology usually includes face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, etc.
- the target detection task is an important task in the field of computer vision.
- The goal is to locate an object in an image.
- Two aspects of work are required: on the one hand, it is necessary to confirm the object to be located; on the other hand, it is also necessary to locate its exact position in the image.
- The target detection task has become a basic task and challenge in the field of computer vision.
- The image input usually comes from a camera, so the input is usually a video stream. The video stream carries a large amount of image data, and decoding it consumes considerable time and system computing capacity. Given limited computing power, it is often necessary to reduce high-resolution images. However, directly reducing the image on the one hand also costs processing time, and on the other hand causes loss of image data, so that the information of small objects in the image is lost and they become difficult to identify, which in turn leads to poor detection and recognition performance for small objects in the image.
- FIG. 1 is a flow chart of a target detection method provided in an embodiment of the present disclosure. As shown in FIG. 1 , it may include:
- The input original color image can be color-coded: the input RGB original image (for example, a 1080*1920*3 RGB original image) undergoes a color-space conversion and is decomposed into multiple component images, for example, three (Y, Cr, Cb) images in the YUV color space.
- “Y” represents the brightness (Luminance or Luma), that is, the grayscale value.
- “U” and “V” represent the chroma (Chrominance or Chroma), which describes the color and saturation of the image and specifies the color of the pixel.
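The color-coding step can be sketched as follows. This is a minimal illustration only: the text does not state which conversion matrix is used, so the full-range BT.601 (JPEG-style) coefficients and the function name `rgb_to_ycbcr` are assumptions.

```python
import numpy as np

# Assumption: full-range BT.601 (JFIF/JPEG) coefficients; the source text
# does not say which conversion matrix is used.
_M = np.array([
    [0.299, 0.587, 0.114],            # Y  (luma)
    [-0.168736, -0.331264, 0.5],      # Cb (blue-difference chroma)
    [0.5, -0.418688, -0.081312],      # Cr (red-difference chroma)
])
_OFFSET = np.array([0.0, 128.0, 128.0])


def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) uint8 RGB image into three (H, W) float planes."""
    ycc = rgb.astype(np.float64) @ _M.T + _OFFSET
    return ycc[..., 0], ycc[..., 1], ycc[..., 2]


# A small all-white frame stands in for a real 1080*1920*3 input.
rgb = np.full((16, 24, 3), 255, dtype=np.uint8)
y, cb, cr = rgb_to_ycbcr(rgb)
```

For a white frame the luma plane is at its maximum and both chroma planes sit at the neutral offset, which matches the description of Y as the grayscale value.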
- the target image in the multiple images may be divided into pixel areas to obtain multiple pixel areas corresponding to the target image.
- The target image may be a preselected one of the multiple images, which is divided directly into multiple pixel regions; alternatively, each of the multiple images may be divided into multiple pixel regions, in which case the target image can be understood as all of the images.
- the size of multiple pixel regions corresponding to the target image should be the same, that is, if the pixel region is set to be 8*8 blocks, then each pixel region is 8*8 blocks.
- A discrete cosine transform can be performed on each pixel region of the target image to obtain the transformation feature corresponding to each pixel region, and thus the transformation features of the target image.
- In the frequency coefficient matrix obtained for a pixel region, the coefficients closer to the upper left corner have larger amplitudes and lower frequencies, and the coefficients closer to the lower right corner have smaller amplitudes and higher frequencies. Therefore, the upper left side of the matrix is the low-frequency region and the lower right side is the high-frequency region; a large amount of the feature information in the image is concentrated in the low-frequency region.
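The block division and per-block transform can be sketched as below, assuming an orthonormal DCT-II and image dimensions divisible by the block size (neither is specified in the text). The helper names `dct_matrix` and `blockwise_dct` are hypothetical.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d


def blockwise_dct(plane, n=8):
    """Split an (H, W) plane into n*n blocks and apply a 2-D DCT to each.

    Returns an (H//n, W//n, n*n) array with one channel per frequency
    coefficient in row-major order (channel 0 is the upper-left DC term).
    """
    h, w = plane.shape
    assert h % n == 0 and w % n == 0, "sketch assumes dimensions divisible by n"
    d = dct_matrix(n)
    # Rearrange to (block_row, block_col, n, n) so each block is a small matrix.
    blocks = plane.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    coeffs = d @ blocks @ d.T  # 2-D DCT of every block at once
    return coeffs.reshape(h // n, w // n, n * n)


plane = np.full((16, 24), 100.0)   # a flat plane: all energy lands in the DC term
features = blockwise_dct(plane)
```

Each 8*8 block becomes a 64-channel vector, so the spatial size shrinks by a factor of 8 in each direction while the 64 frequency coefficients become channels from which a subset can later be selected.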
- FIG. 2 is a target channel selection path in a pixel area of the target image in the target detection method of some embodiments of the present disclosure.
- the target area can be, for example, a low-frequency area, and the target channel selection in the low-frequency area can be a predetermined number of target channel selections in the order shown by the arrow in Figure 2, or can be selected based on artificial prior information.
- The allocation of target channels among the three images also needs to be chosen. For example, among the three (Y, Cr, Cb) images, the human eye is much more sensitive to brightness (Y) than to chroma (Cr, Cb); therefore, in terms of the contribution of the three components to the image feature information, Y is more important than Cr and Cb.
- When selecting target channels in the pixel regions of the three images, a larger number of target channels is selected for Y, and a smaller number for Cr and Cb.
- the selected target channel corresponds to the retention of frequency-domain feature information in the target channel.
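A fixed low-frequency selection can be sketched as follows. The arrow order of Figure 2 is not reproduced in the text, so the JPEG-style zigzag traversal used here is an assumption, and `zigzag_order` is a hypothetical helper.

```python
def zigzag_order(n=8):
    """Row-major indices of an n*n block, visited along anti-diagonals."""
    order = []
    for s in range(2 * n - 1):                      # s = row + col
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend(r * n + (s - r) for r in rows)
    return order


# Keep a predetermined number of channels, lowest frequencies first.
num_selected = 10                                   # hypothetical count
low_freq_channels = zigzag_order()[:num_selected]
```

Indexing the last axis of a per-block coefficient array with `low_freq_channels` keeps only the upper-left (low-frequency) coefficients.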
- S105 Perform object detection according to the frequency-domain feature information of the target channel in the target image.
- Object detection can be performed according to the frequency-domain feature information of the target channel in the selected target image. Since the target channels in the target image are optimized and selected, the information that reflects the target features in the target image is selectively retained. Therefore, when the target image is used for object detection, the detection performance for small-sized objects in the image can be improved. Moreover, since the target image is effectively reduced, the calculation amount of image processing is reduced, the time consumed for detecting the target object can be shortened, and computing resource usage can be reduced.
- A target detection method provided by an embodiment of the present disclosure may include: color coding the input original color image to obtain multiple images in the YUV color space; dividing the target image among the multiple images into pixel regions to obtain multiple pixel regions corresponding to the target image; performing a discrete cosine transform on each pixel region to obtain the transformation features of the target image; selecting the target channel of a target region, such as the low-frequency region, from the transformation features; and performing object detection according to the frequency-domain feature information of the target channel in the target image.
- By obtaining the transformation features of the target image and selecting which of them to retain, the more valuable and informative features in the target image can be kept, thereby effectively improving the accuracy of object detection in the image without increasing computing time and program occupation. There is no need to reduce the size of the image and thereby lose the information of small objects, and no need to perform encoding and decoding, which would incur a huge amount of calculation.
- Each divided pixel region may include N*N pixel units, where N is a positive integer greater than 0.
- each pixel area may include 8*8 pixel units.
- Using pixel regions of 8*8 pixel units as the basic unit is more conducive to reducing the complexity of calculation.
- each pixel area is divided into 8*8 pixel units for example and description.
- FIG. 3 is a flow chart of an implementation of step S104 in a target detection method provided by an embodiment of the present disclosure.
- S104 selecting a target channel of a target region from transformation features may include:
- the channel selection network is a network model trained in advance according to the transformation feature of the sample image, and the sample image and the target image are images of the same encoding format.
- When step S104 is performed and the target channel of the target region is selected from the transformation features, adaptive training can be performed in advance for real-time correction.
- A preset image of the same format as the pre-selected target image can be used as a sample image, and the network model is obtained by training according to the transformation features of the sample image.
- According to the transformation features, the pre-trained channel selection network can then be applied to the target image for detection, so that the obtained target channel retains more valuable and informative feature information.
- the channel selection network may include: a pooling layer, a convolution processing layer, an activation function layer, and a sampling layer.
- Fig. 4 is a flow chart of an implementation of step S1041 in a target detection method provided by an embodiment of the present disclosure.
- S1041, using a channel selection network for detection according to the transformation features to obtain the target channel, where the channel selection network is a network model trained in advance according to the transformation features of the sample image and the sample image and the target image are images of the same encoding format, may include:
- the pooling layer can be used to perform global average pooling processing on the transformed features, including performing global average pooling on the feature values of each channel in the transformed features to obtain pooled features, such as 1*1*64 features.
- the convolution processing layer can be used to perform convolution processing on the pooled features to obtain convolution features.
- the convolution feature can be processed by the activation function layer, and the probability vector of the pooled feature can be obtained as the probability feature.
- the activation function layer may be a sigmoid function layer.
- The sampling layer samples the channels corresponding to the target image according to the probability feature to obtain the target channel.
- The sampling layer can be used to set the probability value of some channels in the probability feature to 1 to indicate that the channel is retained, and to set the probability value of the other channels to 0 to indicate that the channel is discarded; the channels set to 1 in the probability feature are determined to be the target channels.
- Each number in the probability feature is a probability value from 0 to 1, and the probability value is used to indicate the probability that the channel where the feature is located is retained.
- The sampling layer may be a Gumbel-Softmax sampling layer.
- the frequency domain feature information of the target channel can be obtained by multiplying the sampled probability feature and the transformation feature.
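The pooling, convolution, sigmoid, sampling and multiplication steps above can be sketched in NumPy as follows. This is a forward-pass illustration only: the `weight` and `bias` arrays are random placeholders for a trained 1*1 convolution, the two-class keep/drop formulation of the Gumbel-Softmax sample is an assumption, and the gradient handling needed for training is omitted. The name `channel_gate` is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(features, weight, bias, rng, tau=1.0):
    """Pick channels of an (H, W, C) feature map with a hard Gumbel-Softmax sample.

    weight (C, C) and bias (C,) stand in for a trained 1*1 convolution;
    they are hypothetical placeholders in this sketch.
    """
    pooled = features.mean(axis=(0, 1))          # global average pooling -> (C,)
    conv = weight @ pooled + bias                # 1*1 convolution on a 1*1*C input
    prob = sigmoid(conv)                         # probability that each channel is kept
    eps = 1e-9
    # Two logits per channel: column 0 = keep, column 1 = drop.
    logits = np.stack([np.log(prob + eps), np.log(1.0 - prob + eps)], axis=-1)
    gumbel = -np.log(-np.log(rng.uniform(eps, 1.0, size=logits.shape)))
    scores = (logits + gumbel) / tau
    mask = (scores[:, 0] >= scores[:, 1]).astype(features.dtype)  # 1 = keep, 0 = discard
    return features * mask, mask, prob


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 6, 64))           # e.g. DCT features of a tiny plane
weight = rng.normal(size=(64, 64)) * 0.1         # placeholder, not trained
bias = np.zeros(64)
selected, mask, prob = channel_gate(features, weight, bias, rng)
```

Multiplying by the 0/1 mask zeroes the discarded channels; in practice those channels could simply be dropped from the tensor to shrink the input.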
- Attention Mechanism originated from the study of human vision.
- In cognitive science, due to the bottleneck of information processing, humans selectively focus on a part of all information while ignoring other visible information. This mechanism is often called the attention mechanism.
- Different parts of the human retina have different degrees of information processing ability, that is, acuity, and only the fovea of the retina has the strongest acuity.
- the attention mechanism has two main aspects: deciding which part of the input needs to be paid attention to; and allocating limited information processing resources to important parts.
- In this way, the target channel can be retained selectively, and the accuracy of object detection in the target image can be improved after the target image is reduced.
- Fig. 5 is a flowchart of a target detection method provided by other embodiments of the present disclosure. As shown in Fig. 5, step S101, color coding the input original color image to obtain multiple images in the YUV color space, may also include:
- Subtracting a predetermined value from the pixel values of the pixels of the multiple images in the YUV color space, for example subtracting 127 from each.
- Subtracting 127 from the pixel value of each image pixel (in this example, 127 is the predetermined value) shifts the value range to the left to ensure the symmetry of each 8*8 block.
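The level shift can be sketched as below. The cast to a signed type is an implementation detail assumed here so that 8-bit values do not wrap around; `level_shift` is a hypothetical name.

```python
import numpy as np


def level_shift(plane, offset=127):
    """Subtract a predetermined value so 8-bit samples are roughly centred on zero."""
    # Cast first: subtracting from uint8 directly would wrap around below zero.
    return plane.astype(np.int16) - offset


y_plane = np.array([[0, 127, 255]], dtype=np.uint8)
shifted = level_shift(y_plane)      # values become -127, 0 and 128
```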
- the multiple images may include: a Y component image, a U component image, and a V component image; the target image may include: a Y component image.
- The Y component image represents the grayscale of the color space image, while the U component image and the V component image express its color and saturation. Among the component images, the Y component image has a greater impact on the visual quality of the color space image than the U and V component images.
- Therefore, the target image may include the Y component image, which can meet the accuracy required for object detection in the target image after the target image has been reduced.
- The target image may further include a U component image and a V component image. That is, the target image includes the Y, U and V component images, i.e., all three components of the color space image. Therefore, after the target image is reduced, even the U and V component images, which contribute relatively little to the visual quality of the image, are processed and selectively retained, thereby improving the accuracy of object detection in the target image after reduction.
- FIG. 6 is a flow chart of another implementation of step S104 in a target detection method provided by an embodiment of the present disclosure. As shown in FIG. 6, S104, selecting the target channel of the target region from the transformation features, may include:
- When performing step S104 and selecting the target channel of the target region from the transformation features, low-frequency channels may be selected from the transformation features of the Y component image, the U component image and the V component image respectively, as the Y component low-frequency channel, the U component low-frequency channel and the V component low-frequency channel.
- Step S1042 selects a first preset number of low-frequency channels from the transformation features of the Y component image as the Y component low-frequency channels; step S1043 selects a second preset number of low-frequency channels from the transformation features of the U component image as the U component low-frequency channels; and step S1044 selects a third preset number of low-frequency channels from the transformation features of the V component image as the V component low-frequency channels. The first preset number is greater than the second preset number and greater than the third preset number. That is, when selecting low-frequency channels from the transformation features of the Y, U and V component images, the aforementioned principle still applies: the Y component image contributes more to the visual quality of the color space image, so more low-frequency channels are selected from its transformation features than from those of the U and V component images.
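The per-component selection can be sketched as follows. The counts 16, 4 and 4 are hypothetical (the text only requires the Y count to exceed the U and V counts), and ordering coefficients by row plus column index is an assumed stand-in for the low-frequency ordering.

```python
import numpy as np


def low_freq_indices(k, n=8):
    """First k row-major indices of an n*n block, ordered by row + col."""
    idx = sorted(range(n * n), key=lambda i: (i // n + i % n, i))
    return idx[:k]


def select_low_freq(y_feat, u_feat, v_feat, k_y=16, k_u=4, k_v=4):
    """Keep k_y, k_u and k_v low-frequency channels and stack them on the last axis."""
    assert k_y > k_u and k_y > k_v, "Y keeps more channels than U or V"
    parts = [
        feat[..., low_freq_indices(k)]
        for feat, k in ((y_feat, k_y), (u_feat, k_u), (v_feat, k_v))
    ]
    return np.concatenate(parts, axis=-1)


rng = np.random.default_rng(0)
# Stand-ins for the (H/8, W/8, 64) DCT features of the Y, U and V planes.
y_feat, u_feat, v_feat = (rng.normal(size=(2, 3, 64)) for _ in range(3))
stacked = select_low_freq(y_feat, u_feat, v_feat)
```

The result has 24 channels instead of 192, which is where the reduction in data volume comes from.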
- Fig. 7 is a flow chart of an implementation of step S105 in a target detection method provided by some embodiments of the present disclosure. As shown in Fig. 7, S105, perform object detection according to the frequency-domain feature information of the target channel in the target image , which can include:
- An optional way to perform object detection according to the frequency-domain feature information of the target channel in the target image is to input the frequency-domain feature information of the target channel into a preset downsampling layer of the pre-trained frequency-domain detection network for processing, to obtain the information of the target object.
- related mainstream detection networks such as Faster RCNN, Retinanet, etc.
- optionally, the obtained frequency-domain feature information is spliced into the original frequency-domain detection network at its four-times downsampling position as input.
- frequency-domain feature detection can directly take an input four times the size of the original image, so more small-object information in the image is retained and detection performance for small objects is better; images carrying a large amount of information, such as those from 4K cameras, can be used directly as input without being shrunk in advance.
- FIG. 8 is a schematic diagram of a target detection device provided by an embodiment of the present disclosure. As shown in FIG. 8, in another aspect of the embodiments of the present disclosure, a target detection device 100 is provided.
- the target detection device 100 may include:
- the encoding module 110 is configured to perform color encoding on the input original color image to obtain multiple images in the YUV color space.
- the region division module 120 is configured to divide the pixel region of the target image in the multiple images to obtain multiple pixel regions corresponding to the target image.
- the transformation module 130 is configured to perform discrete cosine transformation on each pixel region to obtain transformation features of the target image.
- the feature selection module 140 is configured to select a target channel of the target region from the transformed features.
- the detection module 150 is configured to perform object detection according to the frequency-domain feature information of the target channel in the target image.
- by acquiring the transformation features of the target image and selectively retaining them, the target detection device keeps the more valuable and more informative features of the target image. Without increasing computing time or program footprint, it can therefore effectively improve the accuracy of object detection in the image, without shrinking the image and losing the information of small objects in it, and without the large amount of computation caused by encoding and decoding.
- the above modules may be one or more integrated circuits configured to implement the above method, for example: one or more application-specific integrated circuits (ASICs), one or more microprocessors (digital signal processors, DSPs), or one or more field-programmable gate arrays (FPGAs).
- the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can call program code.
- these modules can be integrated together and implemented in the form of a system-on-a-chip (SOC for short).
- the feature selection module 140 is optionally configured to perform detection on the transformation features using the pre-trained channel selection network corresponding to the target image, to obtain the target channel.
- the channel selection network may be a network model pre-trained on the transformation features of sample images, and the sample images and the target image may be images of the same encoding format.
- the channel selection network may include: a pooling layer, a convolution processing layer, an activation function layer, and a sampling layer.
- the feature selection module 140 is optionally configured to: use the pooling layer to perform global average pooling on the feature values of each channel in the transformation features to obtain pooling features; use the convolution processing layer to perform convolution processing on the pooling features to obtain convolution features; use the activation function layer to process the convolution features to obtain probability features; and use the sampling layer to sample the channels corresponding to the target image according to the probability features to obtain the target channel.
- the region division module 120 is optionally configured to divide the target image among the multiple images into pixel regions, where each pixel region obtained by the division may include N*N pixel units and N is a positive integer.
- each divided pixel area may include 8*8 pixel units.
- the encoding module 110 is optionally configured to perform color coding on the input original color image to obtain multiple images in the YUV color space, and is also configured to subtract 127 from the pixel value of each pixel of the multiple images in the YUV color space.
- the detection module 150 is optionally configured to input the frequency-domain feature information of the target channel into a preset downsampling layer of a pre-trained frequency-domain detection network for processing, to obtain information about the target object.
- the plurality of images may include a Y component image, a U component image, and a V component image.
- the target image includes a Y component image, and in some optional implementation manners, the target image may also include a U component image and a V component image.
- the feature selection module 140 is optionally configured to: select a first preset number of low-frequency channels from the transformation features of the Y component image as the Y component low-frequency channels; select a second preset number of low-frequency channels from the transformation features of the U component image as the U component low-frequency channels; and select a third preset number of low-frequency channels from the transformation features of the V component image as the V component low-frequency channels; where the first preset number is greater than the second preset number and greater than the third preset number.
- FIG. 9 is a schematic diagram of an electronic device 200 provided by an embodiment of the present disclosure.
- an electronic device 200 which may include: a memory 201 and a processor 202,
- the memory 201 stores a computer program executable by the processor 202, and the processor 202 invokes the program stored in the memory 201 to execute any one of the embodiments of the object detection method described above.
- the specific implementation manner and technical effect are similar, and will not be repeated here.
- Another aspect of the embodiments of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is read and executed, any one of the object detection methods described above is implemented.
- the disclosed devices and methods may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of units is only a logical function division. In actual implementation, there may be other division methods.
- multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
- a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
- Integrated units implemented in the form of software functional units may be stored in a computer-readable storage medium.
- the software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute some of the steps of the methods described in the various embodiments of the present disclosure.
- the aforementioned storage medium can include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
- the present disclosure provides a target detection method, device, electronic equipment, and storage medium.
- the target detection method may include color coding an input original color image to obtain multiple images in a YUV color space;
- the target image is divided into pixel regions to obtain multiple pixel regions corresponding to the target image; discrete cosine transform is performed on each pixel region to obtain the transformation features of the target image; the target channel of the target region is selected from the transformation features; and object detection is performed according to the frequency-domain feature information of the target channel in the target image.
- the image can be reduced in size without encoding and decoding and without affecting the detection performance for the target object in the reduced image.
- the object detection method, device, electronic device and storage medium of the present disclosure are reproducible and can be used in various industrial applications, for example, tumor detection in medical images.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The present disclosure relates to the technical field of image processing, and provided in embodiments thereof are a target detection method, an apparatus, an electronic device, and a storage medium, which can reduce the size of an image without needing to use a means of coding and decoding and without affecting the detection performance on a target object in a size-reduced image. The target detection method may comprise: performing color coding on an input original color image, and obtaining a plurality of YUV color space images; performing pixel region division on a target image among the plurality of images, and obtaining a plurality of pixel regions corresponding to the target image; performing discrete cosine transform on each pixel region, and obtaining transform features of the target image; selecting a target channel of a low frequency region among the transform features; and performing object detection according to frequency domain feature information of the target channel in the target image.
Description
Cross-Reference to Related Applications
This disclosure claims priority to Chinese patent application No. 202110721797.4, titled "Target detection method, apparatus, electronic device and storage medium", filed with the China National Intellectual Property Administration on June 28, 2021, the entire contents of which are incorporated herein by reference.
The present disclosure relates to the technical field of image processing, and in particular to a target detection method, an apparatus, an electronic device, and a storage medium.
Target detection is an important task in the field of computer vision. Its purpose is to accurately locate a specific target object in an image through computer image processing. To achieve this, on the one hand the target object must be determined, for example by obtaining its contour curve or parameters such as its shape and size; on the other hand, the specific position of the target object in the image must be located. As the basis for subsequent tasks such as segmentation, tracking, and recognition, target detection is also an important part of image processing in computer vision. Only accurate, high-speed target detection can provide a good data basis for the subsequent segmentation, tracking, and recognition stages of image processing.
In practical applications, a video stream is usually input from a camera. The picture changes in a video stream are produced by the sequential display of multiple frames of digital images, and each frame contains multiple pixels arranged in a matrix; the number of pixels along two mutually perpendicular directions defines the image resolution. The higher the resolution, the more information the image carries. A raw video stream usually has a large data volume, which is not conducive to processing, transmission, and storage, and encoding and decoding the video stream greatly limits what the algorithm can achieve. To cope with limited computing power, high-resolution images generally need to be reduced in size. However, directly reducing the image resolution consumes computation time and also easily loses the information of small objects in the image, resulting in poor detection performance for small target objects in the reduced image.
Summary of the Invention
Embodiments of the present disclosure provide a target detection method, an apparatus, an electronic device, and a storage medium that can reduce the size of an image without encoding and decoding and without affecting the detection performance for target objects in the reduced image.
One aspect of the embodiments of the present disclosure provides a target detection method, which may include: performing color coding on an input original color image to obtain multiple images in the YUV color space; dividing a target image among the multiple images into pixel regions to obtain multiple pixel regions corresponding to the target image; performing a discrete cosine transform on each pixel region to obtain transformation features of the target image; selecting a target channel of a target region from the transformation features; and performing object detection according to frequency-domain feature information of the target channel in the target image.
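The discrete cosine transform step can be illustrated with a minimal sketch. It assumes the orthonormal 2-D DCT-II (the variant used in JPEG), which the text does not specify; the function name `dct_2d` is hypothetical, and the code uses only the Python standard library.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows.

    Assumption: the text only says "discrete cosine transform"; the
    orthonormal DCT-II is chosen here for illustration.
    """
    n = len(block)

    def alpha(k):
        # Normalization factor that makes the transform orthonormal
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    # 1-D cosine basis: basis[k][i] = alpha(k) * cos((2i + 1) * k * pi / (2n))
    basis = [[alpha(k) * math.cos((2 * i + 1) * k * math.pi / (2 * n))
              for i in range(n)] for k in range(n)]

    # The 2-D DCT is separable: transform every row, then every column
    rows = [[sum(basis[v][j] * row[j] for j in range(n)) for v in range(n)]
            for row in block]
    return [[sum(basis[u][i] * rows[i][v] for i in range(n)) for v in range(n)]
            for u in range(n)]

# A flat 8*8 block has all of its energy in the DC coefficient at [0][0]
flat = [[10.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

If each of the 64 coefficient positions of an 8*8 block is treated as one frequency channel (an assumption about how the transformation features are laid out), position [0][0] is the lowest-frequency channel.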
Optionally, selecting the target channel of the target region from the transformation features may include: performing detection on the transformation features with a channel selection network to obtain the target channel, where the channel selection network is a network model trained in advance on the transformation features of sample images, and the sample images and the target image are images of the same encoding format.
Optionally, the channel selection network may include a pooling layer, a convolution processing layer, an activation function layer, and a sampling layer. Performing detection on the transformation features with the pre-trained channel selection network corresponding to the target image to obtain the target channel may include: using the pooling layer to perform global average pooling on the feature values of each channel in the transformation features to obtain pooling features; using the convolution processing layer to perform convolution processing on the pooling features to obtain convolution features; using the activation function layer to process the convolution features to obtain probability features; and using the sampling layer to sample the channels corresponding to the target image according to the probability features to obtain the target channel.
Optionally, the activation function layer may be a sigmoid function layer.
Optionally, the sampling layer may be a Gumbel-softmax sampling layer.
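The four layers described above (pooling, convolution, activation, sampling) can be sketched in plain Python. This is only an illustration under stated assumptions: the convolution layer is reduced to a hypothetical per-channel scale and shift (`weights`, `biases`) standing in for trained parameters, and the function name `select_channels` and its `uniform` parameter are not from the source.

```python
import math
import random

def select_channels(features, weights, biases, temperature=1.0,
                    uniform=random.random):
    """Return one keep/drop decision per channel.

    features: list of C channels, each a 2-D list of transformation values.
    weights, biases: hypothetical per-channel parameters (assumption).
    uniform: callable returning a float in [0, 1), used for Gumbel noise.
    """
    keep = []
    for channel, w, b in zip(features, weights, biases):
        # Pooling layer: global average over all positions of the channel
        values = [v for row in channel for v in row]
        pooled = sum(values) / len(values)
        # Convolution layer, simplified to a per-channel scale and shift
        conv = max(-60.0, min(60.0, w * pooled + b))
        # Activation layer: sigmoid turns the response into a probability
        p = 1.0 / (1.0 + math.exp(-conv))
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        # Sampling layer: Gumbel-softmax over the outcomes (keep, drop)
        noisy = []
        for logit in (math.log(p), math.log(1.0 - p)):
            u = min(max(uniform(), 1e-12), 1.0 - 1e-12)
            gumbel = -math.log(-math.log(u))
            noisy.append((logit + gumbel) / temperature)
        top = max(noisy)
        exps = [math.exp(v - top) for v in noisy]
        soft_keep = exps[0] / sum(exps)
        # Hard decision taken from the relaxed sample
        keep.append(soft_keep > 0.5)
    return keep
```

Passing a fixed `uniform`, for example `lambda: 0.5`, gives both outcomes the same noise, so the decision then depends only on whether the sigmoid probability exceeds 0.5; this is convenient for checking the sketch deterministically.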
Optionally, dividing the target image among the multiple images into pixel regions may include: each pixel region obtained by the division includes N*N pixel units, where N is a positive integer.
Optionally, each pixel region may include 8*8 pixel units.
Optionally, performing color coding on the input original color image to obtain the multiple images in the YUV color space may further include: subtracting a predetermined value from the pixel value of each pixel of the multiple images in the YUV color space.
Optionally, performing object detection according to the frequency-domain feature information of the target channel in the target image may include: inputting the frequency-domain feature information of the target channel into a preset downsampling layer of a pre-trained frequency-domain detection network for processing, to obtain information about the target object.
Optionally, performing object detection according to the frequency-domain feature information of the target channel in the target image may include: splicing the frequency-domain feature information of the target channel into the original frequency-domain detection network at its four-times downsampling position as input, where frequency-domain feature detection uses an original image enlarged four times as input.
Optionally, the multiple images may include a Y component image, a U component image, and a V component image; the target image may include the Y component image.
Optionally, the target image may further include the U component image and the V component image, and selecting the target channel of the target region from the transformation features may include: selecting a first preset number of low-frequency channels from the transformation features of the Y component image as the Y component low-frequency channels; selecting a second preset number of low-frequency channels from the transformation features of the U component image as the U component low-frequency channels; and selecting a third preset number of low-frequency channels from the transformation features of the V component image as the V component low-frequency channels; where the first preset number is greater than the second preset number and greater than the third preset number.
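The preset-number selection for the Y, U, and V components can be sketched as follows. The text does not say how "low frequency" is ranked; this sketch assumes the JPEG-style zigzag order over an 8*8 coefficient block, and the function names and the example counts are hypothetical.

```python
def zigzag_order(n=8):
    """Positions (row, col) of an n*n block ordered from low to high frequency.

    Assumption: JPEG-style zigzag ordering; the source does not name one.
    """
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Sort by anti-diagonal, alternating the direction along each diagonal
    return sorted(
        positions,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

def select_low_frequency_channels(n_y, n_u, n_v, block_size=8):
    """Pick the first n_y, n_u, n_v zigzag positions for Y, U, and V."""
    # The method requires the Y count to exceed both chroma counts
    if not (n_y > n_u and n_y > n_v):
        raise ValueError("the Y count must exceed the U and V counts")
    order = zigzag_order(block_size)
    return {"Y": order[:n_y], "U": order[:n_u], "V": order[:n_v]}

# Illustrative counts only; the source gives no concrete preset numbers
selection = select_low_frequency_channels(6, 3, 3)
```

Keeping more channels for Y than for U or V follows the stated principle that the Y component contributes more to visual quality.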
Yet another aspect of the embodiments of the present disclosure provides a target detection device, which may include: an encoding module configured to perform color coding on an input original color image to obtain multiple images in the YUV color space; a region division module configured to divide a target image among the multiple images into pixel regions to obtain multiple pixel regions corresponding to the target image; a transformation module configured to perform a discrete cosine transform on each pixel region to obtain transformation features of the target image; a feature selection module configured to select a target channel of a target region from the transformation features; and a detection module configured to perform object detection according to frequency-domain feature information of the target channel in the target image.
Optionally, the feature selection module is configured to perform detection on the transformation features with a channel selection network to obtain the target channel, where the channel selection network is a network model trained in advance on the transformation features of sample images, and the sample images and the target image are images of the same encoding format.
Another aspect of the embodiments of the present disclosure provides an electronic device, which may include a memory and a processor, where the memory stores a computer program executable by the processor, and the processor implements any one of the target detection methods described above when executing the computer program.
A further aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is read and executed, any one of the target detection methods described above is implemented.
Embodiments of the present disclosure provide a target detection method, an apparatus, an electronic device, and a storage medium. The target detection method may include: performing color coding on an input original color image to obtain multiple images in the YUV color space; dividing a target image among the multiple images into pixel regions to obtain multiple pixel regions corresponding to the target image; performing a discrete cosine transform on each pixel region to obtain transformation features of the target image; selecting a target channel of a target region from the transformation features; and performing object detection according to frequency-domain feature information of the target channel in the target image. By acquiring the transformation features of the target image and selectively retaining them, the more valuable and more informative features of the target image are kept. Without increasing computing time or program footprint, this effectively improves the accuracy of object detection in the image, without shrinking the image and losing the information of small objects in it, and without the large amount of computation caused by encoding and decoding.
To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a target detection method provided by some embodiments of the present disclosure;
FIG. 2 is a schematic diagram of a target channel selection path within one pixel region of a target image in a target detection method according to some embodiments of the present disclosure;
FIG. 3 is a flowchart of an implementation of step S104 in a target detection method provided by some embodiments of the present disclosure;
FIG. 4 is a flowchart of an implementation of step S1041 in a target detection method provided by some embodiments of the present disclosure;
FIG. 5 is a flowchart of a target detection method provided by other embodiments of the present disclosure;
FIG. 6 is a flowchart of another implementation of step S104 in a target detection method provided by some embodiments of the present disclosure;
FIG. 7 is a flowchart of an implementation of step S105 in a target detection method provided by some embodiments of the present disclosure;
FIG. 8 is a schematic diagram of a target detection device 100 provided by some embodiments of the present disclosure;
FIG. 9 is a schematic diagram of an electronic device 200 provided by some embodiments of the present disclosure.
Reference numerals: 100 - target detection device; 110 - encoding module; 120 - region division module; 130 - transformation module; 140 - feature selection module; 150 - detection module; 200 - electronic device; 201 - memory; 202 - processor.
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings in the embodiments of the present disclosure.
In the description of the present disclosure, it should be noted that orientations or positional relationships indicated by terms such as "upper", "lower", "inner", and "outer" are based on the orientations or positional relationships shown in the drawings, or on the orientation or position in which the product of this application is customarily placed in use. They are used only to facilitate and simplify the description of the present disclosure, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; they therefore should not be construed as limiting the present disclosure. In addition, the terms "first", "second", and the like are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.
In recent years, research on artificial-intelligence-based technologies such as computer vision, deep learning, machine learning, image processing, and image recognition has made important progress. Artificial intelligence (AI) is an emerging science and technology that studies and develops theories, methods, technologies, and application systems for simulating and extending human intelligence. It is a comprehensive discipline involving many kinds of technology, such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision, an important branch of artificial intelligence, aims to let machines recognize the world. Computer vision technology usually includes face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, text recognition, video processing, video content recognition, behavior recognition, 3D reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning. With the research and progress of artificial intelligence technology, it has been applied in many fields, such as security, urban management, traffic management, building management, campus management, face-based access, face-based attendance, logistics management, warehouse management, robots, smart marketing, computational photography, mobile imaging, cloud services, smart homes, wearable devices, unmanned driving, autonomous driving, smart healthcare, face payment, face unlock, fingerprint unlock, identity document verification, smart screens, smart TVs, cameras, mobile Internet, live streaming, beauty filters, cosmetics, medical aesthetics, and intelligent temperature measurement.
Target detection is an important task in the field of computer vision. Its goal is to locate an object within an image, which requires two things: confirming the object to be located, and locating its exact position in the image. As the basis for downstream tasks such as segmentation, tracking, and recognition, target detection has become a fundamental task and challenge in computer vision.
In practical applications of target detection, images usually come from a camera, so the input is usually a video stream. A video stream has a large volume of image data, and decoding it consumes a great deal of time and system computing power. Given limited computing power, high-resolution images often need to be reduced in size. However, directly shrinking the image also wastes processing time and loses image data, so the information of small objects in the image is lost and they become hard to recognize, which degrades detection and recognition performance for small objects in the image.
Based on this, an embodiment of the present disclosure provides a target detection method. FIG. 1 is a flowchart of a target detection method provided by an embodiment of the present disclosure. As shown in FIG. 1, the method may include:
S101. Perform color coding on the input original color image to obtain multiple images in the YUV color space.
First, the input original color image may be color-coded: the input RGB original image (for example, a 1080*1920*3 RGB original image) undergoes a color-space conversion that turns the RGB image into images of multiple components of the color space, for example, three (Y, Cr, Cb) images of the YUV color space. In the YUV color space, "Y" represents luminance (luma), that is, the grayscale value, while "U" and "V" represent chrominance (chroma), which describes the color and saturation of the image and specifies the color of a pixel.
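As an illustration of step S101, the following is a minimal sketch of the color-space conversion. The text does not give conversion coefficients, so the full-range BT.601 coefficients used by JPEG/JFIF are an assumption here, as is the mapping of "U" and "V" to the Cb and Cr planes; the function names `rgb_to_ycbcr` and `split_planes` are hypothetical.

```python
def rgb_to_ycbcr(pixel):
    """Convert one 8-bit (R, G, B) pixel to (Y, Cb, Cr).

    Assumption: full-range BT.601 coefficients as used by JPEG/JFIF.
    """
    r, g, b = pixel
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def split_planes(rgb_image):
    """Split an H x W image (rows of (R, G, B) tuples) into Y, Cb, Cr planes."""
    converted = [[rgb_to_ycbcr(p) for p in row] for row in rgb_image]
    y_plane = [[c[0] for c in row] for row in converted]
    cb_plane = [[c[1] for c in row] for row in converted]
    cr_plane = [[c[2] for c in row] for row in converted]
    return y_plane, cb_plane, cr_plane
```

Each returned plane has the same height and width as the input, so a 1080*1920*3 RGB image would yield three 1080*1920 component images.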
S102. Divide the target image among the multiple images into pixel regions to obtain multiple pixel regions corresponding to the target image.
The target image among the multiple images may be divided into pixel regions to obtain multiple pixel regions corresponding to the target image. It should be noted that the division may be applied to a preselected target image among the multiple images, in which case only that target image is divided into multiple pixel regions, or to all of the multiple images as a whole, in which case the target image can be understood as being the entire image.
In general, the pixel regions corresponding to the target image should all be the same size. That is, if the pixel region is set to an 8*8 block, every pixel region is an 8*8 block.
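The equal-size division of step S102 can be sketched as follows. This is an illustrative sketch only: it assumes the plane dimensions are exact multiples of the block size (as 1080 and 1920 are for 8*8 blocks) and does not address padding, which the text does not describe. The function name `split_into_blocks` is hypothetical.

```python
def split_into_blocks(plane, n=8):
    """Split an H x W plane (list of rows) into non-overlapping n x n blocks.

    Returns a grid where blocks[i][j] is the n x n block (list of rows)
    whose top-left pixel is at row i * n, column j * n.
    Assumption: H and W are multiples of n.
    """
    height, width = len(plane), len(plane[0])
    if height % n or width % n:
        raise ValueError("plane dimensions must be multiples of the block size")
    return [
        [[row[c:c + n] for row in plane[r:r + n]] for c in range(0, width, n)]
        for r in range(0, height, n)
    ]
```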
S103. Perform a discrete cosine transform on each pixel region to obtain the transformation features of the target image.
A discrete cosine transform may be performed on each pixel region; transforming a pixel region yields the transformation feature of the target image for that region, and together these form the transformation features of the target image.
In the transformation feature of a pixel region, coefficients closer to the upper-left corner generally have larger magnitudes and lower frequencies, while coefficients closer to the lower-right corner have smaller magnitudes and higher frequencies. In this frequency coefficient matrix, therefore, the upper-left part is the low-frequency region and the lower-right part is the high-frequency region, and most of the feature information in the image is concentrated in the low-frequency region, that is, the upper-left part of the frequency coefficient matrix.
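To make the layout of the frequency coefficient matrix concrete, here is a minimal sketch of a two-dimensional DCT applied to one block. The text only says "discrete cosine transform"; the orthonormal DCT-II used below is an assumption, and the function name `dct_2d` is hypothetical. For a constant block, all of the energy lands in the upper-left (lowest-frequency) coefficient.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square n x n block (list of rows).

    coeffs[0][0] is the lowest-frequency (DC) term; frequency increases
    toward the lower-right corner.
    """
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    # basis[k][x] = cos((2x + 1) * k * pi / (2n))
    basis = [
        [math.cos((2 * x + 1) * k * math.pi / (2 * n)) for x in range(n)]
        for k in range(n)
    ]
    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += block[x][y] * basis[u][x] * basis[v][y]
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs
```

This direct form costs O(n^4) per block and is meant for reading, not speed; a practical implementation would use a separable or fast transform.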
S104. Select the target channels of the target region from the transformation features.
Therefore, a reconstruction operation may be performed on the transformation features to select the target channels of the target region from them. FIG. 2 is a schematic diagram of the target channel selection path within one pixel region of the target image in the target detection method of some embodiments of the present disclosure. The target region may be, for example, the low-frequency region. Target channels in the low-frequency region may be selected as a predetermined number of channels in the order indicated by the arrows in FIG. 2, or chosen according to human prior knowledge. In addition, still taking the three (Y, Cr, Cb) images of the YUV color space as an example, the allocation of target channels among the three images also has to be chosen. Because the human eye is far more sensitive to luminance (Y) than to chrominance (Cr, Cb), Y is more important than Cr and Cb in terms of the contribution of the three components to the image feature information. Accordingly, when target channels are selected in the pixel regions of the three images, more target channels are selected for Y and fewer for Cr and Cb. The frequency-domain feature information in the selected target channels is retained.
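FIG. 2 is not reproduced here, so the exact selection path is unknown. As an assumption, the sketch below uses the zigzag scan familiar from JPEG, which visits coefficients from the upper-left (low-frequency) corner toward the lower-right (high-frequency) corner, and keeps the first `count` coefficients. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n matrix in zigzag order.

    The scan starts at the top-left (lowest-frequency) coefficient and
    walks the anti-diagonals, alternating direction on each one.
    """
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diagonal if s % 2 else reversed(diagonal))
    return order


def select_low_frequency(coeffs, count):
    """Keep the first `count` coefficients of one block in zigzag order."""
    return [coeffs[r][c] for r, c in zigzag_order(len(coeffs))[:count]]
```

Treating each of the 64 coefficient positions as one channel, `select_low_frequency` keeps the `count` lowest-frequency channels of a block; a larger `count` could be used for Y than for Cb and Cr.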
S105. Perform object detection according to the frequency-domain feature information of the target channels in the target image.
Object detection can then be performed according to the frequency-domain feature information of the selected target channels in the target image. Because the target channels have been selected in an optimized way, the information in the target image that reflects the target's features is selectively retained, which improves detection performance for small objects in the image. Moreover, because the target image is effectively reduced, the computation required for image processing drops, which shortens the time spent detecting targets in the image and lowers the use of computing resources.
A target detection method provided by an embodiment of the present disclosure may include: performing color coding on the input original color image to obtain multiple images in the YUV color space; dividing the target image among the multiple images into pixel regions to obtain multiple pixel regions corresponding to the target image; performing a discrete cosine transform on each pixel region to obtain the transformation features of the target image; selecting the target channels of a target region, for example the low-frequency region, from the transformation features; and performing object detection according to the frequency-domain feature information of the target channels in the target image. By obtaining the transformation features of the target image and selectively retaining them, the more valuable and more informative features of the target image are kept. The accuracy of object detection in the image is thereby effectively improved without increasing computation time or program footprint; the image does not have to be downscaled at the cost of losing small-object information, nor is computation-heavy encoding and decoding required.
In some optional embodiments of the present disclosure, dividing the target image among the multiple images into pixel regions may include: each pixel region obtained by the division includes N*N pixel units, where N is a positive integer.
In some optional embodiments of the present disclosure, each pixel region may include 8*8 pixel units.
With this division, each pixel region has the same number of pixel units horizontally and vertically. Using such a region as the basic unit simplifies the calculations in subsequent steps, and using 8*8 pixel units per region as the basic unit further helps reduce computational complexity.
It should be noted that the following description uses the division in which each pixel region is 8*8 pixel units as its example throughout.
FIG. 3 is a flowchart of an implementation of step S104 in a target detection method provided by an embodiment of the present disclosure. As shown in FIG. 3, S104, selecting the target channels of the target region from the transformation features, may include:
S1041. Based on the transformation features, perform detection with a channel selection network to obtain the target channels. The channel selection network is a network model trained in advance on the transformation features of sample images, and the sample images and the target image are images in the same encoding format.
In some optional embodiments of the present disclosure, when step S104 is performed to select the target channels of the target region from the transformation features, adaptive training may be carried out in advance so that the selection can be corrected in real time. Optionally, preset images in the same format as the target image are chosen in advance as sample images, and a network model is trained on the transformation features of those sample images. When the target channels of the target region are selected from the transformation features of the target image, the pre-trained channel selection network corresponding to the target image can be applied to the transformation features. The target channels obtained in this way retain the more valuable and more informative feature information.
In some optional embodiments of the present disclosure, the channel selection network may include: a pooling layer, a convolution processing layer, an activation function layer, and a sampling layer.
FIG. 4 is a flowchart of an implementation of step S1041 in a target detection method provided by an embodiment of the present disclosure. As shown in FIG. 4, step S1041, in which the channel selection network trained in advance on the transformation features of sample images in the same encoding format as the target image is applied to the transformation features to obtain the target channels, may include:
S10411. Using the pooling layer, perform global average pooling on the feature values of each channel in the transformation features to obtain a pooled feature.
The pooling layer may be used to apply global average pooling to the transformation features, that is, to the feature values of each channel in the transformation features, to obtain a pooled feature such as a 1*1*64 feature.
S10412. Using the convolution processing layer, perform convolution processing on the pooled feature to obtain a convolution feature.
The convolution processing layer may be used to apply a further convolution to the pooled feature, yielding the convolution feature.
S10413. Using the activation function layer, process the convolution feature to obtain a probability feature.
The activation function layer may be used to process the convolution feature to obtain a probability vector for the pooled feature, which serves as the probability feature. The activation function layer may be a sigmoid function layer.
S10414. Using the sampling layer, sample the channels corresponding to the target image according to the probability feature to obtain the target channels.
The sampling layer may then set the probability values of some channels in the probability feature to 1, indicating that those channels are retained, and set the probability values of the other channels to 0, indicating that those channels are discarded; the channels set to 1 in the probability feature are determined to be the target channels. Each number in the probability feature is a probability value between 0 and 1 that represents the probability that the corresponding channel is retained. The sampling layer may be a Gumbel-Softmax sampling layer.
Because the probability value of each target channel in the sampled probability feature is 1 and that of every other channel is 0, the frequency-domain feature information of the target channels can be obtained by multiplying the sampled probability feature by the transformation features.
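Steps S10411 to S10414, followed by the final multiplication, can be sketched in plain Python as below. This is a forward-pass illustration under stated assumptions, not the disclosed network: the convolution is taken to be a 1x1 convolution (a channel-to-channel linear map) with hypothetical `weights` and `bias`, and the sampling layer is modeled as a hard Gumbel-max draw between "keep" and "drop" for each channel, which corresponds to the discrete forward pass of a Gumbel-Softmax layer; the gradient handling needed for training is omitted.

```python
import math
import random


def global_average_pool(features):
    """features: list of C channels, each an H x W list of rows -> C means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]


def pointwise_conv(vector, weights, bias):
    """1x1 convolution on a 1 x 1 x C feature, i.e. a linear map over channels."""
    return [sum(w * v for w, v in zip(w_row, vector)) + b
            for w_row, b in zip(weights, bias)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gumbel_sample_mask(probs, rng, eps=1e-12):
    """Draw a hard keep (1) / drop (0) decision per channel via Gumbel-max."""
    def gumbel():
        u = min(max(rng.random(), eps), 1.0 - eps)
        return -math.log(-math.log(u))

    mask = []
    for p in probs:
        keep = math.log(p + eps) + gumbel()
        drop = math.log(1.0 - p + eps) + gumbel()
        mask.append(1 if keep > drop else 0)
    return mask


def select_channels(features, weights, bias, rng):
    """Pool -> conv -> sigmoid -> sample, then gate the input channels."""
    pooled = global_average_pool(features)               # S10411
    conv = pointwise_conv(pooled, weights, bias)         # S10412
    probs = [sigmoid(z) for z in conv]                   # S10413
    mask = gumbel_sample_mask(probs, rng)                # S10414
    gated = [[[m * v for v in row] for row in ch]        # mask * features
             for ch, m in zip(features, mask)]
    return probs, mask, gated
```

Channels whose mask entry is 0 are zeroed in `gated`, so only the retained target channels carry frequency-domain information forward.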
The attention mechanism originates from research on human vision. In cognitive science, because of bottlenecks in information processing, humans selectively attend to part of the available information while ignoring the rest; this is commonly called the attention mechanism. Different parts of the human retina have different information-processing capability, or acuity, and only the fovea has the highest acuity. To make good use of limited visual processing resources, humans select a specific part of the visual field and concentrate on it. The attention mechanism has two main aspects: deciding which part of the input to attend to, and allocating limited information-processing resources to the important parts.
On this basis, concentrating resources on the points of interest uses them more efficiently. The target channels can therefore be retained in a targeted, selective way, which improves the accuracy of object detection in the target image after the target image has been reduced.
FIG. 5 is a flowchart of a target detection method provided by other embodiments of the present disclosure. As shown in FIG. 5, in some optional embodiments of the present disclosure, step S101, performing color coding on the input original color image to obtain multiple images in the YUV color space, may further include:
S1011. Subtract a preset value from the pixel value of each pixel of the multiple images in the YUV color space.
After the input original color image has been color-coded and the input RGB original image decomposed into three (Y, Cr, Cb) images of the YUV color space, 127 may additionally be subtracted from the pixel value of each pixel of the multiple images in the YUV color space. Subtracting 127 from every pixel value (127 being the preset value in this example) shifts the value range downward and keeps each 8*8 block symmetric.
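The subtraction in step S1011 is a shift of pixel values, not a bitwise shift. A minimal sketch, with the hypothetical function name `level_shift` and the preset value 127 taken from the example in the text:

```python
def level_shift(plane, offset=127):
    """Subtract a preset offset from every pixel value of a plane (list of rows)."""
    return [[value - offset for value in row] for row in plane]
```

With 8-bit input in the range 0 to 255, the shifted values fall in the range -127 to 128, which is roughly centered on zero.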
In some optional embodiments of the present disclosure, the multiple images may include a Y component image, a U component image, and a V component image, and the target image may include the Y component image.
After the input original color image has been color-coded and the input RGB original image decomposed into three (Y, Cr, Cb) images of the YUV color space, the Y component image represents the grayscale of the color-space image, while the U component image and the V component image express its color and saturation. Among the component images, the Y component image contributes more to the visual quality of the color-space image than the U and V component images do. The target image may therefore include the Y component image, which is sufficient for the accuracy required when detecting objects in the target image after it has been reduced.
In some optional embodiments of the present disclosure, the target image may further include the U component image and the V component image. That is, the target image includes the Y component image, the U component image, and the V component image, so it contains all three components of the color-space image. As a result, when the target image is reduced, even the U and V component images, which contribute relatively little to visual quality, can be processed and selectively retained, which improves the accuracy of object detection in the target image after reduction.
Where the target image includes the Y component image, the U component image, and the V component image, in some optional embodiments of the present disclosure, FIG. 6 is a flowchart of another implementation of step S104 in a target detection method provided by an embodiment of the present disclosure. As shown in FIG. 6, S104, selecting the target channels of the target region from the transformation features, may include:
S1042. Select a first preset number of low-frequency channels from the transformation features of the Y component image as Y-component low-frequency channels.
S1043. Select a second preset number of low-frequency channels from the transformation features of the U component image as U-component low-frequency channels.
S1044. Select a third preset number of low-frequency channels from the transformation features of the V component image as V-component low-frequency channels, where the first preset number is greater than the second preset number and greater than the third preset number.
When the target image includes the Y component image, the U component image, and the V component image, performing step S104 may include selecting low-frequency channels separately from the transformation features of the Y component image, of the U component image, and of the V component image, to serve as the Y-component, U-component, and V-component low-frequency channels respectively. The number of low-frequency channels selected in step S1042 is the first preset number, the number selected in step S1043 is the second preset number, and the number selected in step S1044 is the third preset number, where the first preset number is greater than both the second preset number and the third preset number. That is, the selection still follows the principle stated above that the Y component image contributes more to the visual quality of the color-space image: in step S1042, more low-frequency channels are selected from the transformation features of the Y component image than are selected from those of the U component image or the V component image.
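Steps S1042 to S1044 can be sketched as follows. The preset numbers are not given in the text, so the defaults of 32, 16, and 16 below are purely illustrative assumptions, as is the premise that each component's channels are already ordered from low to high frequency. The function name `select_yuv_low_frequency` is hypothetical.

```python
def select_yuv_low_frequency(y_feats, u_feats, v_feats, n_y=32, n_u=16, n_v=16):
    """Keep the lowest-frequency channels of each component and concatenate them.

    Each *_feats argument is a list of frequency channels ordered from low
    to high frequency. The counts are illustrative assumptions; the only
    constraint taken from the text is n_y > n_u and n_y > n_v.
    """
    if not (n_y > n_u and n_y > n_v):
        raise ValueError("the Y count must exceed both the U and the V counts")
    return y_feats[:n_y] + u_feats[:n_u] + v_feats[:n_v]
```

With 64 channels per component and the illustrative defaults, 64 of the 192 available channels are retained, half of them from the Y component.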
FIG. 7 is a flowchart of an implementation of step S105 in a target detection method provided by some embodiments of the present disclosure. As shown in FIG. 7, S105, performing object detection according to the frequency-domain feature information of the target channels in the target image, may include:
S1051. Input the frequency-domain feature information of the target channels into a preset downsampling layer of a pre-trained frequency-domain detection network for processing, to obtain information about the target object.
In some optional embodiments of the present disclosure, one way to perform object detection according to the frequency-domain feature information of the target channels in the target image is to feed that information into a preset downsampling layer of a pre-trained frequency-domain detection network and process it there to obtain information about the target object.
In some optional embodiments of the present disclosure, mainstream detection networks such as Faster R-CNN or RetinaNet may be used to perform detection on the frequency-domain features. One option is to splice the obtained frequency-domain feature information into the original frequency-domain detection network at its 4x downsampling point as input. Detection on frequency-domain feature information can then take an original image four times as large as input directly, so more information about small objects in the image is retained and detection performance for small objects is better. High-data-volume images, such as those from a 4K camera, can also be used as input directly, without being downscaled first.
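The "four times" figure can be checked with simple shape arithmetic, under one reading of the text: an 8*8 block DCT turns an H*W plane into an (H/8)*(W/8) grid of frequency features, and a conventional RGB detector whose 4x downsampling stage produces that same grid would need an (H/2)*(W/2) input, which has one quarter of the pixels. The function names below are hypothetical, and this interpretation is an assumption rather than something the text states explicitly.

```python
def frequency_feature_shape(height, width, channels, block=8):
    """Spatial grid and channel count of the block-DCT features of one frame."""
    return height // block, width // block, channels


def matching_rgb_input(height, width, block=8, stage_stride=4):
    """RGB input size whose feature map at `stage_stride` has the same grid."""
    scale = block // stage_stride
    return height // scale, width // scale
```

For a 1080*1920 frame with 64 retained channels, the frequency features form a 135*240*64 tensor, and the RGB input that reaches a 135*240 grid at 4x downsampling is 540*960.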
FIG. 8 is a schematic diagram of a target detection device provided by an embodiment of the present disclosure. As shown in FIG. 8, another aspect of the embodiments of the present disclosure provides a target detection device 100, which may include:
An encoding module 110, configured to perform color coding on the input original color image to obtain multiple images in the YUV color space.
A region division module 120, configured to divide the target image among the multiple images into pixel regions to obtain multiple pixel regions corresponding to the target image.
A transformation module 130, configured to perform a discrete cosine transform on each pixel region to obtain the transformation features of the target image.
A feature selection module 140, configured to select the target channels of the target region from the transformation features.
A detection module 150, configured to perform object detection according to the frequency-domain feature information of the target channels in the target image.
In some optional implementations of the present disclosure, the target detection device obtains the transformation features of the target image and selectively retains them, so the more valuable and more informative features of the target image are kept. The accuracy of object detection in the image is thereby effectively improved without increasing computation time or program footprint; the image does not have to be downscaled at the cost of losing small-object information, nor is computation-heavy encoding and decoding required.
The above modules may be one or more integrated circuits configured to implement the above method, for example, one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA). As another example, when one of the above modules is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. As a further example, these modules may be integrated together and implemented as a system-on-a-chip (SOC).
在本公开的一些可选的实施方式中,特征选择模块140,可选地被配置成根据变换特征,采用预先训练的目标图像对应的通道选择网络进行检测,得到目标通道,通道选择网络可以为预先根据样本图像的变换特征进行训练得到的网络模型,样本图像和目标图像可以为 相同编码格式的图像。In some optional implementations of the present disclosure, the feature selection module 140 is optionally configured to use the channel selection network corresponding to the pre-trained target image for detection according to the transformation feature, and obtain the target channel. The channel selection network can be The network model obtained by pre-training according to the transformation characteristics of the sample image, the sample image and the target image can be images of the same encoding format.
在本公开的一些可选的实施方式中,通道选择网络可以包括:池化层、卷积处理层、激活函数层及采样层。特征选择模块140可选地被配置成采用池化层,可以对变换特征中的各通道的特征值进行全局平均池化,得到池化特征;采用卷积处理层,可以对池化特征进行卷积处理,得到卷积特征;采用激活函数层,可以对卷积特征进行处理,得到概率特征;采用采样层,可以根据概率特征对与目标图像对应的通道进行采样处理,得到目标通道。In some optional implementation manners of the present disclosure, the channel selection network may include: a pooling layer, a convolution processing layer, an activation function layer, and a sampling layer. The feature selection module 140 is optionally configured to use a pooling layer, which can perform global average pooling on the feature values of each channel in the transformed feature to obtain a pooling feature; using a convolution processing layer, can perform convolution on the pooling feature. Product processing to obtain convolutional features; using activation function layer, convolutional features can be processed to obtain probability features; sampling layer can be used to sample the channel corresponding to the target image according to the probability feature to obtain the target channel.
在本公开的一些可选的实施方式中,区域划分模块120可选地被配置成对多个图像中的目标图像进行像素区域划分,划分得到的每个像素区域可以包括N*N个像素单元,其中,N为大于0的正整数。In some optional implementations of the present disclosure, the region division module 120 is optionally configured to perform pixel region division on the target image in the plurality of images, and each pixel region obtained by division may include N*N pixel units , where N is a positive integer greater than 0.
在本公开的一些可选的实施方式中,划分得到的每个像素区域可以包括8*8个像素单元。In some optional implementation manners of the present disclosure, each divided pixel area may include 8*8 pixel units.
在本公开的一些可选的实施方式中,编码模块110,可选地被配置成对输入的原始彩色图像进行颜色编码,得到YUV色彩空间的多个图像,还被配置成对YUV色彩空间的多个图像像素的像素值分别减去127。In some optional implementations of the present disclosure, the encoding module 110 is optionally configured to perform color encoding on the input original color image to obtain multiple images in the YUV color space, and is also configured to perform color encoding on the YUV color space Subtract 127 from the pixel values of multiple image pixels, respectively.
在本公开的一些可选的实施方式中,检测模块150,可选地被配置成将目标通道的频域特征信息,输入至预先训练的频域检测网络中的预设下采样层进行处理,得到目标物体的信息。In some optional implementations of the present disclosure, the detection module 150 is optionally configured to input the frequency-domain feature information of the target channel into a preset downsampling layer in a pre-trained frequency-domain detection network for processing, Get information about the target object.
在本公开的一些可选的实施方式中,多个图像可以包括Y分量图像、U分量图像和V分量图像。目标图像包括Y分量图像,在一些可选地实施方式中,目标图像还可以包括U分量图像、V分量图像。In some optional implementations of the present disclosure, the plurality of images may include a Y component image, a U component image, and a V component image. The target image includes a Y component image, and in some optional implementation manners, the target image may also include a U component image and a V component image.
Where the target image includes the Y component image, the U component image, and the V component image, the feature selection module 140 is optionally configured to: select a first preset number of low-frequency channels from the transform features of the Y component image as Y-component low-frequency channels; select a second preset number of low-frequency channels from the transform features of the U component image as U-component low-frequency channels; and select a third preset number of low-frequency channels from the transform features of the V component image as V-component low-frequency channels; wherein the first preset number is greater than the second preset number and greater than the third preset number.
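One way to picture the fixed low-frequency selection is to treat each coefficient position of the N*N block transform as a channel and keep the first k positions in zigzag order. The zigzag ordering and the function names (`zigzag_order`, `select_low_freq`) are assumptions made for illustration; the text only requires that more channels be kept for Y than for U or V.

```python
import numpy as np

def zigzag_order(n=8):
    """Coefficient positions (row, col) of an n x n block, low frequency first."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Walk the anti-diagonals, alternating direction as in JPEG zigzag scanning.
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else -p[0]))

def select_low_freq(coeffs, k, n=8):
    """coeffs: (Hb, Wb, n, n) block coefficients -> (k, Hb, Wb) channel maps."""
    return np.stack([coeffs[:, :, i, j] for i, j in zigzag_order(n)[:k]], axis=0)
```

For example, with hypothetical preset numbers of 16, 4, and 4 for Y, U, and V, concatenating the three results along the first axis would give a 24-channel frequency-domain input.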
The above apparatus is configured to perform the method provided in the foregoing embodiments; its implementation principles and technical effects are similar and are not repeated here.
FIG. 9 is a schematic diagram of an electronic device 200 provided by an embodiment of the present disclosure. As shown in FIG. 9, another aspect of the embodiments of the present disclosure provides an electronic device 200, which may include a memory 201 and a processor 202. The memory 201 stores a computer program executable by the processor 202, and the processor 202 invokes the program stored in the memory 201 to execute any of the foregoing embodiments of the target detection method. The specific implementation and technical effects are similar and are not repeated here.
Yet another aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is read and executed, the target detection method of any of the foregoing embodiments is implemented.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections between apparatuses or units through some interfaces, and may be electrical, mechanical, or of other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely embodiments of the present disclosure and are not intended to limit its scope of protection. For those skilled in the art, the present disclosure may be modified and varied in many ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.
The present disclosure provides a target detection method, an apparatus, an electronic device, and a storage medium. The target detection method may include: color-encoding an input original color image to obtain multiple images in the YUV color space; dividing a target image among the multiple images into pixel regions to obtain multiple pixel regions corresponding to the target image; performing a discrete cosine transform on each pixel region to obtain transform features of the target image; selecting a target channel of a target region from the transform features; and performing object detection according to the frequency-domain feature information of the target channel in the target image. The image can thus be reduced without going through encoding and decoding, and without affecting the detection performance for target objects in the reduced image.
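The block-wise discrete cosine transform at the center of this pipeline can be sketched in NumPy as follows. The orthonormal DCT-II normalization and the names `dct_matrix` and `blockwise_dct` are assumptions for illustration; the disclosure does not fix a normalization in this passage.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix D, so that D @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]   # frequency index
    x = np.arange(n)[None, :]   # sample index
    d = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)  # DC row uses the smaller scale factor
    return d

def blockwise_dct(image, n=8):
    """2-D DCT of every n x n block; returns shape (H // n, W // n, n, n)."""
    h, w = image.shape          # assumed to be multiples of n
    blocks = image.reshape(h // n, n, w // n, n).swapaxes(1, 2)
    d = dct_matrix(n)
    # Apply D along block rows and block columns: D @ block @ D.T per block.
    return np.einsum("ux,abxy,vy->abuv", d, blocks, d)
```

Each of the `n * n` coefficient positions in the result can then be read as one frequency channel of size `(H // n, W // n)`, which is how the spatial size shrinks without a separate resize step.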
In addition, it will be understood that the target detection method, apparatus, electronic device, and storage medium of the present disclosure are reproducible and can be used in a variety of industrial applications, for example, tumor detection in medical images.
Claims (16)
- A target detection method, characterized by comprising: color-encoding an input original color image to obtain multiple images in the YUV color space; dividing a target image among the multiple images into pixel regions to obtain multiple pixel regions corresponding to the target image; performing a discrete cosine transform on each of the pixel regions to obtain transform features of the target image; selecting a target channel of a target region from the transform features; and performing object detection according to frequency-domain feature information of the target channel in the target image.
- The method according to claim 1, characterized in that selecting the target channel of the target region from the transform features comprises: performing detection with a channel selection network according to the transform features to obtain the target channel, wherein the channel selection network is a network model trained in advance on transform features of sample images, and the sample images and the target image are images of the same encoding format.
- The method according to claim 2, characterized in that the channel selection network comprises a pooling layer, a convolution processing layer, an activation function layer, and a sampling layer; and performing detection, according to the transform features, with the pre-trained channel selection network corresponding to the target image to obtain the target channel comprises: using the pooling layer to perform global average pooling on the feature values of each channel in the transform features to obtain pooled features; using the convolution processing layer to convolve the pooled features to obtain convolution features; using the activation function layer to process the convolution features to obtain probability features; and using the sampling layer to sample the channels corresponding to the target image according to the probability features to obtain the target channel.
- The method according to claim 3, characterized in that the activation function layer is a sigmoid function layer.
- The method according to claim 3, characterized in that the sampling layer is a Gumbel-softmax sampling layer.
- The method according to any one of claims 1 to 5, characterized in that dividing the target image among the multiple images into pixel regions comprises: each of the pixel regions obtained by the division includes N*N pixel units, where N is a positive integer.
- The method according to claim 6, characterized in that each of the pixel regions includes 8*8 pixel units.
- The method according to any one of claims 1 to 7, characterized in that color-encoding the input original color image to obtain multiple images in the YUV color space further comprises: subtracting a predetermined value from the pixel value of each pixel of the multiple images in the YUV color space.
- The method according to any one of claims 1 to 8, characterized in that performing object detection according to the frequency-domain feature information of the target channel in the target image comprises: inputting the frequency-domain feature information of the target channel into a preset downsampling layer of a pre-trained frequency-domain detection network for processing to obtain information about the target object.
- The method according to any one of claims 1 to 8, characterized in that performing object detection according to the frequency-domain feature information of the target channel in the target image comprises: concatenating the frequency-domain feature information of the target channel, as input, at the four-times downsampling point of the original frequency-domain detection network, wherein the frequency-domain feature information detection uses the original image enlarged four times as input.
- The method according to any one of claims 1 to 10, characterized in that the multiple images include a Y component image, a U component image, and a V component image; and the target image includes the Y component image.
- The method according to claim 11, characterized in that the target image further includes the U component image and the V component image; and selecting the target channel of the target region from the transform features comprises: selecting a first preset number of low-frequency channels from the transform features of the Y component image as Y-component low-frequency channels; selecting a second preset number of low-frequency channels from the transform features of the U component image as U-component low-frequency channels; and selecting a third preset number of low-frequency channels from the transform features of the V component image as V-component low-frequency channels; wherein the first preset number is greater than the second preset number and greater than the third preset number.
- A target detection apparatus, characterized by comprising: an encoding module configured to color-encode an input original color image to obtain multiple images in the YUV color space; a region division module configured to divide a target image among the multiple images into pixel regions to obtain multiple pixel regions corresponding to the target image; a transform module configured to perform a discrete cosine transform on each of the pixel regions to obtain transform features of the target image; a feature selection module configured to select a target channel of a target region from the transform features; and a detection module configured to perform object detection according to frequency-domain feature information of the target channel in the target image.
- The target detection apparatus according to claim 13, characterized in that the feature selection module is configured to: perform detection with a channel selection network according to the transform features to obtain the target channel, wherein the channel selection network is a network model trained in advance on transform features of sample images, and the sample images and the target image are images of the same encoding format.
- An electronic device, characterized by comprising a memory and a processor, wherein the memory stores a computer program executable by the processor, and the processor, when executing the computer program, implements the target detection method according to any one of claims 1 to 12.
- A computer-readable storage medium, characterized in that a computer program is stored on the storage medium, and when the computer program is read and executed, the target detection method according to any one of claims 1 to 12 is implemented.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110721797.4 | 2021-06-28 | ||
CN202110721797.4A CN113591838B (en) | 2021-06-28 | 2021-06-28 | Target detection method, device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023273515A1 (en) | 2023-01-05 |
Family
ID=78245021
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/086919 WO2023273515A1 (en) | 2021-06-28 | 2022-04-14 | Target detection method, apparatus, electronic device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113591838B (en) |
WO (1) | WO2023273515A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113591838B (en) * | 2021-06-28 | 2023-08-29 | 北京旷视科技有限公司 | Target detection method, device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102710938A (en) * | 2012-05-08 | 2012-10-03 | 东莞中山大学研究院 | Method and device for video processing based on nonuniform DCT (discrete cosine transform) |
CN113591838A (en) * | 2021-06-28 | 2021-11-02 | 北京旷视科技有限公司 | Target detection method, target detection device, electronic equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961009B (en) * | 2019-02-15 | 2023-10-31 | 平安科技(深圳)有限公司 | Pedestrian detection method, system, device and storage medium based on deep learning |
CN112347887B (en) * | 2020-10-28 | 2023-11-24 | 深圳市优必选科技股份有限公司 | Object detection method, object detection device and electronic equipment |
- 2021-06-28 CN CN202110721797.4A patent/CN113591838B/en active Active
- 2022-04-14 WO PCT/CN2022/086919 patent/WO2023273515A1/en active Application Filing
Non-Patent Citations (2)
Title |
---|
LIU CHUNYANG, WU ZE-MIN; HU LEI; LIU XI: "Pedestrian Detection Based on DCT of Multi-channel Feature", COMPUTER SCIENCE, KEXUE JISHU WENXIAN CHUBANSHE CHONGQING FENSHE, CN, vol. 44, no. 11A, 30 November 2017 (2017-11-30), CN , XP093018488, ISSN: 1002-137X * |
NEWCHENXF: "JPEG Compression Principle and DCT Discrete Cosine Transform", CSDN BLOG, pages 1 - 7, XP009542276, Retrieved from the Internet <URL:https://blog.csdn.net/newchenxf/article/details/51719597> * |
Also Published As
Publication number | Publication date |
---|---|
CN113591838A (en) | 2021-11-02 |
CN113591838B (en) | 2023-08-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 22831346; Country of ref document: EP; Kind code of ref document: A1 |