CN114332456A - Target detection and identification method and device for large-resolution image - Google Patents
- Publication number: CN114332456A (application CN202210255384.6A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Classification: Image Analysis (AREA)
Abstract
The invention relates to the technical field of image recognition, and in particular to a target detection and recognition method and device for large-resolution images, wherein the method comprises the following steps: acquiring a large-resolution image set and performing data enhancement to obtain an enhanced image set; segmenting each original image in the enhanced image set to obtain corresponding sub-images and their position information; encoding and fusing the sub-images and their position information to obtain corresponding data tensors; performing layer-by-layer feature representation learning on the data tensors based on a Faster R-CNN model, fusing the low-layer, middle-layer and high-layer information of the Faster R-CNN model with an attention mechanism to determine the feature representation corresponding to each sub-image, then determining candidate target positions and performing regression and classification to determine the final target position of each sub-image and the category to which it belongs; and determining the final target position and category of the original image from the final target positions and categories of the sub-images. Through this scheme, the final model performance is improved.
Description
Technical Field
The invention relates to the technical field of image recognition, and in particular to a target detection and recognition method and device for large-resolution images.
Background
With the rapid development of information technology, the convenience, efficiency, safety and reliability of information processing have made industrial informatization a trend across industries. Images are among the most ubiquitous media in daily life and play a key role in information transfer. Using image information efficiently and reliably is therefore an important research topic in computer vision and has attracted a great number of researchers.
In the early stage, because the semantic information of images is complex, traditional machine learning algorithms could not fully understand image content, so research remained relatively limited. In recent years, the advent of deep learning, the improvement of computing performance and the arrival of the big-data era have made it possible to exploit image information fully; many active research topics have grown up around this, such as image classification, image segmentation, object detection, face recognition and re-identification, with considerable success in each direction.
It should be noted, however, that although image research has succeeded in many directions as learning algorithms continue to improve, several research directions still face significant challenges in special settings, including target detection in large-resolution images. Unlike ordinary life pictures, such images, for example satellite images or other aerial images, are generally captured by professional equipment for a fixed purpose: observing terrain, vegetation or water conservancy, military reconnaissance, meteorological monitoring, and so on. Taking a terrestrial satellite image used for observing terrain, vegetation or water conservancy as an example, targets in the image must be detected, and when the targets are small, common image target detection methods fail because the resolution of the image is too large. First, the input of a common method is generally on the order of 10^2 x 10^2 to 10^3 x 10^3 pixels, whereas a large-resolution image is usually far larger; simply scaling the original data down loses a large amount of information, and when the detected target is small it may be lost entirely. Second, the larger the image, the more information it contains, so the proportion of the target region to the background region becomes smaller. Third, because of the special application background, the number of such images is small; a large amount of experimental data cannot be obtained, which hinders model training. For these reasons, existing methods cannot achieve good precision in large-resolution image target detection and so cannot meet normal performance requirements.
Disclosure of Invention
In order to overcome the problems in the related art, the invention provides a target detection and recognition method and device for large-resolution images.
According to a first aspect of the embodiments of the present invention, there is provided a target detection and recognition method for large-resolution images, the method comprising:
acquiring a large-resolution image set, and performing data enhancement on the large-resolution image set to obtain an enhanced image set;
dividing each original image in the enhanced image set to obtain corresponding sub-images and position information thereof;
coding and fusing the subimages and the position information thereof to obtain corresponding data tensors;
performing feature representation learning layer by layer on the data tensor based on a Faster R-CNN model, and fusing low-layer information, middle-layer information and high-layer information of the Faster R-CNN model by adopting an attention mechanism to determine feature representation corresponding to the subimages;
determining candidate target positions according to the feature representations corresponding to the sub-images, and performing regression and classification to determine the final target position of each sub-image and the category of the sub-image;
and determining the final target position and the category of the original image according to the final target position and the category of each sub-image.
In one embodiment, preferably, segmenting each image in the enhanced image set to obtain a corresponding sub-image and position information thereof includes:
dividing each original image by adopting a fixed-window overlapping division mode to obtain corresponding sub-images, and arranging the sub-images in sequence;
and performing data preprocessing on each sub-image, and determining the position information of the sub-image in the original image, wherein the position information comprises the coordinates of the center point of the sub-image in the original image and the width and height of the sub-image.
In one embodiment, preferably, the fusing the low-level information, the middle-level information and the high-level information of the Faster R-CNN model by using an attention mechanism to determine the corresponding feature representation of the sub-image includes:
respectively taking the low-layer information, the middle-layer information and the high-layer information of the Faster R-CNN model as Q, K and V in an attention mechanism, and determining the feature representation corresponding to the sub-image by the following formula:
Z = softmax(Q·K^T / √d) · V
wherein Z represents the feature representation corresponding to the sub-image, Q represents the query, K represents the key, V represents the value, and d represents a hyper-parameter.
In one embodiment, preferably, determining the final target position of the original image and the category to which the final target position belongs according to the final target position of each sub-image and the category to which the final target position belongs includes:
and merging the final target positions of which the distances between the sub-images are smaller than a preset threshold value according to the position information of the sub-images in the original image to determine the final target position of the original image, and determining the category with the maximum category probability value as the category to which the original image belongs.
In one embodiment, the candidate target positions are preferably determined using the following first calculation formula:
l_1 = Σ_i ||t_i* - t_i||^2 + γ·||W||^2, with t_i = W^T x_i (1)
wherein l_1 represents the loss over the candidate target positions, W represents the parameters to be learned, x_i represents the vector representation of the i-th candidate target position, t_i* represents the offset from the i-th candidate target position to the true target position, t_i represents the predicted offset, ||W||^2 represents the regularization term, and γ is a hyper-parameter;
regression and classification are performed using the following second calculation formula:
l_2 = L_cls + λ·L_loc (2)
wherein l_2 represents the sum of the regression and classification losses, L_cls represents the classification loss, L_loc represents the position loss, and λ is a hyper-parameter balancing the two loss terms.
According to a second aspect of embodiments of the present invention, there is provided an object detection and recognition apparatus for a large-resolution image, the apparatus including:
the enhancement module is used for acquiring a large-resolution image set and enhancing data of the large-resolution image set to obtain an enhanced image set;
the segmentation module is used for segmenting each original image in the enhanced image set to obtain corresponding sub-images and position information thereof;
the processing module is used for coding and fusing the sub-images and the position information thereof to obtain corresponding data tensors;
the fusion module is used for performing feature representation learning layer by layer on the data tensor based on a Faster R-CNN model, and fusing low-layer information, middle-layer information and high-layer information of the Faster R-CNN model by adopting an attention mechanism so as to determine feature representation corresponding to the subimages;
the first determining module is used for determining candidate target positions according to the feature representations corresponding to the sub-images, and performing regression and classification to determine the final target position of each sub-image and the category of the sub-image;
and the second determining module is used for determining the final target position and the category of the original image according to the final target position and the category of each sub-image.
In one embodiment, preferably, the segmentation module includes:
the segmentation unit is used for segmenting each original image by adopting a fixed window overlapping segmentation mode to obtain corresponding sub-images and arranging the sub-images in sequence;
and the preprocessing unit is used for preprocessing data of each sub-image and determining the position information of the sub-image in the original image, wherein the position information comprises the coordinates of the center point of the sub-image in the original image and the width and height of the sub-image.
In one embodiment, preferably, the fusion module is configured to:
respectively taking the low-layer information, the middle-layer information and the high-layer information of the Faster R-CNN model as Q, K and V in an attention mechanism, and determining the feature representation corresponding to the sub-image by the following formula:
Z = softmax(Q·K^T / √d) · V
wherein Z represents the feature representation corresponding to the sub-image, Q represents the query, K represents the key, V represents the value, and d represents a hyper-parameter.
In one embodiment, preferably, the second determining module is configured to:
and merging the final target positions of which the distances between the sub-images are smaller than a preset threshold value according to the position information of the sub-images in the original image to determine the final target position of the original image, and determining the category with the maximum category probability value as the category to which the original image belongs.
In one embodiment, the candidate target positions are preferably determined using the following first calculation formula:
l_1 = Σ_i ||t_i* - t_i||^2 + γ·||W||^2, with t_i = W^T x_i (1)
wherein l_1 represents the loss over the candidate target positions, W represents the parameters to be learned, x_i represents the vector representation of the i-th candidate target position, t_i* represents the offset from the i-th candidate target position to the true target position, t_i represents the predicted offset, ||W||^2 represents the regularization term, and γ is a hyper-parameter;
regression and classification are performed using the following second calculation formula:
l_2 = L_cls + λ·L_loc (2)
wherein l_2 represents the sum of the regression and classification losses, L_cls represents the classification loss, L_loc represents the position loss, and λ is a hyper-parameter balancing the two loss terms.
According to a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any one of the first aspect.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
the invention realizes target detection on a high-resolution image based on high-low layer information fusion, and compared with the previous method. Since the large-resolution image size is much larger than the normal image, the large-resolution image needs to be preprocessed in order to make the model feasible. If the mode of equal scaling is adopted, target information is likely to be lost, and in order to solve the problem, the invention adopts the window overlapping type to segment the image to obtain the sub-images of the original image, and the sub-images are sequentially used as the input of the model, so that the integrity of the information is effectively ensured; meanwhile, in order to avoid losing the position information of the sub-image in the original image, the invention additionally adds the position information on the characteristic representation to enhance the integrity of the spatial information of the sub-image; in addition, during feature learning, the method utilizes an attention mechanism to perform weighted fusion on high-layer information and low-layer information to obtain corresponding convolutional layer output for downstream tasks, and feature information is greatly enriched through the method, so that the final model performance is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating a method for object detection and recognition of a high resolution image according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a step S102 in a target detecting and recognizing method of a large-resolution image according to an exemplary embodiment.
FIG. 3 is a schematic diagram illustrating a method of object detection and recognition for a high resolution image according to an exemplary embodiment.
FIG. 4 is a block diagram illustrating an apparatus for object detection and recognition of a large resolution image according to an exemplary embodiment.
FIG. 5 is a block diagram illustrating an apparatus for object detection and recognition of a large resolution image according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
FIG. 1 is a flow diagram illustrating a method for target detection and recognition of a large-resolution image according to an exemplary embodiment. As shown in FIG. 1, the method comprises:
step S101, acquiring a large-resolution image set, and performing data enhancement on the large-resolution image set to obtain an enhanced image set;
Due to the special background of large-resolution pictures, the data volume is far smaller than in common picture tasks, which easily leads to insufficient model training; data enhancement is therefore performed to enlarge the image set.
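As an illustration of step S101, a minimal sketch follows. The patent does not name specific transforms, so the photometric jitter operations and their parameters here are assumptions; photometric transforms are chosen because they leave ground-truth box coordinates unchanged.

```python
# Hypothetical data-enhancement step for S101. The patent does not specify
# which transforms are used; photometric jitter is assumed here because it
# does not alter ground-truth bounding-box coordinates.
import torchvision.transforms as T

augment = T.Compose([
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.GaussianBlur(kernel_size=3),
])

def enhance_image_set(images):
    """Return the original PIL images plus one augmented copy of each."""
    return list(images) + [augment(img) for img in images]
```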
Step S102, segmenting each original image in the enhanced image set to obtain corresponding sub-images and position information thereof;
because the size of the original image is too large, the whole image cannot be input into the model at one time, and the information is easily lost by the traditional downsampling method.
Step S103, encoding and fusing the sub-images and the position information thereof to obtain corresponding data tensors;
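The patent does not spell out the concrete encoding and fusion scheme of step S103. One plausible sketch, assuming the (x, y, w, h) position vector is normalized by the original image size and appended to the sub-image as four constant channels, is:

```python
import torch

def encode_subimage(sub_img: torch.Tensor, pos, orig_w: int, orig_h: int) -> torch.Tensor:
    """Fuse a (C, H, W) sub-image tensor with its (x, y, w, h) position.

    The position is normalized by the original image size and broadcast
    into four constant planes concatenated as extra channels, giving a
    (C+4, H, W) data tensor. Channel concatenation is an assumed reading
    of the patent's "coding and fusing".
    """
    x, y, w, h = pos
    norm = torch.tensor([x / orig_w, y / orig_h, w / orig_w, h / orig_h])
    _, height, width = sub_img.shape
    planes = norm.view(4, 1, 1).expand(4, height, width)
    return torch.cat([sub_img, planes], dim=0)
```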
step S104, based on a Faster R-CNN model, performing feature representation learning layer by layer on the data tensor, and fusing low-layer information, middle-layer information and high-layer information of the Faster R-CNN model by adopting an attention mechanism to determine feature representation corresponding to the subimages;
step S105, determining candidate target positions according to the feature representations corresponding to the sub-images, and performing regression and classification to determine the final target position of each sub-image and the category of the sub-image;
and step S106, determining the final target position and the category of the original image according to the final target position and the category of each sub-image.
The method is based on high-low layer information fusion, utilizes an attention mechanism and multi-layer weighted fusion as final feature representation. Since the large-resolution image size is much larger than the normal image, the large-resolution image needs to be preprocessed in order to make the model feasible. If the mode of equal scaling is adopted, target information is likely to be lost, and in order to solve the problem, the invention adopts the window overlapping type to segment the image to obtain the sub-images of the original image, and the sub-images are sequentially used as the input of the model, so that the integrity of the information is effectively ensured; meanwhile, in order to avoid losing the position information of the sub-image in the original image, the invention additionally adds the position information on the characteristic representation to enhance the integrity of the spatial information of the sub-image; in addition, during feature learning, the method utilizes an attention mechanism to perform weighted fusion on high-layer information and low-layer information to obtain corresponding convolutional layer output for downstream tasks, and feature information is greatly enriched through the method, so that the final model performance is improved.
Fig. 2 is a flowchart illustrating a step S102 in a target detecting and recognizing method of a large-resolution image according to an exemplary embodiment.
As shown in fig. 2, in one embodiment, preferably, the step S102 includes:
step S201, segmenting each original image by adopting a fixed window overlapping segmentation mode to obtain corresponding sub-images, and arranging the sub-images in sequence;
step S202, performing data preprocessing on each sub-image, and determining the position information of the sub-image in the original image, wherein the position information comprises the coordinates of the center point of the sub-image in the original image and the width and height of the sub-image.
In order to make the data information more accurate, the position of each obtained sub-image in the original image is recorded and represented as a four-dimensional vector (x, y, w, h), where x and y are the coordinates of the sub-image's centre point in the original image, and w and h are the width and height of the sub-image, respectively.
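A minimal sketch of the fixed-window overlapping segmentation of steps S201 and S202 follows. The window size of 512 and stride of 384 are illustrative assumptions (any stride smaller than the window yields the overlap), and edge handling is simplified:

```python
def split_with_overlap(image, win=512, stride=384):
    """Fixed-window overlapping segmentation (steps S201-S202).

    `image` is an (H, W, C) array. Returns (sub_image, (x, y, w, h))
    pairs in row-major order, where (x, y) is the sub-image centre in
    the original image and (w, h) its size. Because stride < win,
    adjacent windows overlap, so a target lying on a cut line appears
    whole in at least one sub-image.
    """
    H, W = image.shape[:2]
    subs = []
    for top in range(0, max(H - win, 0) + 1, stride):
        for left in range(0, max(W - win, 0) + 1, stride):
            crop = image[top:top + win, left:left + win]
            centre_x = left + win / 2.0
            centre_y = top + win / 2.0
            subs.append((crop, (centre_x, centre_y, win, win)))
    return subs
```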
In one embodiment, preferably, the fusing the low-level information, the middle-level information and the high-level information of the Faster R-CNN model by using an attention mechanism to determine the corresponding feature representation of the sub-image includes:
respectively taking the low-layer information, the middle-layer information and the high-layer information of the Faster R-CNN model as Q, K and V in an attention mechanism, and determining the feature representation corresponding to the sub-image by the following formula:
Z = softmax(Q·K^T / √d) · V
wherein Z represents the feature representation corresponding to the sub-image, Q represents the query, K represents the key, V represents the value, and d represents a hyper-parameter.
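Reading the formula as the standard scaled dot-product attention suggested by the Q, K, V and d definitions, the fusion can be sketched as follows. Projecting and flattening the three feature maps to a common (B, N, d) shape is an assumption, since the patent does not describe how the levels are aligned:

```python
import torch
import torch.nn.functional as F

def fuse_levels(low, mid, high, d: int):
    """Attention fusion of low/middle/high-level features as Q, K, V.

    low, mid, high: (B, N, d) tensors, i.e. feature maps assumed to be
    already projected to a common width d and flattened over spatial
    positions. Returns Z = softmax(Q K^T / sqrt(d)) V.
    """
    q, k, v = low, mid, high
    attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    return attn @ v
```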
In one embodiment, preferably, determining the final target position of the original image and the category to which the final target position belongs according to the final target position of each sub-image and the category to which the final target position belongs includes:
and merging the final target positions of which the distances between the sub-images are smaller than a preset threshold value according to the position information of the sub-images in the original image to determine the final target position of the original image, and determining the category with the maximum category probability value as the category to which the original image belongs.
In one embodiment, the candidate target positions are preferably determined using the following first calculation formula:
l_1 = Σ_i ||t_i* - t_i||^2 + γ·||W||^2, with t_i = W^T x_i (1)
wherein l_1 represents the loss over the candidate target positions, W represents the parameters to be learned, x_i represents the vector representation of the i-th candidate target position, t_i* represents the offset from the i-th candidate target position to the true target position, t_i represents the predicted offset, ||W||^2 represents the regularization term, and γ is a hyper-parameter;
regression and classification are performed using the following second calculation formula:
l_2 = L_cls + λ·L_loc (2)
wherein l_2 represents the sum of the regression and classification losses, L_cls represents the classification loss, L_loc represents the position loss, and λ is a hyper-parameter balancing the two loss terms.
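Under the reconstruction of formulas (1) and (2) above, the two losses can be sketched as follows. The squared-error form of the offset term, the smooth-L1 choice for L_loc, and the default values of γ and λ are all assumptions:

```python
import torch
import torch.nn.functional as F

def candidate_loss(pred_offsets, true_offsets, W, gamma=1e-4):
    """Formula (1): squared offset error plus gamma * ||W||^2.
    gamma's value is an assumed default."""
    return ((true_offsets - pred_offsets) ** 2).sum() + gamma * W.norm() ** 2

def detection_loss(class_logits, labels, pred_offsets, true_offsets, lam=1.0):
    """Formula (2): l2 = L_cls + lambda * L_loc. Cross-entropy and
    smooth L1 are assumed choices for the two loss terms."""
    L_cls = F.cross_entropy(class_logits, labels)
    L_loc = F.smooth_l1_loss(pred_offsets, true_offsets)
    return L_cls + lam * L_loc
```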
The above technical solution of the present invention is now described with a specific embodiment. As shown in fig. 3, the attention mechanism module performs weighted fusion of low-level and high-level information: it extracts three layers of representations (low, middle and high) and uses them as Q, K and V in the attention mechanism. This simplifies the usual weighted-fusion process, because there is no need to compute a separate weight for each of the low, middle and high representations and then fuse the weighted information into one representation. In this way the method greatly simplifies the original operation, makes full use of high-level and low-level information, and enriches the feature representation, thereby providing a better representation for downstream tasks.
FIG. 4 is a block diagram illustrating an apparatus for object detection and recognition of a large resolution image according to an exemplary embodiment.
As shown in fig. 4, an object detecting and recognizing apparatus for a large resolution image, the apparatus comprising:
the enhancing module 41 is configured to obtain a large-resolution image set, and perform data enhancement on the large-resolution image set to obtain an enhanced image set;
a segmentation module 42, configured to segment each original image in the enhanced image set to obtain a corresponding sub-image and position information thereof;
a processing module 43, configured to perform encoding and fusion processing on the sub-image and the position information thereof to obtain a corresponding data tensor;
a fusion module 44, configured to perform feature representation learning layer by layer on the data tensor based on a Faster R-CNN model, and fuse low-layer information, middle-layer information, and high-layer information of the Faster R-CNN model by using an attention mechanism to determine feature representations corresponding to the subimages;
a first determining module 45, configured to determine a candidate target position according to the feature representation corresponding to each sub-image, and perform regression and classification to determine a final target position of each sub-image and a category to which the final target position belongs;
and a second determining module 46, configured to determine the final target position of the original image and the category to which the final target position belongs according to the final target position of each sub-image and the category to which the final target position belongs.
FIG. 5 is a block diagram illustrating an apparatus for object detection and recognition of a large resolution image according to an exemplary embodiment.
As shown in fig. 5, in one embodiment, the segmentation module 42 preferably includes:
a dividing unit 51, configured to divide each original image by using a fixed-window overlapping type dividing manner to obtain corresponding sub-images, and arrange the sub-images in sequence;
the preprocessing unit 52 is configured to perform data preprocessing on each sub-image, and determine position information of the sub-image in the original image, where the position information includes coordinates of a center point of the sub-image in the original image and a width and a height of the sub-image.
In one embodiment, preferably, the fusion module 44 is configured to:
respectively taking the low-layer information, the middle-layer information and the high-layer information of the Faster R-CNN model as Q, K and V in an attention mechanism, and determining the feature representation corresponding to the sub-image by the following formula:
Z = softmax(Q·K^T / √d) · V
wherein Z represents the feature representation corresponding to the sub-image, Q represents the query, K represents the key, V represents the value, and d represents a hyper-parameter.
In one embodiment, preferably, the second determining module 46 is configured to:
and merging the final target positions of which the distances between the sub-images are smaller than a preset threshold value according to the position information of the sub-images in the original image to determine the final target position of the original image, and determining the category with the maximum category probability value as the category to which the original image belongs.
In one embodiment, the candidate target positions are preferably determined using the following first calculation formula:
l_1 = Σ_i ||t_i* - t_i||^2 + γ·||W||^2, with t_i = W^T x_i (1)
wherein l_1 represents the loss over the candidate target positions, W represents the parameters to be learned, x_i represents the vector representation of the i-th candidate target position, t_i* represents the offset from the i-th candidate target position to the true target position, t_i represents the predicted offset, ||W||^2 represents the regularization term, and γ is a hyper-parameter;
regression and classification are performed using the following second calculation formula:
l_2 = L_cls + λ·L_loc (2)
wherein l_2 represents the sum of the regression and classification losses, L_cls represents the classification loss, L_loc represents the position loss, and λ is a hyper-parameter balancing the two loss terms.
According to a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any one of the first aspect.
According to a fourth aspect of the embodiments of the present invention, there is provided a target detection and recognition system based on a large-resolution image, the system including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring a large-resolution image set, and performing data enhancement on the large-resolution image set to obtain an enhanced image set;
dividing each original image in the enhanced image set to obtain corresponding sub-images and position information thereof;
coding and fusing the subimages and the position information thereof to obtain corresponding data tensors;
performing feature representation learning layer by layer on the data tensor based on a Faster R-CNN model, and fusing low-layer information, middle-layer information and high-layer information of the Faster R-CNN model by adopting an attention mechanism to determine feature representation corresponding to the subimages;
determining candidate target positions according to the feature representations corresponding to the sub-images, and performing regression and classification to determine the final target position of each sub-image and the category of the sub-image;
and determining the final target position and the category of the original image according to the final target position and the category of each sub-image.
It is further understood that the term "plurality" means two or more, and other terms are analogous. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It will be further understood that the terms "first," "second," and the like are used to describe various information and that such information should not be limited by these terms. These terms are only used to distinguish one type of information from another and do not denote a particular order or importance. Indeed, the terms "first," "second," and the like are fully interchangeable. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention.
It is further to be understood that while operations are depicted in the drawings in a particular order, this is not to be understood as requiring that such operations be performed in the particular order shown or in serial order, or that all illustrated operations be performed, to achieve desirable results. In certain environments, multitasking and parallel processing may be advantageous.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
Claims (10)
1. A method for object detection and recognition of a high resolution image, the method comprising:
acquiring a large-resolution image set, and performing data enhancement on the large-resolution image set to obtain an enhanced image set;
dividing each original image in the enhanced image set to obtain corresponding sub-images and position information thereof;
coding and fusing the subimages and the position information thereof to obtain corresponding data tensors;
performing feature representation learning layer by layer on the data tensor based on a Faster R-CNN model, and fusing low-layer information, middle-layer information and high-layer information of the Faster R-CNN model by adopting an attention mechanism to determine feature representation corresponding to the subimages;
determining candidate target positions according to the feature representations corresponding to the sub-images, and performing regression and classification to determine the final target position of each sub-image and the category of the sub-image;
and determining the final target position and the category of the original image according to the final target position and the category of each sub-image.
2. The method of claim 1, wherein segmenting each image in the enhanced image set to obtain a corresponding sub-image and its position information comprises:
dividing each original image by adopting a fixed window overlapping type dividing mode to obtain corresponding sub-images, and arranging the sub-images according to the sequence;
and performing data preprocessing on each sub-image, and determining the position information of the sub-image in the original image, wherein the position information comprises the coordinates of the center point of the sub-image in the original image and the width and height of the sub-image.
3. The method as claimed in claim 1, wherein fusing the low-level information, the middle-level information and the high-level information of the Faster R-CNN model using an attention mechanism to determine the corresponding feature representation of the sub-image comprises:
respectively taking the low-layer information, the middle-layer information and the high-layer information of the Faster R-CNN model as Q, K and V in an attention mechanism, and determining the feature representation corresponding to the sub-image by the following formula:
Z = softmax(Q·K^T / √d) · V
wherein Z represents the feature representation corresponding to the sub-image, Q represents the query, K represents the key, V represents the value, and d represents a hyper-parameter.
4. The method of claim 1, wherein determining the final target position of the original image and the category to which the final target position belongs according to the final target position of each sub-image and the category to which the final target position belongs comprises:
and merging the final target positions of which the distances between the sub-images are smaller than a preset threshold value according to the position information of the sub-images in the original image to determine the final target position of the original image, and determining the category with the maximum category probability value as the category to which the original image belongs.
5. The method of claim 1, wherein the candidate target positions are determined using the following first calculation formula:
l_1 = Σ_i ||t_i* - t_i||^2 + γ·||W||^2, with t_i = W^T x_i (1)
wherein l_1 represents the loss over the candidate target positions, W represents the parameters to be learned, x_i represents the vector representation of the i-th candidate target position, t_i* represents the offset from the i-th candidate target position to the true target position, t_i represents the predicted offset, ||W||^2 represents the regularization term, and γ is a hyper-parameter;
and regression and classification are performed using the following second calculation formula:
l_2 = L_cls + λ·L_loc (2)
wherein l_2 represents the sum of the regression and classification losses, L_cls represents the classification loss, L_loc represents the position loss, and λ is a hyper-parameter balancing the two loss terms.
6. An apparatus for object detection and recognition of a high resolution image, the apparatus comprising:
the enhancement module is used for acquiring a large-resolution image set and enhancing data of the large-resolution image set to obtain an enhanced image set;
the segmentation module is used for segmenting each original image in the enhanced image set to obtain corresponding sub-images and position information thereof;
the processing module is used for coding and fusing the sub-images and the position information thereof to obtain corresponding data tensors;
the fusion module is used for performing feature representation learning layer by layer on the data tensor based on a Faster R-CNN model, and fusing low-layer information, middle-layer information and high-layer information of the Faster R-CNN model by adopting an attention mechanism so as to determine feature representation corresponding to the subimages;
the first determining module is used for determining candidate target positions according to the feature representations corresponding to the sub-images, and performing regression and classification to determine the final target position of each sub-image and the category of the sub-image;
and the second determining module is used for determining the final target position and the category of the original image according to the final target position and the category of each sub-image.
7. The apparatus of claim 6, wherein the segmentation module comprises:
the segmentation unit is used for segmenting each original image by adopting a fixed window overlapping segmentation mode to obtain corresponding sub-images and arranging the sub-images in sequence;
and the preprocessing unit is used for preprocessing data of each sub-image and determining the position information of the sub-image in the original image, wherein the position information comprises the coordinates of the center point of the sub-image in the original image and the width and height of the sub-image.
8. The apparatus of claim 6, wherein the fusion module is configured to:
respectively taking the low-layer information, the middle-layer information and the high-layer information of the Faster R-CNN model as Q, K and V in an attention mechanism, and determining the feature representation corresponding to the sub-image by the following formula:
Z = softmax(Q·K^T / √d) · V
wherein Z represents the feature representation corresponding to the sub-image, Q represents the query, K represents the key, V represents the value, and d represents a hyper-parameter.
9. The apparatus of claim 6, wherein the second determining module is configured to:
and merging the final target positions of which the distances between the sub-images are smaller than a preset threshold value according to the position information of the sub-images in the original image to determine the final target position of the original image, and determining the category with the maximum category probability value as the category to which the original image belongs.
10. The apparatus of claim 6, wherein the candidate target positions are determined using the following first calculation formula:
l_1 = Σ_i ||t_i* - t_i||^2 + γ·||W||^2, with t_i = W^T x_i (1)
wherein l_1 represents the loss over the candidate target positions, W represents the parameters to be learned, x_i represents the vector representation of the i-th candidate target position, t_i* represents the offset from the i-th candidate target position to the true target position, t_i represents the predicted offset, ||W||^2 represents the regularization term, and γ is a hyper-parameter;
and regression and classification are performed using the following second calculation formula:
l_2 = L_cls + λ·L_loc (2)
wherein l_2 represents the sum of the regression and classification losses, L_cls represents the classification loss, L_loc represents the position loss, and λ is a hyper-parameter balancing the two loss terms.
Priority Applications (1)
- CN202210255384.6A (priority date 2022-03-16, filing date 2022-03-16): Target detection and identification method and device for large-resolution image
Publications (1)
- CN114332456A, published 2022-04-12
Family
- ID=81033942; family application CN202210255384.6A, filed 2022-03-16, status Pending (CN)
Patent Citations (8)
- US20180158189A1 (Samsung Electronics Co., Ltd.; priority 2016-12-07, published 2018-06-07): System and method for a deep learning machine for object detection *
- CN108805064A (priority 2018-05-31, published 2018-11-13): Fish detection, localization and recognition method and system based on deep learning *
- CN109886269A (priority 2019-02-27, published 2019-06-14): Transit advertising board recognition method based on an attention mechanism *
- CN111191730A (priority 2020-01-02, published 2020-05-22): Method and system for detecting targets in very large images for embedded deep learning *
- CN111507958A (priority 2020-04-15, published 2020-08-07): Target detection method, training method of detection model, and electronic equipment *
- CN112861982A (priority 2021-02-24, published 2021-05-28): Long-tail target detection method based on gradient averaging *
- CN113538331A (priority 2021-05-13, published 2021-10-22): Metal surface damage target detection and identification method, device, equipment and storage medium *
- CN113989744A (priority 2021-10-29, published 2022-01-28): Pedestrian target detection method and system based on very large high-resolution images *
Non-Patent Citations (5)
- Yuhua C. et al.: "Domain Adaptive Faster R-CNN for Object Detection in the Wild", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
- 吴建鑫: "Pattern Recognition" (《模式识别》), Beijing: China Machine Press, 31 March 2020 *
- 唐子惠 (ed.): "Introduction to Medical Artificial Intelligence" (《医学人工智能导论》), Shanghai: Shanghai Scientific and Technical Publishers, 30 April 2020 *
- 林刚 et al.: "Multi-target detection and localization in transmission line inspection images based on improved Faster-RCNN", Electric Power Automation Equipment (《电力自动化设备》) *
- 赵杰 et al.: "Intelligent Robot Technology: Research and Practice on Security, Patrol and Disposal Police Robots" (《智能机器人技术：安保、巡逻、处置类警用机器人研究实践》), Beijing: China Machine Press, 31 January 2021 *
(* cited by examiner)
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- RJ01: Rejection of invention patent application after publication (application publication date: 2022-04-12)