
CN109165654B - Training method of target positioning model and target positioning method and device - Google Patents

Training method of target positioning model and target positioning method and device

Info

Publication number
CN109165654B
CN109165654B (application CN201810992851.7A)
Authority
CN
China
Prior art keywords
model
loss function
image
convolution
foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810992851.7A
Other languages
Chinese (zh)
Other versions
CN109165654A (en)
Inventor
叶锦宇
刘玉明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiuhu Times Intelligent Technology Co ltd
Original Assignee
Beijing Jiuhu Times Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiuhu Times Intelligent Technology Co ltd
Priority to CN201810992851.7A
Publication of CN109165654A
Application granted
Publication of CN109165654B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a training method for a target positioning model, which comprises the following steps: inputting an image sample into a convolution model to extract a first image feature of the image sample; inputting the first image feature into a segmentation model to generate first foreground coordinates of the image sample; inputting the first image feature into a regression model to generate second foreground coordinates of the image sample; calculating a model loss function according to the first foreground coordinates and the second foreground coordinates; and training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function to generate a target positioning model consisting of the convolution model and the regression model. With this scheme, jointly training the convolution model, the segmentation model, and the regression model gives the convolution model better image features for the actual prediction stage, while the regression model improves the image positioning speed.

Description

Training method of target positioning model and target positioning method and device
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a training method for a target location model, and a target location method and apparatus.
Background
To handle the large volume of images to be recognized in financial credit-review business, reviewers usually complete intelligent review (generally, auditing materials such as a user's identity card, bank card, and business license) with the help of image positioning and recognition technology, saving labor cost and improving production efficiency.
The existing image positioning and recognition technology is built on OCR recognition technology. However, current OCR recognition technology is not yet mature.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method for training a target location model, and a method and an apparatus for target location.
In a first aspect, an embodiment of the present application provides a method for training an object location model, where the method includes:
inputting the image sample to a convolution model to extract a first image feature of the image sample;
inputting the first image feature to a segmentation model to generate first foreground coordinates of an image sample;
inputting the first image feature to a regression model to generate second foreground coordinates of the image sample;
calculating a model loss function according to the first foreground coordinate and the second foreground coordinate;
and simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function to generate a target positioning model consisting of the convolution model and the regression model.
With reference to the first aspect, an embodiment of the present application provides a first possible implementation manner of the first aspect, where the calculating a model loss function according to the first foreground coordinate and the second foreground coordinate includes:
determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
and determining a model loss function according to the first loss function and the second loss function.
With reference to the first possible implementation manner of the first aspect, an embodiment of the present application provides a second possible implementation manner of the first aspect, where the step of simultaneously training the convolution model, the segmentation model, and the regression model according to a model loss function includes:
judging whether the model loss function meets the preset output requirement or not;
and if the model loss function does not meet the preset output requirement, simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function, and re-executing the steps to input the image sample into the convolution model so as to extract the first image characteristic of the image sample.
With reference to the first possible implementation manner or the second possible implementation manner of the first aspect, an embodiment of the present application provides a third possible implementation manner of the first aspect, where the step of simultaneously training the convolution model, the segmentation model, and the regression model according to a model loss function further includes:
judging whether the model loss function meets the preset output requirement or not;
and if the model loss function meets the preset output requirement, generating a target positioning model consisting of a convolution model and a regression model.
In a second aspect, an embodiment of the present application further provides a target positioning method, where a target image is input into a convolution model in a target positioning model to extract a second image feature of the target image;
inputting the second image features into a regression model in the target positioning model to generate foreground coordinates of the target image.
In a third aspect, an embodiment of the present application further provides a training apparatus for a target location model, where the training apparatus includes a first extraction module, configured to input an image sample to a convolution model to extract a first image feature of the image sample;
a first processing module for inputting the first image feature to a segmentation model to generate first foreground coordinates of an image sample;
a second processing module for inputting the first image feature to a regression model to generate second foreground coordinates of the image sample;
the first analysis module is used for calculating a model loss function according to the first foreground coordinate and the second foreground coordinate;
and the first generation module is used for simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function so as to generate a target positioning model consisting of the convolution model and the regression model.
With reference to the third aspect, an embodiment of the present application provides a first possible implementation manner of the third aspect, where the first analysis module includes a first analysis unit, a second analysis unit, and a first determination unit;
the first analysis unit is used for determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
the second analysis unit is used for determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
the first determining unit is configured to determine a model loss function according to the first loss function and the second loss function.
With reference to the first possible implementation manner of the third aspect, an embodiment of the present application provides a second possible implementation manner of the third aspect, where the first generating module includes a first judging unit, a first generating unit, and a first processing unit;
the first judging unit is used for judging whether the model loss function meets the preset output requirement or not;
and the first processing unit is used for training the convolution model, the segmentation model and the regression model simultaneously according to the model loss function and driving the first extraction module to work again when the model loss function does not meet the preset output requirement.
With reference to the first possible implementation manner of the third aspect, an embodiment of the present application provides a third possible implementation manner of the third aspect, where the first generating module further includes: a second judging unit and a second generating unit;
the second judgment unit is used for judging whether the model loss function meets the preset output requirement or not;
and the second generation unit is used for generating a target positioning model consisting of a convolution model and a regression model if the model loss function meets the preset output requirement.
In a fourth aspect, an apparatus for locating a target is further provided in an embodiment of the present application, where the apparatus includes a second extraction module and a second analysis module;
the second extraction module is used for inputting the target image into the convolution model in the target positioning model so as to extract a second image characteristic of the target image;
and the second analysis module is used for inputting the second image characteristics into a regression model in the target positioning model so as to generate foreground coordinates of the target image.
The training method for the target positioning model provided by the embodiment of the application comprises the following steps: inputting an image sample into a convolution model to extract a first image feature of the image sample; inputting the first image feature into a segmentation model to generate first foreground coordinates of the image sample; inputting the first image feature into a regression model to generate second foreground coordinates of the image sample; calculating a model loss function according to the first and second foreground coordinates; and training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function to generate a target positioning model consisting of the convolution model and the regression model. That is, in the training phase the segmentation model helps train the convolution model and the regression model, and once training is complete the convolution model and the regression model are combined into the target positioning model, remedying the slow positioning that occurs when the segmentation model itself is used.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a basic flowchart of a training method of an object location model according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a model used in a training process in a training method for an object location model provided in an embodiment of the present application;
FIG. 3 is a flow chart illustrating an optimization of a training method for an object location model according to an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating an optimization of another method for training an object location model provided by an embodiment of the present application;
FIG. 5 is a flow chart illustrating a method for locating a target according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating a trained model in an object localization method according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a training apparatus for an object location model according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a computing device for performing a training method of an object location model and an object location method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Image positioning and recognition technology is widely applied, especially in financial audit business. Reviewers must audit the massive numbers of identity cards, bank cards, business licenses, and other materials that users upload to web pages or the APP terminal every day. In recent years, with the development of hardware accelerators such as the GPU (graphics processing unit) and the TPU (tensor processing unit) and the improved accuracy of image recognition algorithms, artificial intelligence techniques have been adopted for image positioning and recognition to save labor cost and improve production efficiency.
At present, OCR recognition technology is the mainstay of the financial credit-review business: character information in certificate pictures uploaded by users can be recognized automatically. The audited objects in this business, whether identity cards or bank cards, basically have rectangular outer frames, relatively fixed text areas, and uniform character sizes. The positioning stage of the overall OCR pipeline locates the image foreground and the character areas separately. To position pictures uploaded from the web or APP side, the user is generally required to place the document inside a front frame of fixed size and aspect ratio before uploading, after which OCR recognition is performed. The front frame effectively filters out the background picture, greatly reducing the difficulty of foreground positioning and character positioning. Alternatively, without a front frame, traditional boundary detection or the recently emerging neural networks are used for region positioning: after the coordinates of the foreground region are located, the foreground picture is cropped out and the subsequent character-area positioning is performed.
The front-frame approach is generally used on the mobile-phone APP side, where the user's camera is called for on-site shooting. Its limitation is that uploaded pictures must be shot on site, so historical pictures stored in the photo album cannot be used; in addition, the front frame makes shooting harder for the user and degrades the user experience. Moreover, when the front frame is not strictly enforced, it only partially limits the proportion of the background area in the whole picture, and the subsequent steps still lack foreground positioning. Without a front frame, the picture may be shot on site or taken from the album; when traditional boundary detection is then used for foreground positioning, it is strongly affected by picture quality and is not robust: if the picture is unclear, the boundary features are weak, or the background is too complex, no positioning result can be obtained, or the positioning error is very large.
In view of the foregoing problems, embodiments of the present application provide a method for training a target location model, and a method and an apparatus for target location, which are described below by way of embodiments.
To facilitate understanding of the embodiment, a method for training an object location model disclosed in the embodiment of the present application is first described, and as shown in fig. 1, the method includes the following steps:
s101, inputting an image sample into a convolution model to extract a first image feature of the image sample;
s102, inputting the first image characteristics into a segmentation model to generate first foreground coordinates of an image sample;
s103, inputting the first image characteristics into a regression model to generate second foreground coordinates of the image sample;
s104, calculating a model loss function according to the first foreground coordinate and the second foreground coordinate;
and S105, simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function to generate a target positioning model consisting of the convolution model and the regression model.
Fig. 2 shows the model used during training in steps S101-S105. It consists of a convolution model 201, a segmentation model 202, and a regression model 203, where the segmentation model 202 and the regression model 203 are the two models that receive the convolution model's output: each takes the first image features of the image samples produced by the convolution model.
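The patent gives no concrete network definitions, so the following is a minimal PyTorch sketch of the Fig. 2 structure under stated assumptions: a shared convolution model (201) feeding a segmentation head (202) and a regression head (203). The layer widths, the two-stage backbone, and the 8-value corner output are illustrative choices of this sketch, not details fixed by the patent.

import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Convolution model (201): extracts the first image feature."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )

    def forward(self, x):
        return self.features(x)

class SegmentationHead(nn.Module):
    """Segmentation model (202): per-pixel foreground logits, from which
    the first foreground coordinates can be derived."""
    def __init__(self):
        super().__init__()
        self.head = nn.Conv2d(64, 1, 1)  # 1-channel foreground logit map

    def forward(self, feat):
        return self.head(feat)

class RegressionHead(nn.Module):
    """Regression model (203): directly regresses the second foreground
    coordinates, here assumed to be a quadrilateral as 8 numbers."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 8))

    def forward(self, feat):
        return self.head(feat)

Both heads consume the same first image feature, which is what later allows the segmentation branch to be dropped at prediction time without touching the backbone.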
In step S101, the image samples (e.g., pixel matrices) forming the training set are scaled to the input size of the convolution model before being input. This is because the fully connected layers in the convolution model constrain the input image size: while the convolutional layers impose no size restriction on the image, a fully connected layer requires a fixed-size input. More precisely, the dimension of a fully connected layer's input vector (which reflects the input image size) determines the number of its weight parameters, so if the input dimension were not fixed, the number of weight parameters would not be fixed either. The image samples must therefore be uniformly resized to one fixed size, i.e., a pixel matrix of fixed size is input into the convolution model so that it can extract the first image feature of each image sample.
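As a concrete illustration of this resizing step (a sketch; 224x224 is an assumed size, the patent only requires that all samples share one fixed size):

import torch
import torch.nn.functional as F

def to_fixed_size(img_batch: torch.Tensor, size=(224, 224)) -> torch.Tensor:
    # img_batch: (N, C, H, W) pixel matrices with arbitrary H and W.
    # Bilinear rescaling to one fixed size keeps the fully connected
    # layer's input dimension, and hence its weight count, constant.
    return F.interpolate(img_batch, size=size, mode="bilinear",
                         align_corners=False)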
S102 and S103 may be performed simultaneously. In step S102, the segmentation model performs segmentation and recognition on the first image feature of the image sample to determine the first foreground coordinates of the image sample. The segmentation model's role is to accurately locate the foreground coordinates of the image using an image segmentation algorithm; candidates include threshold-based, edge-based, and region-based segmentation, segmentation based on cluster analysis, wavelet transforms, or mathematical morphology, and segmentation based on artificial neural networks. A neural-network segmentation algorithm obtains a linear decision function by training a multilayer perceptron and then classifies pixels with that decision function to achieve segmentation. Such a model needs a large amount of training data, but the network's dense connectivity readily incorporates spatial information and handles noise and non-uniformity in the image well. The segmentation model is therefore preferably one based on an artificial neural network.
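As an illustration of how the first foreground coordinates could be read off a segmentation output (the patent does not prescribe this post-processing; thresholding the logit map and taking the extremes of the foreground pixels is one simple assumed scheme):

import torch

def mask_to_box(mask_logits: torch.Tensor, thr: float = 0.0):
    # mask_logits: (1, H, W) foreground logit map for one sample
    ys, xs = torch.nonzero(mask_logits[0] > thr, as_tuple=True)
    if ys.numel() == 0:
        return None  # no foreground pixel found
    # first foreground coordinates as (x_min, y_min, x_max, y_max)
    return xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item()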
In step S103, the regression model performs linear regression positioning on the first image feature to obtain the second foreground coordinates of the image sample. The regression model seeks a mapping for the target vector to be positioned that minimizes the error between it and the target's true position vector. That is, given the input feature vector of the first image feature, a set of parameters is learned so that the regressed second foreground coordinate values are very close to the actual values. Concretely, the feature vector of the first image feature is taken as input, translated, and then scaled to obtain a predicted value of the second foreground coordinates; the functional relation between the predicted value and the actual value is computed to obtain the optimization parameters; and learning these parameters drives the predicted value toward the true value, yielding the second foreground coordinates of the image sample.
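The text does not give the translate-then-scale step a formula; one common parameterization consistent with it, borrowed here as an assumption from standard bounding-box regression, is

\[ \hat{x} = x_r + w_r t_x, \qquad \hat{y} = y_r + h_r t_y, \qquad \hat{w} = w_r e^{t_w}, \qquad \hat{h} = h_r e^{t_h}, \]

where \((x_r, y_r, w_r, h_r)\) is a reference box derived from the first image feature, \((t_x, t_y, t_w, t_h)\) are the learned regression outputs, and \((\hat{x}, \hat{y}, \hat{w}, \hat{h})\) is the predicted second foreground box: the first two relations translate, the last two scale.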
Step S104 computes, during training, a first loss function for the segmentation model (determined from the first foreground coordinates) and a second loss function for the regression model (determined from the second foreground coordinates), and combines them into the model loss function.
In step S105, the convolution model, the segmentation model, and the regression model are trained simultaneously using the model loss function. Each part of the model can be optimized against it, finally generating a target positioning model consisting of the convolution model and the regression model. In a specific implementation, the model loss function may be used to train the segmentation model and the regression model, the convolution model and the regression model, or all three of the convolution model, the segmentation model, and the regression model.
In this training method, the convolution model first extracts the first image feature of the image sample; the convolution model's output, i.e., the first image feature, is then fed into the segmentation model and the regression model respectively; and the three models are trained simultaneously according to the outputs of the segmentation and regression models. The result is a target positioning model consisting of the convolution model and the regression model, which remedies the slow positioning that occurs when the segmentation model is used directly.
Further, step S104 can be implemented as the following steps, as shown in fig. 3:
s301, determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
s302, determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
s303, determining a model loss function according to the first loss function and the second loss function.
In step S301, the first foreground coordinates produced by the segmentation model are compared with the actual coordinates of the target in the image sample to determine the first loss function. During model training, the actual coordinates of the target in each image sample are determined in advance and labeled so that the first loss function can be computed.
In step S302, a second loss function is determined by calculating a difference between a second foreground coordinate, i.e., a predicted value, and an actual coordinate, i.e., a true value, of the target in the image sample. Specifically, step S301 and step S302 may be executed simultaneously or separately.
In step S303, the final model loss function is calculated from the first and second loss functions obtained in the previous steps. As the model loss function is used to optimize the convolution, segmentation, and regression models, new first and second foreground coordinates are produced on each pass; comparing each with the actual coordinates yields updated first and second loss functions. The final model loss function is thus determined by repeatedly generating the first and second loss functions and combining them.
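The patent does not fix how the two losses are combined; a common choice, assumed here, is a weighted sum

\[ L_{\text{model}} = L_{\text{seg}} + \lambda \, L_{\text{reg}}, \]

where \(L_{\text{seg}}\) is the first loss function from the segmentation branch, \(L_{\text{reg}}\) is the second loss function from the regression branch, and \(\lambda\) balances the two; minimizing \(L_{\text{model}}\) then trains all three models simultaneously.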
Further, as shown in fig. 4, step S105 may be implemented as the following steps, and step S105 includes two cases, where the first case is specifically as follows:
s401, judging whether the model loss function meets a preset output requirement or not;
s402, if the model loss function does not meet the preset requirement, simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function, and re-executing the steps to input the image sample into the convolution model so as to extract the first image feature of the image sample.
After the model loss function is determined from the first and second loss functions, it is checked against the preset output requirement. If it does not meet the requirement, the pipeline is re-executed: the image sample is input into the convolution model to extract the first image feature; the first image feature is input into the segmentation model to generate first foreground coordinates and into the regression model to generate second foreground coordinates; a model loss function is calculated from the two sets of coordinates; and the convolution, segmentation, and regression models are trained simultaneously with the newly generated model loss function. The preset output requirement means that the model loss function determined from the first and second loss functions is accepted as final when the errors between the respective outputs of the convolution, segmentation, and regression models and the true results are minimal.
The second case is as follows:
s403, judging whether the model loss function meets the preset output requirement;
s404, if the model loss function meets the preset output requirement, a target positioning model composed of a convolution model and a regression model is generated.
The convolution model and regression model trained with a model loss function that meets the preset output requirement constitute the finally determined target positioning model. When the model loss function is judged to meet the preset output requirement, i.e., it is determined to be the optimal model loss function, its gradient is computed with a gradient descent algorithm, the corresponding optimization parameters are calculated, the convolution, segmentation, and regression models are trained with those parameters, and the target positioning model consisting of the convolution model and the regression model is generated. The segmentation model's purpose is to train the convolution model so that it extracts higher-precision image features in practical applications, while the trained regression model provides better positioning speed.
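Taken together, steps S401-S404 amount to the following loop, shown as a hedged PyTorch sketch: the loss forms (binary cross-entropy for the segmentation branch, smooth L1 for the regression branch), the weight lam, and the threshold-style preset output requirement are all assumptions of this sketch rather than details fixed by the patent.

import torch

def train_joint(backbone, seg_head, reg_head, loader,
                loss_threshold=0.01, max_epochs=100, lam=1.0):
    params = (list(backbone.parameters()) + list(seg_head.parameters())
              + list(reg_head.parameters()))
    opt = torch.optim.SGD(params, lr=1e-3)      # gradient descent, as in the text
    seg_loss_fn = torch.nn.BCEWithLogitsLoss()  # first loss function (assumed form)
    reg_loss_fn = torch.nn.SmoothL1Loss()       # second loss function (assumed form)

    for _ in range(max_epochs):
        running = 0.0
        for img, mask, box in loader:   # mask/box: labeled actual coordinates
            feat = backbone(img)        # img already resized to the fixed input size
            loss = (seg_loss_fn(seg_head(feat), mask)
                    + lam * reg_loss_fn(reg_head(feat), box))
            opt.zero_grad()
            loss.backward()
            opt.step()
            running += loss.item()
        if running / len(loader) < loss_threshold:  # preset output requirement (S403)
            break                                   # S404: stop training
    return backbone, reg_head  # target positioning model: segmentation head dropped

If the requirement is not met, the next pass of the loop re-executes the step of inputting the image samples into the convolution model, matching S402.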
In summary, in this training method for the target positioning model, the convolution model, the segmentation model, and the regression model are trained together so that the convolution model used in the actual prediction stage obtains better image features, while the trained regression model improves the image positioning speed.
Corresponding to the above training method of the target location model, the present application also provides a target location method, as shown in fig. 5:
s501, inputting the target image into a convolution model in the target positioning model to extract a second image feature of the target image;
s502, inputting the second image characteristics into a regression model in the target positioning model to generate foreground coordinates of the target image.
Steps S501 and S502 perform actual prediction using the target positioning model produced by the training steps above, i.e., the model determined by steps S101-S105. The target image is input into the convolution model of the target positioning model to extract the second image feature of the target image; because the convolution model was optimized together with the segmentation model, it extracts the second image feature more accurately. The second image feature is then input into the regression model of the target positioning model to generate the final desired foreground coordinates of the target image. The target positioning model used for actual prediction therefore offers both higher positioning precision for the target image and faster positioning speed.
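A corresponding prediction-stage sketch, reusing the assumed module names from the training sketch, keeps only the convolution model and the regression model:

import torch

@torch.no_grad()
def locate_target(backbone, reg_head, target_image):
    # target_image: (N, C, H, W), already resized to the fixed input size
    backbone.eval()
    reg_head.eval()
    feat = backbone(target_image)  # second image feature (S501)
    return reg_head(feat)          # foreground coordinates of the target image (S502)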
In practical image-positioning applications, the segmentation model is discarded and the target positioning model consisting of the convolution model and the regression model is used, which avoids the segmentation model's slow positioning and yields image positioning results better and faster.
The method described above with reference to fig. 5 is performed using the target positioning model shown in fig. 6: the finally generated target positioning model consists of a convolution model 601 and a regression model 602. Using the convolution model and regression model of the target positioning model improves the positioning precision and the positioning speed.
In summary, the specific steps of the embodiments of the method described above are as follows:
step 1, inputting an image sample into a convolution model to extract a first image characteristic of the image sample;
step 2, inputting the first image characteristic into a segmentation model to generate a first foreground coordinate of an image sample;
step 3, determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
step 4, inputting the first image characteristics into a regression model to generate second foreground coordinates of the image sample;
step 5, determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
step 6, determining a model loss function according to the first loss function and the second loss function;
step 7, judging whether the model loss function meets the preset output requirement or not;
step 8, if the model loss function does not meet the preset output requirement, simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function, and re-executing the step to input the image sample into the convolution model so as to extract the first image characteristic of the image sample;
step 9, if the model loss function meets the preset output requirement, generating a target positioning model consisting of a convolution model and a regression model;
step 10, inputting the target image into a convolution model in the target positioning model to extract a second image characteristic of the target image;
and 11, inputting the second image characteristics into a regression model in the target positioning model to generate foreground coordinates of the target image.
The training method of the target positioning model is realized through the steps 1-9, and the target positioning method is realized through the steps 10-11.
In the embodiment of the application, a model structure composed of a convolution model, a regression model, and a segmentation model is trained, and the parameters of the shared convolution layers, the regression layers, and the segmentation layers are updated with a model loss function constructed from the regression model and the segmentation model during training. In the actual application stage, the segmentation model is discarded and the output of the regression model is taken as the final result. In short: feature abstraction by the convolution model followed by regression positioning trains quickly but positions less accurately, while segmentation positioning achieves high accuracy but with a complex model and long training time. By sharing the convolution layers between the regression model and the segmentation model, the segmentation model is used to obtain better model features in the training stage, and the regression model is used in the prediction stage to accelerate prediction while improving precision.
In addition, an embodiment of the present application further provides a training apparatus for a target positioning model, as shown in fig. 7, which includes: a first extraction module 701, configured to input the image sample to a convolution model to extract a first image feature of the image sample;
a first processing module 702, configured to input the first image feature to a segmentation model to generate a first foreground coordinate of an image sample;
a second processing module 703, configured to input the first image feature to a regression model to generate a second foreground coordinate of the image sample;
a first analysis module 704 for calculating a model loss function from the first foreground coordinates and the second foreground coordinates;
a first generating module 705 for training the convolution model, the segmentation model and the regression model simultaneously according to the model loss function to generate a target location model composed of the convolution model and the regression model.
The first analysis module 704 comprises a first analysis unit, a second analysis unit, and a first determination unit;
the first analysis unit is used for determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
the second analysis unit is used for determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
and the first determining unit calculates according to the first loss function and the second loss function so as to determine a model loss function.
The first generating module 705 includes a first judging unit and a first processing unit;
the first judgment unit is used for judging whether the model loss function meets the preset output requirement or not;
and the first processing unit is used for training the convolution model, the segmentation model and the regression model simultaneously according to the model loss function when the model loss function does not meet the preset output requirement, and driving the first extraction module to work again.
The first generating module 705 further includes a second judging unit and a second generating unit;
the second judgment unit is used for judging whether the model loss function meets the preset output requirement or not;
and the second generation unit is used for generating a target positioning model consisting of a convolution model and a regression model if the model loss function meets the preset output requirement.
The embodiment of the application also comprises a target positioning device, which comprises a second extraction module and a second analysis module;
the second extraction module is used for inputting the target image into the convolution model in the target positioning model so as to extract a second image characteristic of the target image;
and the second analysis module is used for inputting the second image characteristics into a regression model in the target positioning model so as to generate foreground coordinates of the target image.
Fig. 8 is a schematic diagram of a computing device provided in the embodiment of the present application. The computing device 80 includes a processor 81, a memory 82, and a bus 83. The memory 82 stores executable instructions; when the computing device runs, the processor 81 and the memory 82 communicate through the bus 83, and the processor 81 executes the instructions stored in the memory 82, such as the steps of the training method of the target positioning model and the target positioning method.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the method for training the object location model and the method for locating the object in any of the above embodiments.
Specifically, the storage medium can be a general storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above training method and target positioning method can be executed, so that by training the convolution model, the segmentation model, and the regression model, the convolution model used in the actual prediction stage obtains better image features while the regression model improves the image positioning speed.
The computer program product for performing the target location model training method and the target location method provided in the embodiments of the present application includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the method in the foregoing method embodiments, and specific implementations may refer to the method embodiments and are not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, and are used for illustrating the technical solutions of the present application, but not limiting the same, and the scope of the present application is not limited thereto, and although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application, and are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for training an object localization model, comprising:
inputting the image sample to a convolution model to extract a first image feature of the image sample;
inputting the first image feature to a segmentation model to generate first foreground coordinates of an image sample;
inputting the first image feature to a regression model to generate second foreground coordinates of the image sample;
calculating a model loss function according to the first foreground coordinate and the second foreground coordinate;
and simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function to generate a target positioning model consisting of the convolution model and the regression model.
2. The method of claim 1, wherein calculating a model loss function according to the first foreground coordinate and the second foreground coordinate comprises:
determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
and determining a model loss function according to the first loss function and the second loss function.
3. The method of claim 1, wherein simultaneously training the convolution model, the segmentation model, and the regression model according to a model loss function comprises:
judging whether the model loss function meets the preset output requirement or not;
if the model loss function does not meet the preset output requirement, the convolution model, the segmentation model and the regression model are trained simultaneously according to the model loss function, and the step is executed again to input the image sample into the convolution model so as to extract the first image feature of the image sample.
4. The method of claim 3, wherein the convolution model, the segmentation model, and the regression model are trained simultaneously according to a model loss function, further comprising:
judging whether the model loss function meets the preset output requirement or not;
and if the model loss function meets the preset output requirement, generating a target positioning model consisting of a convolution model and a regression model.
5. An object localization method, based on the method according to any of claims 1-4, comprising:
inputting the target image into a convolution model in the target positioning model to extract a second image feature of the target image;
inputting the second image features into a regression model in the target positioning model to generate foreground coordinates of the target image.
6. An apparatus for training an object localization model, comprising:
the first extraction module is used for inputting the image sample into the convolution model so as to extract a first image characteristic of the image sample;
a first processing module for inputting the first image feature to a segmentation model to generate first foreground coordinates of an image sample;
a second processing module for inputting the first image features to a regression model to generate second foreground coordinates of the image sample;
the first analysis module is used for calculating a model loss function according to the first foreground coordinate and the second foreground coordinate;
and the first generation module is used for simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function so as to generate a target positioning model consisting of the convolution model and the regression model.
7. An apparatus as claimed in claim 6, wherein the first analysis module comprises: the device comprises a first analysis unit, a second analysis unit and a first determination unit;
the first analysis unit is used for determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
the second analysis unit is used for determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
the first determining unit is configured to determine a model loss function according to the first loss function and the second loss function.
8. An apparatus according to claim 6, wherein the first generating module comprises: the device comprises a first judging unit, a first generating unit and a first processing unit;
the first judging unit is used for judging whether the model loss function meets the preset output requirement or not;
and the first processing unit is used for training the convolution model, the segmentation model and the regression model simultaneously according to the model loss function and driving the first extraction module to work again when the model loss function does not meet the preset output requirement.
9. An apparatus as claimed in claim 8, wherein the first generating module further comprises: a second judging unit and a second generating unit;
the second judgment unit is used for judging whether the model loss function meets the preset output requirement or not;
and the second generation unit is used for generating a target positioning model consisting of a convolution model and a regression model if the model loss function meets the preset output requirement.
10. An apparatus for object localization according to any of claims 6-9, comprising: a second extraction module and a second analysis module;
the second extraction module is used for inputting the target image into the convolution model in the target positioning model so as to extract a second image characteristic of the target image;
and the second analysis module is used for inputting the second image characteristics into a regression model in the target positioning model so as to generate foreground coordinates of the target image.
CN201810992851.7A 2018-08-23 2018-08-23 Training method of target positioning model and target positioning method and device Active CN109165654B (en)

Priority Applications (1)

Application Number: CN201810992851.7A (published as CN109165654B)
Priority Date: 2018-08-23
Filing Date: 2018-08-23
Title: Training method of target positioning model and target positioning method and device

Applications Claiming Priority (1)

Application Number: CN201810992851.7A (published as CN109165654B)
Priority Date: 2018-08-23
Filing Date: 2018-08-23
Title: Training method of target positioning model and target positioning method and device

Publications (2)

Publication Number: CN109165654A (en), Publication Date: 2019-01-08
Publication Number: CN109165654B (en), Publication Date: 2021-03-30

Family

ID=64893338

Family Applications (1)

Application Number: CN201810992851.7A (CN109165654B, Active)
Priority Date: 2018-08-23
Filing Date: 2018-08-23
Title: Training method of target positioning model and target positioning method and device

Country Status (1)

Country Link
CN (1) CN109165654B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110675453B (en) * 2019-10-16 2021-04-13 北京天睿空间科技股份有限公司 Self-positioning method for moving target in known scene
CN111080694A (en) * 2019-12-20 2020-04-28 上海眼控科技股份有限公司 Training and positioning method, device, equipment and storage medium of positioning model
CN111179628B (en) * 2020-01-09 2021-09-28 北京三快在线科技有限公司 Positioning method and device for automatic driving vehicle, electronic equipment and storage medium
CN113469172B (en) * 2020-03-30 2022-07-01 阿里巴巴集团控股有限公司 Target positioning method, model training method, interface interaction method and equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550746B (en) * 2015-12-08 2018-02-02 北京旷视科技有限公司 The training method and trainer of machine learning model
CN107730514B (en) * 2017-09-29 2021-02-12 北京奇宝科技有限公司 Scene segmentation network training method and device, computing equipment and storage medium
CN108133186A (en) * 2017-12-21 2018-06-08 东北林业大学 A kind of plant leaf identification method based on deep learning
CN108416412B (en) * 2018-01-23 2021-04-06 浙江瀚镪自动化设备股份有限公司 Logistics composite code identification method based on multitask deep learning
CN108416378B (en) * 2018-02-28 2020-04-14 电子科技大学 Large-scene SAR target recognition method based on deep neural network

Also Published As

Publication number: CN109165654A (en), Publication date: 2019-01-08

Similar Documents

Publication Publication Date Title
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN109165654B (en) Training method of target positioning model and target positioning method and device
CN111241989A (en) Image recognition method and device and electronic equipment
CN112464912B (en) Robot end face detection method based on YOLO-RGGNet
CN110852257B (en) Method and device for detecting key points of human face and storage medium
CN111311485B (en) Image processing method and related device
CN107784288A (en) A kind of iteration positioning formula method for detecting human face based on deep neural network
CN111222548A (en) Similar image detection method, device, equipment and storage medium
CN112464798A (en) Text recognition method and device, electronic equipment and storage medium
CN112101359B (en) Text formula positioning method, model training method and related device
CN115239644B (en) Concrete defect identification method, device, computer equipment and storage medium
CN111192312B (en) Depth image acquisition method, device, equipment and medium based on deep learning
CN111626295A (en) Training method and device for license plate detection model
CN112287947A (en) Region suggestion box detection method, terminal and storage medium
CN114266894A (en) Image segmentation method and device, electronic equipment and storage medium
CN109685805B (en) Image segmentation method and device
CN113034514A (en) Sky region segmentation method and device, computer equipment and storage medium
CN112749576B (en) Image recognition method and device, computing equipment and computer storage medium
CN109977937B (en) Image processing method, device and equipment
CN115630660B (en) Barcode positioning method and device based on convolutional neural network
CN117389664A (en) Unique control region division method and device, electronic equipment and storage medium
CN113468906B (en) Graphic code extraction model construction method, identification device, equipment and medium
CN118115932A (en) Image regressor training method, related method, device, equipment and medium
WO2015114021A1 (en) Image capture using client device
CN112288748B (en) Semantic segmentation network training and image semantic segmentation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant