
CN109165654B - Training method of target positioning model and target positioning method and device - Google Patents

Training method of target positioning model and target positioning method and device

Info

Publication number
CN109165654B
CN109165654B (application CN201810992851.7A)
Authority
CN
China
Prior art keywords
model
loss function
image
convolution
foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810992851.7A
Other languages
Chinese (zh)
Other versions
CN109165654A (en)
Inventor
叶锦宇
刘玉明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiuhu Times Intelligent Technology Co ltd
Original Assignee
Beijing Jiuhu Times Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiuhu Times Intelligent Technology Co ltd
Priority to CN201810992851.7A
Publication of CN109165654A
Application granted
Publication of CN109165654B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a training method for a target positioning model, which comprises the following steps: inputting an image sample into a convolution model to extract a first image feature of the image sample; inputting the first image feature into a segmentation model to generate first foreground coordinates of the image sample; inputting the first image feature into a regression model to generate second foreground coordinates of the image sample; calculating a model loss function according to the first foreground coordinates and the second foreground coordinates; and training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function to generate a target positioning model consisting of the convolution model and the regression model. With this scheme, jointly training the convolution model, the segmentation model, and the regression model gives the convolution model better image features for the actual prediction stage, while the regression model improves the image positioning speed.

Description

Training method of target positioning model and target positioning method and device
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a training method for a target location model, and a target location method and apparatus.
Background
To handle the large volume of images to be recognized in financial credit-review business, reviewers usually complete intelligent review (generally, auditing materials such as a user's identity card, bank card, and business license) with the help of image positioning and recognition technology, saving labor cost and improving production efficiency.
The existing image positioning and recognition technology is built on OCR recognition technology. However, current OCR recognition technology is not yet mature.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method for training a target location model, and a method and an apparatus for target location.
In a first aspect, an embodiment of the present application provides a method for training an object location model, where the method includes:
inputting the image sample to a convolution model to extract a first image feature of the image sample;
inputting the first image feature to a segmentation model to generate first foreground coordinates of an image sample;
inputting the first image feature to a regression model to generate second foreground coordinates of the image sample;
calculating a model loss function according to the first foreground coordinate and the second foreground coordinate;
and simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function to generate a target positioning model consisting of the convolution model and the regression model.
With reference to the first aspect, an embodiment of the present application provides a first possible implementation manner of the first aspect, where the calculating a model loss function according to the first foreground coordinate and the second foreground coordinate includes:
determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
and determining a model loss function according to the first loss function and the second loss function.
With reference to the first possible implementation manner of the first aspect, an embodiment of the present application provides a second possible implementation manner of the first aspect, where the step of simultaneously training the convolution model, the segmentation model, and the regression model according to a model loss function includes:
judging whether the model loss function meets the preset output requirement or not;
and if the model loss function does not meet the preset output requirement, simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function, and re-executing the steps to input the image sample into the convolution model so as to extract the first image characteristic of the image sample.
With reference to the first possible implementation manner or the second possible implementation manner of the first aspect, an embodiment of the present application provides a third possible implementation manner of the first aspect, where the step of simultaneously training the convolution model, the segmentation model, and the regression model according to a model loss function further includes:
judging whether the model loss function meets the preset output requirement or not;
and if the model loss function meets the preset output requirement, generating a target positioning model consisting of a convolution model and a regression model.
In a second aspect, an embodiment of the present application further provides a target positioning method, where a target image is input into a convolution model in a target positioning model to extract a second image feature of the target image;
inputting the second image features into a regression model in the target positioning model to generate foreground coordinates of the target image.
In a third aspect, an embodiment of the present application further provides a training apparatus for a target location model, where the training apparatus includes a first extraction module, configured to input an image sample to a convolution model to extract a first image feature of the image sample;
a first processing module for inputting the first image feature to a segmentation model to generate first foreground coordinates of an image sample;
a second processing module for inputting the first image feature to a regression model to generate second foreground coordinates of the image sample;
the first analysis module is used for calculating a model loss function according to the first foreground coordinate and the second foreground coordinate;
and the first generation module is used for simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function so as to generate a target positioning model consisting of the convolution model and the regression model.
With reference to the third aspect, an embodiment of the present application provides a first possible implementation manner of the third aspect, where the first analysis module includes a first analysis unit, a second analysis unit, and a first determination unit;
the first analysis unit is used for determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
the second analysis unit is used for determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
the first determining unit is configured to determine a model loss function according to the first loss function and the second loss function.
With reference to the first possible implementation manner of the third aspect, an embodiment of the present application provides a second possible implementation manner of the third aspect, where the first generating module includes a first judging unit, a first generating unit, and a first processing unit;
the first judging unit is used for judging whether the model loss function meets the preset output requirement or not;
and the first processing unit is used for training the convolution model, the segmentation model and the regression model simultaneously according to the model loss function and driving the first extraction module to work again when the model loss function does not meet the preset output requirement.
With reference to the first possible implementation manner of the third aspect, an embodiment of the present application provides a third possible implementation manner of the third aspect, where the first generating module further includes: a second judging unit and a second generating unit;
the second judgment unit is used for judging whether the model loss function meets the preset output requirement or not;
and the second generation unit is used for generating a target positioning model consisting of a convolution model and a regression model if the model loss function meets the preset output requirement.
In a fourth aspect, an apparatus for locating a target is further provided in an embodiment of the present application, where the apparatus includes a second extraction module and a second analysis module;
the second extraction module is used for inputting the target image into the convolution model in the target positioning model so as to extract a second image characteristic of the target image;
and the second analysis module is used for inputting the second image characteristics into a regression model in the target positioning model so as to generate foreground coordinates of the target image.
The training method for the target positioning model provided by the embodiment of the application comprises the following steps: inputting an image sample into a convolution model to extract a first image feature of the image sample; inputting the first image feature into a segmentation model to generate first foreground coordinates of the image sample; inputting the first image feature into a regression model to generate second foreground coordinates of the image sample; calculating a model loss function according to the first and second foreground coordinates; and training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function to generate a target positioning model consisting of the convolution model and the regression model. That is, in the training phase the segmentation model helps train the convolution model and the regression model, and once training is complete the convolution model and the regression model are combined into the target positioning model, remedying the slow positioning that occurs when the segmentation model itself is used.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a basic flowchart of a training method of an object location model according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a model used in a training process in a training method for an object location model provided in an embodiment of the present application;
FIG. 3 is a flow chart illustrating an optimization of a training method for an object location model according to an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating an optimization of another method for training an object location model provided by an embodiment of the present application;
FIG. 5 is a flow chart illustrating a method for locating a target according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating a trained model in an object localization method according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a training apparatus for an object location model according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a computing device for performing a training method of an object location model and an object location method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Image positioning and recognition technology is widely applied, especially in financial audit business. Reviewers must audit the massive numbers of identity cards, bank cards, business licenses, and other materials that users upload to web pages or the APP terminal every day. In recent years, with the development of hardware accelerators such as the GPU (graphics processing unit) and the TPU (tensor processing unit) and the improved accuracy of image recognition algorithms, artificial intelligence techniques have been adopted for image positioning and recognition to save labor cost and improve production efficiency.
At present, OCR recognition technology is the mainstay of the financial credit-review business: character information in certificate pictures uploaded by users can be recognized automatically. The audited objects in this business, whether identity cards or bank cards, basically have rectangular outer frames, relatively fixed text areas, and uniform character sizes. The positioning stage of the overall OCR pipeline locates the image foreground and the character areas separately. To position pictures uploaded from the web or APP side, the user is generally required to place the document inside a front frame of fixed size and aspect ratio before uploading, after which OCR recognition is performed. The front frame effectively filters out the background picture, greatly reducing the difficulty of foreground positioning and character positioning. Alternatively, without a front frame, traditional boundary detection or the recently emerging neural networks are used for region positioning: after the coordinates of the foreground region are located, the foreground picture is cropped out and the subsequent character-area positioning is performed.
The front-frame approach is generally used on the mobile-phone APP side, where the user's camera is called for on-site shooting. Its limitation is that uploaded pictures must be shot on site, so historical pictures stored in the photo album cannot be used; in addition, the front frame makes shooting harder for the user and degrades the user experience. Moreover, when the front frame is not strictly enforced, it only partially limits the proportion of the background area in the whole picture, and the subsequent steps still lack foreground positioning. Without a front frame, the picture may be shot on site or taken from the album; when traditional boundary detection is then used for foreground positioning, it is strongly affected by picture quality and is not robust: if the picture is unclear, the boundary features are weak, or the background is too complex, no positioning result can be obtained, or the positioning error is very large.
In view of the foregoing problems, embodiments of the present application provide a method for training a target location model, and a method and an apparatus for target location, which are described below by way of embodiments.
To facilitate understanding of the embodiment, a method for training an object location model disclosed in the embodiment of the present application is first described, and as shown in fig. 1, the method includes the following steps:
s101, inputting an image sample into a convolution model to extract a first image feature of the image sample;
s102, inputting the first image characteristics into a segmentation model to generate first foreground coordinates of an image sample;
s103, inputting the first image characteristics into a regression model to generate second foreground coordinates of the image sample;
s104, calculating a model loss function according to the first foreground coordinate and the second foreground coordinate;
and S105, simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function to generate a target positioning model consisting of the convolution model and the regression model.
Fig. 2 shows the model used during training in steps S101-S105. It consists of a convolution model 201, a segmentation model 202, and a regression model 203, where the segmentation model 202 and the regression model 203 are the two models that receive the convolution model's output: each takes the first image features of the image samples produced by the convolution model.
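The patent gives no concrete network definitions, so the following is a minimal PyTorch sketch of the Fig. 2 structure under stated assumptions: a shared convolution model (201) feeding a segmentation head (202) and a regression head (203). The layer widths, the two-stage backbone, and the 8-value corner output are illustrative choices of this sketch, not details fixed by the patent.

import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Convolution model (201): extracts the first image feature."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )

    def forward(self, x):
        return self.features(x)

class SegmentationHead(nn.Module):
    """Segmentation model (202): per-pixel foreground logits, from which
    the first foreground coordinates can be derived."""
    def __init__(self):
        super().__init__()
        self.head = nn.Conv2d(64, 1, 1)  # 1-channel foreground logit map

    def forward(self, feat):
        return self.head(feat)

class RegressionHead(nn.Module):
    """Regression model (203): directly regresses the second foreground
    coordinates, here assumed to be a quadrilateral as 8 numbers."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 8))

    def forward(self, feat):
        return self.head(feat)

Both heads consume the same first image feature, which is what later allows the segmentation branch to be dropped at prediction time without touching the backbone.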
In step S101, the image samples (e.g., pixel matrices) forming the training set are scaled to the input size of the convolution model before being input. This is because the fully connected layers in the convolution model constrain the input image size: while the convolutional layers impose no size restriction on the image, a fully connected layer requires a fixed-size input. More precisely, the dimension of a fully connected layer's input vector (which reflects the input image size) determines the number of its weight parameters, so if the input dimension were not fixed, the number of weight parameters would not be fixed either. The image samples must therefore be uniformly resized to one fixed size, i.e., a pixel matrix of fixed size is input into the convolution model so that it can extract the first image feature of each image sample.
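As a concrete illustration of this resizing step (a sketch; 224x224 is an assumed size, the patent only requires that all samples share one fixed size):

import torch
import torch.nn.functional as F

def to_fixed_size(img_batch: torch.Tensor, size=(224, 224)) -> torch.Tensor:
    # img_batch: (N, C, H, W) pixel matrices with arbitrary H and W.
    # Bilinear rescaling to one fixed size keeps the fully connected
    # layer's input dimension, and hence its weight count, constant.
    return F.interpolate(img_batch, size=size, mode="bilinear",
                         align_corners=False)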
S102 and S103 may be performed simultaneously. In step S102, the segmentation model performs segmentation and recognition on the first image feature of the image sample to determine the first foreground coordinates of the image sample. The segmentation model's role is to accurately locate the foreground coordinates of the image using an image segmentation algorithm; candidates include threshold-based, edge-based, and region-based segmentation, segmentation based on cluster analysis, wavelet transforms, or mathematical morphology, and segmentation based on artificial neural networks. A neural-network segmentation algorithm obtains a linear decision function by training a multilayer perceptron and then classifies pixels with that decision function to achieve segmentation. Such a model needs a large amount of training data, but the network's dense connectivity readily incorporates spatial information and handles noise and non-uniformity in the image well. The segmentation model is therefore preferably one based on an artificial neural network.
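As an illustration of how the first foreground coordinates could be read off a segmentation output (the patent does not prescribe this post-processing; thresholding the logit map and taking the extremes of the foreground pixels is one simple assumed scheme):

import torch

def mask_to_box(mask_logits: torch.Tensor, thr: float = 0.0):
    # mask_logits: (1, H, W) foreground logit map for one sample
    ys, xs = torch.nonzero(mask_logits[0] > thr, as_tuple=True)
    if ys.numel() == 0:
        return None  # no foreground pixel found
    # first foreground coordinates as (x_min, y_min, x_max, y_max)
    return xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item()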
In step S103, the regression model performs linear regression positioning on the first image feature to obtain the second foreground coordinates of the image sample. The regression model seeks a mapping for the target vector to be positioned that minimizes the error between it and the target's true position vector. That is, given the input feature vector of the first image feature, a set of parameters is learned so that the regressed second foreground coordinate values are very close to the actual values. Concretely, the feature vector of the first image feature is taken as input, translated, and then scaled to obtain a predicted value of the second foreground coordinates; the functional relation between the predicted value and the actual value is computed to obtain the optimization parameters; and learning these parameters drives the predicted value toward the true value, yielding the second foreground coordinates of the image sample.
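The text does not give the translate-then-scale step a formula; one common parameterization consistent with it, borrowed here as an assumption from standard bounding-box regression, is

\[ \hat{x} = x_r + w_r t_x, \qquad \hat{y} = y_r + h_r t_y, \qquad \hat{w} = w_r e^{t_w}, \qquad \hat{h} = h_r e^{t_h}, \]

where \((x_r, y_r, w_r, h_r)\) is a reference box derived from the first image feature, \((t_x, t_y, t_w, t_h)\) are the learned regression outputs, and \((\hat{x}, \hat{y}, \hat{w}, \hat{h})\) is the predicted second foreground box: the first two relations translate, the last two scale.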
Step S104 computes, during training, a first loss function for the segmentation model (determined from the first foreground coordinates) and a second loss function for the regression model (determined from the second foreground coordinates), and combines them into the model loss function.
In step S105, the convolution model, the segmentation model, and the regression model are trained simultaneously using the model loss function. Each part of the model can be optimized against it, finally generating a target positioning model consisting of the convolution model and the regression model. In a specific implementation, the model loss function may be used to train the segmentation model and the regression model, the convolution model and the regression model, or all three of the convolution model, the segmentation model, and the regression model.
In this training method, the convolution model first extracts the first image feature of the image sample; the convolution model's output, i.e., the first image feature, is then fed into the segmentation model and the regression model respectively; and the three models are trained simultaneously according to the outputs of the segmentation and regression models. The result is a target positioning model consisting of the convolution model and the regression model, which remedies the slow positioning that occurs when the segmentation model is used directly.
Further, step S104 can be implemented as the following steps, as shown in fig. 3:
s301, determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
s302, determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
s303, determining a model loss function according to the first loss function and the second loss function.
In step S301, the first foreground coordinates produced by the segmentation model are compared with the actual coordinates of the target in the image sample to determine the first loss function. During model training, the actual coordinates of the target in each image sample are determined in advance and labeled so that the first loss function can be computed.
In step S302, a second loss function is determined by calculating a difference between a second foreground coordinate, i.e., a predicted value, and an actual coordinate, i.e., a true value, of the target in the image sample. Specifically, step S301 and step S302 may be executed simultaneously or separately.
In step S303, the final model loss function is calculated from the first and second loss functions obtained in the previous steps. As the model loss function is used to optimize the convolution, segmentation, and regression models, new first and second foreground coordinates are produced on each pass; comparing each with the actual coordinates yields updated first and second loss functions. The final model loss function is thus determined by repeatedly generating the first and second loss functions and combining them.
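The patent does not fix how the two losses are combined; a common choice, assumed here, is a weighted sum

\[ L_{\text{model}} = L_{\text{seg}} + \lambda \, L_{\text{reg}}, \]

where \(L_{\text{seg}}\) is the first loss function from the segmentation branch, \(L_{\text{reg}}\) is the second loss function from the regression branch, and \(\lambda\) balances the two; minimizing \(L_{\text{model}}\) then trains all three models simultaneously.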
Further, as shown in fig. 4, step S105 may be implemented as the following steps, and step S105 includes two cases, where the first case is specifically as follows:
s401, judging whether the model loss function meets a preset output requirement or not;
s402, if the model loss function does not meet the preset requirement, simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function, and re-executing the steps to input the image sample into the convolution model so as to extract the first image feature of the image sample.
After the model loss function is determined from the first and second loss functions, it is checked against the preset output requirement. If it does not meet the requirement, the pipeline is re-executed: the image sample is input into the convolution model to extract the first image feature; the first image feature is input into the segmentation model to generate first foreground coordinates and into the regression model to generate second foreground coordinates; a model loss function is calculated from the two sets of coordinates; and the convolution, segmentation, and regression models are trained simultaneously with the newly generated model loss function. The preset output requirement means that the model loss function determined from the first and second loss functions is accepted as final when the errors between the respective outputs of the convolution, segmentation, and regression models and the true results are minimal.
The second case is as follows:
s403, judging whether the model loss function meets the preset output requirement;
s404, if the model loss function meets the preset output requirement, a target positioning model composed of a convolution model and a regression model is generated.
The convolution model and regression model trained with a model loss function that meets the preset output requirement constitute the finally determined target positioning model. When the model loss function is judged to meet the preset output requirement, i.e., it is determined to be the optimal model loss function, its gradient is computed with a gradient descent algorithm, the corresponding optimization parameters are calculated, the convolution, segmentation, and regression models are trained with those parameters, and the target positioning model consisting of the convolution model and the regression model is generated. The segmentation model's purpose is to train the convolution model so that it extracts higher-precision image features in practical applications, while the trained regression model provides better positioning speed.
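Taken together, steps S401-S404 amount to the following loop, shown as a hedged PyTorch sketch: the loss forms (binary cross-entropy for the segmentation branch, smooth L1 for the regression branch), the weight lam, and the threshold-style preset output requirement are all assumptions of this sketch rather than details fixed by the patent.

import torch

def train_joint(backbone, seg_head, reg_head, loader,
                loss_threshold=0.01, max_epochs=100, lam=1.0):
    params = (list(backbone.parameters()) + list(seg_head.parameters())
              + list(reg_head.parameters()))
    opt = torch.optim.SGD(params, lr=1e-3)      # gradient descent, as in the text
    seg_loss_fn = torch.nn.BCEWithLogitsLoss()  # first loss function (assumed form)
    reg_loss_fn = torch.nn.SmoothL1Loss()       # second loss function (assumed form)

    for _ in range(max_epochs):
        running = 0.0
        for img, mask, box in loader:   # mask/box: labeled actual coordinates
            feat = backbone(img)        # img already resized to the fixed input size
            loss = (seg_loss_fn(seg_head(feat), mask)
                    + lam * reg_loss_fn(reg_head(feat), box))
            opt.zero_grad()
            loss.backward()
            opt.step()
            running += loss.item()
        if running / len(loader) < loss_threshold:  # preset output requirement (S403)
            break                                   # S404: stop training
    return backbone, reg_head  # target positioning model: segmentation head dropped

If the requirement is not met, the next pass of the loop re-executes the step of inputting the image samples into the convolution model, matching S402.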
In summary, in this training method for the target positioning model, the convolution model, the segmentation model, and the regression model are trained together so that the convolution model used in the actual prediction stage obtains better image features, while the trained regression model improves the image positioning speed.
Corresponding to the above training method of the target location model, the present application also provides a target location method, as shown in fig. 5:
s501, inputting the target image into a convolution model in the target positioning model to extract a second image feature of the target image;
s502, inputting the second image characteristics into a regression model in the target positioning model to generate foreground coordinates of the target image.
Steps S501 and S502 perform actual prediction using the target positioning model produced by the training steps above, i.e., the model determined by steps S101-S105. The target image is input into the convolution model of the target positioning model to extract the second image feature of the target image; because the convolution model was optimized together with the segmentation model, it extracts the second image feature more accurately. The second image feature is then input into the regression model of the target positioning model to generate the final desired foreground coordinates of the target image. The target positioning model used for actual prediction therefore offers both higher positioning precision for the target image and faster positioning speed.
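A corresponding prediction-stage sketch, reusing the assumed module names from the training sketch, keeps only the convolution model and the regression model:

import torch

@torch.no_grad()
def locate_target(backbone, reg_head, target_image):
    # target_image: (N, C, H, W), already resized to the fixed input size
    backbone.eval()
    reg_head.eval()
    feat = backbone(target_image)  # second image feature (S501)
    return reg_head(feat)          # foreground coordinates of the target image (S502)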
In practical image-positioning applications, the segmentation model is discarded and the target positioning model consisting of the convolution model and the regression model is used, which avoids the segmentation model's slow positioning and yields image positioning results better and faster.
The method described above with reference to fig. 5 is performed using the target positioning model shown in fig. 6: the finally generated target positioning model consists of a convolution model 601 and a regression model 602. Using the convolution model and regression model of the target positioning model improves the positioning precision and the positioning speed.
In summary, the specific steps of the embodiments of the method described above are as follows:
step 1, inputting an image sample into a convolution model to extract a first image characteristic of the image sample;
step 2, inputting the first image characteristic into a segmentation model to generate a first foreground coordinate of an image sample;
step 3, determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
step 4, inputting the first image characteristics into a regression model to generate second foreground coordinates of the image sample;
step 5, determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
step 6, determining a model loss function according to the first loss function and the second loss function;
step 7, judging whether the model loss function meets the preset output requirement or not;
step 8, if the model loss function does not meet the preset output requirement, simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function, and re-executing the step to input the image sample into the convolution model so as to extract the first image characteristic of the image sample;
step 9, if the model loss function meets the preset output requirement, generating a target positioning model consisting of a convolution model and a regression model;
step 10, inputting the target image into a convolution model in the target positioning model to extract a second image characteristic of the target image;
and 11, inputting the second image characteristics into a regression model in the target positioning model to generate foreground coordinates of the target image.
The training method of the target positioning model is realized through the steps 1-9, and the target positioning method is realized through the steps 10-11.
In the embodiment of the application, a model structure composed of a convolution model, a regression model, and a segmentation model is trained, and the parameters of the shared convolution layers, the regression layers, and the segmentation layers are updated with a model loss function constructed from the regression model and the segmentation model during training. In the actual application stage, the segmentation model is discarded and the output of the regression model is taken as the final result. In short: feature abstraction by the convolution model followed by regression positioning trains quickly but positions less accurately, while segmentation positioning achieves high accuracy but with a complex model and long training time. By sharing the convolution layers between the regression model and the segmentation model, the segmentation model is used to obtain better model features in the training stage, and the regression model is used in the prediction stage to accelerate prediction while improving precision.
In addition, an embodiment of the present application further provides a training apparatus for a target positioning model, as shown in fig. 7, which includes: a first extraction module 701, configured to input the image sample to a convolution model to extract a first image feature of the image sample;
a first processing module 702, configured to input the first image feature to a segmentation model to generate a first foreground coordinate of an image sample;
a second processing module 703, configured to input the first image feature to a regression model to generate a second foreground coordinate of the image sample;
a first analysis module 704 for calculating a model loss function from the first foreground coordinates and the second foreground coordinates;
a first generating module 705 for training the convolution model, the segmentation model and the regression model simultaneously according to the model loss function to generate a target location model composed of the convolution model and the regression model.
The first analysis module 704 comprises a first analysis unit, a second analysis unit, and a first determination unit;
the first analysis unit is used for determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
the second analysis unit is used for determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
and the first determining unit calculates according to the first loss function and the second loss function so as to determine a model loss function.
The first generating module 705 includes a first judging unit and a first processing unit;
the first judgment unit is used for judging whether the model loss function meets the preset output requirement or not;
and the first processing unit is used for training the convolution model, the segmentation model and the regression model simultaneously according to the model loss function when the model loss function does not meet the preset output requirement, and driving the first extraction module to work again.
The first generating module 705 further includes a second judging unit and a second generating unit;
the second judgment unit is used for judging whether the model loss function meets the preset output requirement or not;
and the second generation unit is used for generating a target positioning model consisting of a convolution model and a regression model if the model loss function meets the preset output requirement.
The embodiment of the application also comprises a target positioning device, which comprises a second extraction module and a second analysis module;
the second extraction module is used for inputting the target image into the convolution model in the target positioning model so as to extract a second image characteristic of the target image;
and the second analysis module is used for inputting the second image characteristics into a regression model in the target positioning model so as to generate foreground coordinates of the target image.
Fig. 8 is a schematic diagram of a computing device provided in the embodiment of the present application. The computing device 80 includes a processor 81, a memory 82, and a bus 83. The memory 82 stores executable instructions; when the computing device runs, the processor 81 and the memory 82 communicate through the bus 83, and the processor 81 executes the instructions stored in the memory 82, such as the steps of the training method of the target positioning model and the target positioning method.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the method for training the object location model and the method for locating the object in any of the above embodiments.
Specifically, the storage medium can be a general storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above training method and target positioning method can be executed, so that by training the convolution model, the segmentation model, and the regression model, the convolution model used in the actual prediction stage obtains better image features while the regression model improves the image positioning speed.
The computer program product for performing the target location model training method and the target location method provided in the embodiments of the present application includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the method in the foregoing method embodiments, and specific implementations may refer to the method embodiments and are not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, and are used for illustrating the technical solutions of the present application, but not limiting the same, and the scope of the present application is not limited thereto, and although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application, and are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for training an object localization model, comprising:
inputting the image sample to a convolution model to extract a first image feature of the image sample;
inputting the first image feature to a segmentation model to generate first foreground coordinates of an image sample;
inputting the first image feature to a regression model to generate second foreground coordinates of the image sample;
calculating a model loss function according to the first foreground coordinate and the second foreground coordinate;
and simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function to generate a target positioning model consisting of the convolution model and the regression model.
2. The method of claim 1, wherein calculating a model loss function according to the first foreground coordinate and the second foreground coordinate comprises:
determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
and determining a model loss function according to the first loss function and the second loss function.
3. The method of claim 1, wherein simultaneously training the convolution model, the segmentation model, and the regression model according to a model loss function comprises:
judging whether the model loss function meets the preset output requirement or not;
if the model loss function does not meet the preset output requirement, the convolution model, the segmentation model and the regression model are trained simultaneously according to the model loss function, and the step is executed again to input the image sample into the convolution model so as to extract the first image feature of the image sample.
4. The method of claim 3, wherein the convolution model, the segmentation model, and the regression model are trained simultaneously according to a model loss function, further comprising:
judging whether the model loss function meets the preset output requirement or not;
and if the model loss function meets the preset output requirement, generating a target positioning model consisting of a convolution model and a regression model.
5. An object localization method, based on the method according to any of claims 1-4, comprising:
inputting the target image into a convolution model in the target positioning model to extract a second image feature of the target image;
inputting the second image features into a regression model in the target positioning model to generate foreground coordinates of the target image.
6. An apparatus for training an object localization model, comprising:
the first extraction module is used for inputting the image sample into the convolution model so as to extract a first image characteristic of the image sample;
a first processing module for inputting the first image feature to a segmentation model to generate first foreground coordinates of an image sample;
a second processing module for inputting the first image features to a regression model to generate second foreground coordinates of the image sample;
the first analysis module is used for calculating a model loss function according to the first foreground coordinate and the second foreground coordinate;
and the first generation module is used for simultaneously training the convolution model, the segmentation model and the regression model according to the model loss function so as to generate a target positioning model consisting of the convolution model and the regression model.
7. An apparatus as claimed in claim 6, wherein the first analysis module comprises: the device comprises a first analysis unit, a second analysis unit and a first determination unit;
the first analysis unit is used for determining a first loss function according to the difference between the first foreground coordinate and the actual coordinate of the target in the image sample;
the second analysis unit is used for determining a second loss function according to the difference between the second foreground coordinate and the actual coordinate of the target in the image sample;
the first determining unit is configured to determine a model loss function according to the first loss function and the second loss function.
8. An apparatus according to claim 6, wherein the first generating module comprises: the device comprises a first judging unit, a first generating unit and a first processing unit;
the first judging unit is used for judging whether the model loss function meets the preset output requirement or not;
and the first processing unit is used for training the convolution model, the segmentation model and the regression model simultaneously according to the model loss function and driving the first extraction module to work again when the model loss function does not meet the preset output requirement.
9. An apparatus as claimed in claim 8, wherein the first generating module further comprises: a second judging unit and a second generating unit;
the second judgment unit is used for judging whether the model loss function meets the preset output requirement or not;
and the second generation unit is used for generating a target positioning model consisting of a convolution model and a regression model if the model loss function meets the preset output requirement.
10. An apparatus for object localization according to any of claims 6-9, comprising: a second extraction module and a second analysis module;
the second extraction module is used for inputting the target image into the convolution model in the target positioning model so as to extract a second image characteristic of the target image;
and the second analysis module is used for inputting the second image characteristics into a regression model in the target positioning model so as to generate foreground coordinates of the target image.
CN201810992851.7A 2018-08-23 2018-08-23 Training method of target positioning model and target positioning method and device Active CN109165654B (en)

Priority Applications (1)

Application Number: CN201810992851.7A (published as CN109165654B)
Priority Date: 2018-08-23
Filing Date: 2018-08-23
Title: Training method of target positioning model and target positioning method and device

Applications Claiming Priority (1)

Application Number: CN201810992851.7A (published as CN109165654B)
Priority Date: 2018-08-23
Filing Date: 2018-08-23
Title: Training method of target positioning model and target positioning method and device

Publications (2)

Publication Number: CN109165654A (en), Publication Date: 2019-01-08
Publication Number: CN109165654B (en), Publication Date: 2021-03-30

Family

ID=64893338

Family Applications (1)

Application Number: CN201810992851.7A (CN109165654B, Active)
Priority Date: 2018-08-23
Filing Date: 2018-08-23
Title: Training method of target positioning model and target positioning method and device

Country Status (1)

Country Link
CN (1) CN109165654B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110675453B (en) * 2019-10-16 2021-04-13 北京天睿空间科技股份有限公司 Self-positioning method for moving target in known scene
CN111080694A (en) * 2019-12-20 2020-04-28 上海眼控科技股份有限公司 Training and positioning method, device, equipment and storage medium of positioning model
CN111179628B (en) * 2020-01-09 2021-09-28 北京三快在线科技有限公司 Positioning method and device for automatic driving vehicle, electronic equipment and storage medium
CN113469172B (en) * 2020-03-30 2022-07-01 阿里巴巴集团控股有限公司 Target positioning method, model training method, interface interaction method and equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550746B (en) * 2015-12-08 2018-02-02 北京旷视科技有限公司 The training method and trainer of machine learning model
CN107730514B (en) * 2017-09-29 2021-02-12 北京奇宝科技有限公司 Scene segmentation network training method and device, computing equipment and storage medium
CN108133186A (en) * 2017-12-21 2018-06-08 东北林业大学 A kind of plant leaf identification method based on deep learning
CN108416412B (en) * 2018-01-23 2021-04-06 浙江瀚镪自动化设备股份有限公司 Logistics composite code identification method based on multitask deep learning
CN108416378B (en) * 2018-02-28 2020-04-14 电子科技大学 Large-scene SAR target recognition method based on deep neural network

Also Published As

Publication number: CN109165654A (en), Publication date: 2019-01-08

Similar Documents

Publication Publication Date Title
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN109165654B (en) Training method of target positioning model and target positioning method and device
CN111241989A (en) Image recognition method and device and electronic equipment
CN112464912B (en) Robot end face detection method based on YOLO-RGGNet
CN110852257B (en) Method and device for detecting key points of human face and storage medium
CN111311485B (en) Image processing method and related device
CN107784288A (en) A kind of iteration positioning formula method for detecting human face based on deep neural network
CN111222548A (en) Similar image detection method, device, equipment and storage medium
CN112464798A (en) Text recognition method and device, electronic equipment and storage medium
CN112101359B (en) Text formula positioning method, model training method and related device
CN115239644B (en) Concrete defect identification method, device, computer equipment and storage medium
CN111192312B (en) Depth image acquisition method, device, equipment and medium based on deep learning
CN111626295A (en) Training method and device for license plate detection model
CN112287947A (en) Region suggestion box detection method, terminal and storage medium
CN114266894A (en) Image segmentation method and device, electronic equipment and storage medium
CN109685805B (en) Image segmentation method and device
CN113034514A (en) Sky region segmentation method and device, computer equipment and storage medium
CN112749576B (en) Image recognition method and device, computing equipment and computer storage medium
CN109977937B (en) Image processing method, device and equipment
CN115630660B (en) Barcode positioning method and device based on convolutional neural network
CN117389664A (en) Unique control region division method and device, electronic equipment and storage medium
CN113468906B (en) Graphic code extraction model construction method, identification device, equipment and medium
CN118115932A (en) Image regressor training method, related method, device, equipment and medium
WO2015114021A1 (en) Image capture using client device
CN112288748B (en) Semantic segmentation network training and image semantic segmentation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant