CN109447169A - Image processing method, training method for its model, apparatus, and electronic system - Google Patents
- Publication number
- CN109447169A CN109447169A CN201811306459.9A CN201811306459A CN109447169A CN 109447169 A CN109447169 A CN 109447169A CN 201811306459 A CN201811306459 A CN 201811306459A CN 109447169 A CN109447169 A CN 109447169A
- Authority
- CN
- China
- Prior art keywords
- network
- segmentation
- region
- image
- positioning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides an image processing method, a training method for its model, an apparatus, and an electronic system. The method comprises: obtaining candidate regions of a target training image through a feature extraction network and a region proposal network; performing instance localization and instance segmentation on the candidate regions through a localization-segmentation network and computing the corresponding loss values, to obtain localization regions and segmentation regions containing instances together with a localization loss value and a segmentation loss value; classifying the candidate regions through a classification network and computing the loss, to obtain the classification results of the candidate regions and a classification loss value; and training the above networks according to the loss values until all loss values converge, to obtain the image processing model. In the present invention, instance localization and instance segmentation are implemented by the same branch network, so that they can share feature information and promote each other, which helps improve the accuracy of instance localization and instance segmentation and, in turn, the overall accuracy of instance localization, segmentation, and classification.
Description
Technical field
The present invention relates to the technical field of image processing, and in particular to an image processing method, a training method for its model, an apparatus, and an electronic system.
Background art
Instance segmentation is a vital task in computer vision: it provides instance-level detection and segmentation for every target in a picture. Instance segmentation gives a computer important clues for understanding a picture more accurately and plays an important role in fields such as autonomous driving. In the related art, instance segmentation is mainly implemented on top of the classical object detection method FPN (Feature Pyramid Network), which is extended with an additional branch for instance segmentation. This approach splits instance segmentation into a detection part and a segmentation part: the detection part, comprising the localization task and the classification task, is implemented by one branch network, while the segmentation part is completed by a separate branch network.
However, this approach implements the segmentation part by simply adding a branch network and does not integrate the characteristics of the individual tasks well. For example, the classification task and the localization task differ greatly: classification needs global semantic information, whereas localization needs local edge information. Implementing the two in the same branch network easily causes information loss, resulting in poor final instance segmentation accuracy.
Summary of the invention
In view of this, the purpose of the present invention is to provide an image processing method, a training method for its model, an apparatus, and an electronic system, so as to improve the accuracy of instance localization and instance segmentation and, in turn, the overall accuracy of instance localization, segmentation, and classification.
In a first aspect, an embodiment of the present invention provides a training method for an image processing model, comprising: obtaining candidate regions of a target training image through a preset feature extraction network and region proposal network; performing instance localization and instance segmentation on the candidate regions through a preset localization-segmentation network, and computing the loss values of instance localization and instance segmentation, to obtain localization regions and segmentation regions containing instances together with a localization loss value and a segmentation loss value; classifying the candidate regions through a preset classification network, and computing the classification loss, to obtain the classification results of the candidate regions and a classification loss value; and training the feature extraction network, the region proposal network, the localization-segmentation network, and the classification network according to the localization, segmentation, and classification loss values until these loss values converge, to obtain the image processing model.
In a preferred embodiment of the present invention, the localization-segmentation network comprises a convolutional network, and the classification network comprises a fully connected network.
In a preferred embodiment of the present invention, the step of obtaining the candidate regions of the target training image through the preset feature extraction network and region proposal network comprises: performing feature extraction on the target training image through the preset feature extraction network to obtain initial feature maps of the target training image; performing feature fusion on the initial feature maps to obtain fused feature maps; and extracting the candidate regions from the fused feature maps through the preset region proposal network.
In a preferred embodiment of the present invention, the step of performing instance localization and instance segmentation on the candidate regions through the preset localization-segmentation network comprises: resizing the candidate regions to a size matching the convolutional network; and performing instance detection and instance segmentation on the resized candidate regions through the convolutional network, to obtain localization regions and segmentation regions containing complete instances, wherein a localization region is marked by a detection box and a segmentation region is marked by color.
In a preferred embodiment of the present invention, the target training image carries a localization label and a segmentation label for each instance, and the step of computing the loss values of instance localization and instance segmentation comprises: substituting each localization region and the localization label of the instance it contains into a preset localization loss function, to obtain the localization loss value; and substituting each segmentation region and the segmentation label of the instance it contains into a preset segmentation loss function, to obtain the segmentation loss value.
In a preferred embodiment of the present invention, the step of classifying the candidate regions through the preset classification network comprises: resizing the candidate regions to a size matching the fully connected network; and inputting the resized candidate regions into the fully connected network, which outputs the classification results of the candidate regions.
In a preferred embodiment of the present invention, the target training image carries a classification label for each instance, and the step of computing the classification loss comprises: substituting the classification result of each candidate region and the classification label of the instance it contains into a preset classification loss function, to obtain the classification loss value.
In a second aspect, an embodiment of the present invention provides an image processing method applied to a device configured with an image processing model, the image processing model being obtained by the above training method. The method comprises: obtaining an image to be processed; and inputting the image to be processed into the image processing model, which outputs the localization region, segmentation region, and classification result of each instance in the image.
In a preferred embodiment of the present invention, the step of obtaining the image to be processed comprises: acquiring the image to be processed through a camera device of a vehicle. After the step of outputting the localization region, segmentation region, and classification result of each instance in the image, the method further comprises: generating a driving command according to the localization regions, segmentation regions, and classification results, so that the vehicle drives automatically according to the driving command.
In a third aspect, an embodiment of the present invention provides a training apparatus for an image processing model, comprising: a region obtaining module, configured to obtain candidate regions of a target training image through a preset feature extraction network and region proposal network; a localization-segmentation module, configured to perform instance localization and instance segmentation on the candidate regions through a preset localization-segmentation network and compute the corresponding loss values, obtaining localization regions and segmentation regions containing instances together with a localization loss value and a segmentation loss value; a classification module, configured to classify the candidate regions through a preset classification network and compute the classification loss, obtaining the classification results of the candidate regions and a classification loss value; and a training module, configured to train the feature extraction network, region proposal network, localization-segmentation network, and classification network according to the localization, segmentation, and classification loss values until these loss values converge, to obtain the image processing model.
In a fourth aspect, an embodiment of the present invention provides an image processing apparatus deployed on a device configured with an image processing model obtained by the above training method. The apparatus comprises: an image obtaining module, configured to obtain an image to be processed; and an image input module, configured to input the image to be processed into the image processing model and output the localization region, segmentation region, and classification result of each instance in the image.
In a fifth aspect, an embodiment of the present invention provides an electronic system comprising an image acquisition device, a processing device, and a storage device. The image acquisition device is configured to obtain preview video frames or image data; the storage device stores a computer program which, when run by the processing device, executes the above training method for an image processing model or the above image processing method.
In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when run by a processing device, executes the steps of the above training method for an image processing model or of the above image processing method.
The embodiments of the present invention bring the following beneficial effects:
In the above image processing method and the training method, apparatus, and electronic system for its model, after the candidate regions of the target training image are obtained through the preset feature extraction network and region proposal network, instance localization and instance segmentation are performed on the candidate regions through the localization-segmentation network and the corresponding loss values are computed, yielding localization regions and segmentation regions containing instances; the candidate regions are then classified through the classification network and the corresponding loss value is computed, yielding their classification results; finally, the feature extraction network, region proposal network, localization-segmentation network, and classification network are trained according to the localization, segmentation, and classification loss values until all loss values converge, yielding the image processing model. In this scheme, instance localization and instance segmentation are implemented by the same branch network, so that they can share feature information and promote each other, which helps improve the accuracy of instance localization and instance segmentation and, in turn, the overall accuracy of instance localization, segmentation, and classification.
Other features and advantages of the present invention will be set forth in the following description; alternatively, some features and advantages can be inferred or unambiguously determined from the specification, or learned by implementing the above techniques of the present invention.
To make the above objects, features, and advantages of the present invention clearer and more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
In order to explain the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an electronic system provided in an embodiment of the present invention;
Fig. 2 is a flowchart of a training method for an image processing model provided in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of initial feature maps provided in an embodiment of the present invention;
Fig. 4 is a schematic diagram of an image processing model provided in an embodiment of the present invention;
Fig. 5 is a schematic diagram of a prior-art image processing model, provided in an embodiment of the present invention for comparison;
Fig. 6 is a flowchart of an image processing method provided in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a training apparatus for an image processing model provided in an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Considering that in existing instance segmentation schemes the tasks are allocated unreasonably, so that the final instance segmentation accuracy is poor, the embodiments of the present invention provide an image processing method, a training method for its model, an apparatus, and an electronic system. The technique can be applied to a variety of devices such as servers, computers, cameras, mobile phones, tablet computers, and vehicle controllers, and can be implemented with corresponding software and hardware. The embodiments of the present invention are described in detail below.
Embodiment one:
First, referring to Fig. 1, an example electronic system 100 for implementing the image processing method, the training method for its model, the apparatus, and the electronic system of the embodiments of the present invention is described.
As shown in the schematic structural diagram in Fig. 1, the electronic system 100 comprises one or more processing devices 102, one or more storage devices 104, an input device 106, an output device 108, and one or more image acquisition devices 110, interconnected through a bus system 112 and/or other connection mechanisms (not shown). It should be noted that the components and structure of the electronic system 100 shown in Fig. 1 are illustrative rather than restrictive; the electronic system may have other components and structures as needed.
The processing device 102 may be a gateway, an intelligent terminal, or a device comprising a central processing unit (CPU) or another form of processing unit with data processing and/or instruction execution capabilities. It can process data from the other components in the electronic system 100 and can also control the other components in the electronic system 100 to perform the desired functions.
The storage device 104 may comprise one or more computer program products, which may include various forms of computer-readable storage media, such as volatile and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory; the non-volatile memory may include, for example, read-only memory (ROM), hard disks, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processing device 102 can run these program instructions to realize the client functions (as implemented by the processing device) of the embodiments of the present invention described below and/or other desired functions. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by the user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (for example, images or sounds) to the outside (for example, the user), and may include one or more of a display, a loudspeaker, and the like.
The image acquisition device 110 can acquire preview video frames or image data and store them in the storage device 104 for use by other components.
Illustratively, the devices of the example electronic system for implementing the image processing method, the training method for its model, the apparatus, and the electronic system according to the embodiments of the present invention may be arranged integrally or separately: for example, the processing device 102, the storage device 104, the input device 106, and the output device 108 may be integrated into one unit, with the image acquisition device 110 placed at a designated position where the target image can be captured. When the devices of the electronic system are integrated, the electronic system may be implemented as a camera, a smartphone, a tablet computer, a computer, a vehicle controller, or the like.
Embodiment two:
This embodiment provides a training method for an image processing model, executed by the processing device in the above electronic system. The processing device may be any device or chip with data processing capability; it can process the received information independently, or it can be connected to a server, analyze and process the information jointly, and upload the processing results to the cloud.
The above image processing model is mainly used for instance segmentation, which generally comprises operations such as instance localization, segmentation, and classification. As shown in Fig. 2, the training method for the image processing model comprises the following steps:
Step S202: obtain candidate regions of a target training image through a preset feature extraction network and region proposal network.
The feature extraction network can be obtained by training a VGG16 network, a ResNet network, or the like. The target training image generally contains several kinds of instances, such as persons, animals, and still objects; each kind may occur multiple times, for example an image containing three persons: person 1, person 2, and person 3. The training goal of the image processing model is to localize, segment, and classify each instance of every kind in the image.
The candidate regions can be marked by candidate boxes of preset sizes: multiple image regions that may contain instances are chosen from the target training image or its feature map, for use by subsequent instance localization, classification, and segmentation. When extracting candidate regions, candidate boxes of various specifications can be used. For example, for a given pixel of the target training image or its feature map, taking that pixel as the center of the candidate box, the box size can vary among 2*7, 3*6, 5*5, 6*3, 7*2, and so on, to obtain multiple image regions centered on that pixel; the process is then repeated with other pixels as box centers.
After the multiple image regions are obtained, they usually need to be classified and screened to keep the regions that may contain instances; these are the candidate regions described above. This can be done by a pre-trained neural network, for example an RPN (Region Proposal Network) when extracting candidate regions from the feature map of the target training image, as sketched below.
Step S204: perform instance localization and instance segmentation on the candidate regions through a preset localization-segmentation network, and compute the loss values of instance localization and instance segmentation, to obtain localization regions and segmentation regions containing instances together with a localization loss value and a segmentation loss value.
Considering that the instance localization and instance segmentation tasks need position-sensitive information, such as edge feature information and local edge feature information, while the classification task needs global semantic information, this embodiment divides the image processing model into two branch networks: a localization-segmentation network for instance localization and instance segmentation, and a classification network for classification.
Under this division, the localization-segmentation network can concentrate on extracting the position-sensitive information in each candidate region, and can share the extracted feature maps and feature information between the instance localization task and the instance segmentation task, so that the two tasks promote each other when the network performs them; for example, they jointly improve the network's ability to find boundaries. The classification network can concentrate on extracting the global semantic information in each candidate region and no longer needs to extract position-sensitive information.
By contrast, in the existing related approach, instance localization and classification are implemented by the same network branch, which must extract both position-sensitive information and global semantic information and therefore easily loses part of the feature information. For example, if that branch is implemented by a fully connected network, the edge information needed for instance localization is easily lost, making localization inaccurate; if it is implemented by a convolutional network, extracting global semantic information is difficult, making classification inaccurate. Moreover, since instance segmentation is implemented by another network branch, that branch must re-extract feature information from the candidate regions, and it is difficult to share the feature information correlated with instance localization.
Specifically, in step S204, the localization-segmentation network can be implemented with a neural network that is good at extracting position-sensitive information, such as a convolutional network. After a candidate region is input into the localization-segmentation network, the network can perform instance localization and instance segmentation on it simultaneously, or perform instance localization first and instance segmentation afterwards, finally obtaining a localization region and a segmentation region containing the instance. Generally, the localization region is marked by a detection box, such as a rectangular box that typically contains the complete instance, while the boundary of the segmentation region follows the edge of the instance; the segmentation regions of different instances can be distinguished by different colors.
During training of the image processing model, the accuracy of the model needs to be evaluated. Therefore, the standard localization region and the standard segmentation region of each instance, referred to as the localization label and the segmentation label, are usually annotated in the target training image in advance. After the localization-segmentation network completes instance localization and instance segmentation on the candidate regions of the target training image and outputs the localization region and segmentation region of each instance, a preset loss function computes the gap between each localization region and its localization label, and the gap between each segmentation region and its segmentation label, yielding the above localization loss value and segmentation loss value.
Step S206: classify the candidate regions through a preset classification network, and compute the classification loss, to obtain the classification results of the candidate regions and a classification loss value.
The classification network can be implemented with a neural network that is good at extracting global semantic information, such as a fully connected network. After a candidate region is input into the classification network, the network obtains the contextual semantic information of the candidate region, for example through semantic segmentation, and thereby obtains global semantic information; it then classifies the candidate region based on this global semantic information to obtain a classification result, which can specifically be a class identifier such as person, ground, or cup. To evaluate the classification results of the model, they can be compared with the classification label (i.e., the standard classification result) of each instance carried in the target training image; specifically, the gap between a classification result and the classification label can be computed by a preset classification loss function, yielding the classification loss value.
Step S208: train the feature extraction network, region proposal network, localization-segmentation network, and classification network according to the localization, segmentation, and classification loss values, until these loss values converge, to obtain the image processing model.
During training, the parameters of the feature extraction network, region proposal network, localization-segmentation network, and classification network can be modified according to the above localization, segmentation, and classification loss values, until all loss values converge and training ends. Multiple target training images can be used in the training process: for example, training is repeated on one target training image until the loss values converge, then repeated on a second target training image until the loss values converge again, then on a third, and so on, so that the performance of the model becomes increasingly stable.
In the above training method for an image processing model, after the candidate regions of the target training image are obtained through the preset feature extraction network and region proposal network, instance localization and instance segmentation are performed on the candidate regions through the localization-segmentation network and the corresponding loss values are computed, yielding localization regions and segmentation regions containing instances; the candidate regions are then classified through the classification network and the corresponding loss value is computed, yielding their classification results; finally, the feature extraction network, region proposal network, localization-segmentation network, and classification network are trained according to the localization, segmentation, and classification loss values until all loss values converge, yielding the image processing model. In this scheme, instance localization and instance segmentation are implemented by the same branch network, so that they can share feature information and promote each other, which helps improve the accuracy of instance localization and instance segmentation and, in turn, the overall accuracy of instance localization, segmentation, and classification.
Embodiment three:
This embodiment provides another training method for an image processing model, implemented on the basis of the above embodiments; this embodiment focuses on how the candidate regions of the target training image are obtained. The method comprises the following steps:
Step 302: perform feature extraction on the target training image through the preset feature extraction network, to obtain initial feature maps of the target training image.
The sample images used to train the feature extraction network can be obtained from the ImageNet dataset or other datasets. During training of the feature extraction network, its performance can be evaluated by the Top-1 classification error, which can be expressed as Top1 = (number of samples whose correct label differs from the top prediction output by the network) / (total number of samples). After the feature extraction network is trained, the target training image is input into it, and it outputs the initial feature maps of the target training image. Specifically, step 302 can also be implemented as follows:
Step 1: resize the target training image to a preset size, and apply whitening to the resized target training image.
Most neural networks can only receive image data of a fixed size; therefore the target training image needs to be resized before it is input into the feature extraction network. Specifically, if the length or width of the target training image is larger than the preset size, the image is compressed to the preset size, or the excess image region is cropped; if the length or width is smaller than the preset size, the image is stretched to the preset size, or the vacant region is padded.
During imaging, the target training image is generally affected by many factors such as ambient illumination intensity, object reflection, and the camera used. To remove these influences from the target training image so that it contains invariant information unaffected by external conditions, whitening needs to be applied to it. Whitening the resized target training image can also be understood as reducing its dimensionality; it usually converts the pixel values of the image to zero mean and unit variance. Specifically, the mean μ and the standard deviation δ of all pixel values of the target training image are computed first, and each pixel is then converted by the formula x_ij = (p_ij − μ)/δ, where p_ij is the original pixel value at row i, column j of the target training image, and x_ij is the converted pixel value at row i, column j.
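A minimal NumPy sketch of this per-image whitening follows; the resize step is omitted, and the small epsilon guarding against a zero denominator is an added assumption:

```python
import numpy as np

def whiten(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Convert pixel values to zero mean and unit variance:
    x_ij = (p_ij - mu) / delta, with mu the mean and delta the
    standard deviation over all pixels of this image."""
    mu = image.mean()
    delta = image.std()
    return (image - mu) / (delta + eps)

img = np.random.randint(0, 256, size=(224, 224, 3)).astype(np.float32)
white = whiten(img)
print(white.mean(), white.std())  # approximately 0.0 and 1.0
```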
Step 2: input the processed target training image into the preset feature extraction network, which outputs a specified number of levels of initial feature maps.
In practice, the number of levels of feature maps output by the feature extraction network can be preset; for example, five levels, denoted Conv1, Conv2, Conv3, Conv4, and Conv5. Since the initial feature map of the current level is obtained by convolving the initial feature map of the level below with a preset convolution kernel (the bottom-level initial feature map being obtained by convolving the target training image), the scale of the initial feature map of the current level is smaller than that of the level below; therefore, the scales of the initial feature maps output by the feature extraction network decrease from bottom to top and differ from one another.
Fig. 3 shows the schematic structural diagram of the initial feature maps, using five levels as an example: following the direction of the arrow, the bottom-level initial feature map is at the bottom and the top-level initial feature map is at the top. The feature extraction network generally has multiple convolutional layers: after the target training image is input into the feature extraction network, the first convolutional layer performs a convolution to produce the bottom-level initial feature map; the second convolutional layer convolves that map to produce the second-level initial feature map; and so on, until the last convolutional layer produces the top-level initial feature map. In general, the convolution kernels used by different convolutional layers may differ; besides the convolutional layers, the feature extraction network is usually also configured with pooling layers, fully connected layers, and the like.
Step 304: perform feature fusion on the initial feature maps, to obtain fused feature maps.
Since the initial feature maps of the levels are obtained by convolution with different kernels, they contain features of the target training image of different kinds or different dimensions. To enrich the features contained in each level's initial feature map, the initial feature maps of the levels need to be fused. The fusion can take many forms: for example, the initial feature map of the current level is fused with the feature map of the level above to obtain the fused feature map of the current level; as another example, before the initial feature map of the current level is fused with the feature map of the level above, it may also be fused with the initial feature maps of other levels or combinations of them, and the fused result is then fused with the feature map of the level above.
Since the initial feature maps differ in scale, the maps to be fused usually need pre-processing (such as a convolution or an interpolation operation) before fusion, so that their scales match; the fusion itself can be a pointwise multiplication, pointwise addition, or other logical operation between corresponding feature points.
Specifically, step 304 can also be implemented as follows:
Step 1: take the initial feature map of the top level as the fused feature map of the top level. Since there is no level above the top level, during the fusion of the per-level initial feature maps the top-level initial feature map is not fused further and is directly taken as the top-level fused feature map.
Step 2: for each level other than the top level, fuse the initial feature map of the current level with the fused feature map of the level above, to obtain the fused feature map of the current level.
In practice, the initial feature map of the current level can first be convolved with a preset convolution kernel, such as a 3*3 kernel, or a larger kernel such as a 5*5 or 7*7 kernel, to obtain the convolved initial feature map. Then, according to the scale of the current level's initial feature map, an interpolation operation is applied to the fused feature map of the level above, yielding a version of it whose scale matches that of the current level's initial feature map.
Since the fused feature map of the level above is smaller than the initial feature map of the current level, it needs to be "stretched" to the same scale as the current level's initial feature map to make fusion possible, and this stretching can be realized by the above interpolation operation. Taking linear interpolation as a simple example: suppose three feature points in part of an initial feature map have the values 5, 7, and 9, and the map must be stretched to a preset scale that requires extending these three points to five. The mean of points 5 and 7, i.e. 6, is inserted between them, and the mean of points 7 and 9, i.e. 8, is inserted between them; the three feature points are thereby extended to five, with values 5, 6, 7, 8, 9.
Besides linear interpolation, other interpolation algorithms such as bilinear interpolation can be used. Bilinear interpolation usually interpolates along the x direction and the y direction separately. Specifically, four feature points Q11, Q12, Q21, and Q22, arranged in a rectangle, are selected from the initial feature map. In the x direction, linear interpolation between Q11 and Q21 gives interpolation point R1, and linear interpolation between Q12 and Q22 gives interpolation point R2; then, in the y direction, linear interpolation between R1 and R2 gives the final interpolation point P, the new feature point produced by one bilinear interpolation.
After the interpolation is complete, the interpolated fused feature map of the level above is combined with the initial feature map of the current level by pointwise addition between corresponding feature points, yielding the fused feature map of the current level. Of course, the two maps can instead be combined by pointwise multiplication or other logical operations between corresponding feature points, as sketched below.
Step 306: extract the candidate regions from the above fused feature maps through the preset region proposal network.
The region proposal network can specifically be an RPN network, which can be implemented as follows: on each level of fused feature map, an n*n sliding window (e.g., a 3*3 window when n = 3) generates a fully connected feature of length 256 or 512; this 256- or 512-dimensional feature then feeds two branches of fully connected or convolutional layers, a reg-layer and a cls-layer. The reg-layer predicts, for the anchor point at the center of the candidate region, the coordinates x, y and the width and height w, h of the corresponding candidate region, while the cls-layer determines whether the candidate region is foreground or background, thereby screening out candidate regions that may contain instances. A candidate region is also called an RoI (Region of Interest).
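A minimal PyTorch sketch of such an RPN head follows; the 3*3 sliding window is realized as a 3*3 convolution producing a 256-dimensional feature per position as described above, while the anchor count and the use of 1*1 convolutions for the two branches are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """3x3 sliding window over a fused feature map, producing a 256-d
    feature per position, followed by two sibling branches:
    reg-layer (4 box values per anchor: x, y, w, h) and
    cls-layer (2 scores per anchor: foreground / background)."""
    def __init__(self, in_channels=256, num_anchors=5):
        super().__init__()
        self.window = nn.Conv2d(in_channels, 256, kernel_size=3, padding=1)
        self.reg = nn.Conv2d(256, num_anchors * 4, kernel_size=1)
        self.cls = nn.Conv2d(256, num_anchors * 2, kernel_size=1)

    def forward(self, feature_map):
        x = torch.relu(self.window(feature_map))
        return self.reg(x), self.cls(x)

head = RPNHead()
reg_out, cls_out = head(torch.randn(1, 256, 32, 32))
print(reg_out.shape, cls_out.shape)  # (1, 20, 32, 32) (1, 10, 32, 32)
```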
Step 308: perform instance localization and instance segmentation on the candidate regions through the preset localization-segmentation network, and compute the loss values of instance localization and instance segmentation, to obtain localization regions and segmentation regions containing instances together with a localization loss value and a segmentation loss value.
Step 310: classify the candidate regions through the preset classification network, and compute the classification loss, to obtain the classification results of the candidate regions and a classification loss value.
Step 312: train the feature extraction network, region proposal network, localization-segmentation network, and classification network according to the localization, segmentation, and classification loss values, until these loss values converge, to obtain the image processing model.
In the above training method for an image processing model, after the feature extraction network extracts the initial feature maps of the target training image, feature fusion is applied to the initial feature maps to obtain fused feature maps; the region proposal network then extracts the candidate regions from the fused feature maps; and based on these candidate regions, the localization-segmentation network and the classification network are trained to obtain the image processing model. In this scheme, instance localization and instance segmentation are implemented by the same branch network, so that they can share feature information and promote each other, which helps improve the accuracy of instance localization and instance segmentation and, in turn, the overall accuracy of instance localization, segmentation, and classification.
Embodiment four:
This embodiment provides another training method for an image processing model, implemented on the basis of the above embodiments; this embodiment focuses on how instance localization, instance segmentation, and classification are performed on the candidate regions. Since a convolutional network is better at obtaining the position-sensitive information in a candidate region, such as edge context information, while a fully connected network is better at obtaining the global semantic information in a candidate region, in this embodiment the localization-segmentation network is implemented by a convolutional network and the classification network by a fully connected network, which avoids the loss of edge context information that instance localization would suffer in a fully connected network.
Step 402: perform feature extraction on the target training image through the preset feature extraction network, to obtain initial feature maps of the target training image.
Step 404: perform feature fusion on the initial feature maps, to obtain fused feature maps.
Step 406: extract the candidate regions from the above fused feature maps through the preset region proposal network.
Step 408: resize the candidate regions to a size matching the convolutional network.
Generally, the image data input to the convolutional network has a fixed size, such as 14*14 or 7*7. As described in the above embodiments, the candidate regions can be resized by stretching, compression, cropping excess regions, padding vacant regions, and so on, so that their size matches that expected by the convolutional network.
Step 410: perform instance detection and instance segmentation on the resized candidate regions through the convolutional network, to obtain localization regions and segmentation regions containing complete instances; a localization region is marked by a detection box, and a segmentation region is marked by color.
After a resized candidate region is input into the convolutional network, the network usually extracts the position information in the candidate region to obtain the edge information of the instance the region may contain. Using the extracted edge information, the convolutional network localizes and segments the instance; in most cases, instance localization and instance segmentation can be performed simultaneously. In addition, for a larger instance, a candidate region may not contain the complete instance. In that case the convolutional network can look for candidate regions with the same or an adjacent anchor point (Anchor, which can be understood as the center point of the candidate region) as the current candidate region, merge the candidate regions whose edge information is highly correlated, or stretch the current candidate region based on the highly correlated candidate regions, to obtain a region containing the complete instance. This region may be large relative to the instance and contain much background around it; in that case the region is adjusted again so that the edges of the instance are close to the region's edges, and the final detection box contains exactly the complete instance.
The localization region is marked by a detection box, which can specifically be a rectangular box containing the instance and the background around it. The boundary of the segmentation region is usually the edge contour of the instance, and the instances are usually distinguished by color filling. For example, if the target training image contains person 1, person 2, a cup, and an animal, then after instance segmentation person 1 can be marked in blue, person 2 in red, the cup in green, and the animal in purple.
During training, the loss value of the convolutional network's output needs to be computed by a loss function in order to evaluate the network's performance. Therefore, the target training image usually carries a localization label and a segmentation label for each instance. The localization label can also be marked with a detection box to show the exact position of the instance; the segmentation label can show the edge contour of the instance by lines that enclose the region occupied by the instance, and that region can also be filled with color.
Specifically, to evaluate the instance localization performance of the convolutional network, the localization loss value needs to be computed: each localization region and the localization label of the instance it contains are substituted into a preset localization loss function, which can be a Bbox Loss function or another function for evaluating position loss, to obtain the localization loss value. To evaluate the instance segmentation performance of the convolutional network, the segmentation loss value needs to be computed: each segmentation region and the segmentation label of the instance it contains are substituted into a preset segmentation loss function, which can be a cross-entropy loss function such as a Mask Sigmoid Loss function, to obtain the segmentation loss value. It is understood that the cross-entropy loss function can also be used to evaluate the localization loss of the above localization regions.
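The two loss computations can be sketched as follows. The patent names "Bbox Loss" and "Mask Sigmoid Loss" without defining them, so smooth L1 is assumed here for the localization loss and per-pixel sigmoid cross-entropy for the segmentation loss, both common concrete choices:

```python
import torch
import torch.nn.functional as F

def localization_loss(pred_boxes, label_boxes):
    """Box regression loss between predicted localization regions and
    localization labels; smooth L1 is an assumed choice of Bbox Loss."""
    return F.smooth_l1_loss(pred_boxes, label_boxes)

def segmentation_loss(pred_mask_logits, label_masks):
    """Per-pixel sigmoid cross-entropy between predicted segmentation
    regions and binary segmentation labels (Mask Sigmoid Loss)."""
    return F.binary_cross_entropy_with_logits(pred_mask_logits, label_masks)

boxes = torch.randn(8, 4)
masks = torch.randn(8, 1, 14, 14)
print(localization_loss(boxes, torch.randn(8, 4)).item())
print(segmentation_loss(masks, torch.randint(0, 2, (8, 1, 14, 14)).float()).item())
```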
Step 412: resize the candidate regions to a size matching the fully connected network.
Generally, the image data input to the fully connected network also has a fixed size, such as 7*7 or 14*14. Likewise, the candidate regions can be resized by stretching, compression, cropping excess regions, padding vacant regions, and so on, so that their size matches that expected by the fully connected network.
Step 414: input the resized candidate regions into the fully connected network, which outputs the classification results of the candidate regions.
After a resized candidate region is input into the fully connected network, the network usually extracts the semantic information in the candidate region and classifies the candidate region based on it; the classification of a candidate region is mostly based on the instance it contains. Since candidate regions with the same or adjacent anchor points overlap, they are likely to contain the same instance, in which case they can be assigned to the same category.
The classification result output by the fully connected network is usually expressed as a class identifier, which can be shown near the detection box corresponding to the localization region of each instance. Therefore, to determine which class identifier corresponds to each detection box, after the localization region or segmentation region is determined, the category of the candidate region whose position is the same as or similar to that localization or segmentation region can be looked up in the classification results, and that category can be used as the class identifier of the detection box corresponding to the localization or segmentation region.
In addition, if multiple categories correspond to candidate regions at the same or similar positions as the localization or segmentation region, the category with the largest weight can be selected from them as the class identifier of the detection box corresponding to the localization or segmentation region.
During training, the loss value of the fully connected network's output also needs to be computed by a loss function in order to evaluate the network's performance. Therefore, the target training image usually carries a classification label for each instance; the classification label can be set in correspondence with the localization label of the instance, or with its segmentation label.
To evaluate the classification performance of the fully connected network, the classification loss value needs to be computed: the classification result of each candidate region and the classification label of the instance it contains are substituted into a preset classification loss function, which can be a log loss function, a quadratic loss function, an exponential loss function, or the like, to obtain the classification loss value.
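A minimal PyTorch sketch of this classification branch follows; the two fully connected layers and the 7*7 input size follow Fig. 4 as described below, while the hidden width, the class count, and the use of cross-entropy as the log loss are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassificationBranch(nn.Module):
    """Flatten a 7x7 candidate-region feature and pass it through two
    fully connected layers to produce per-class scores."""
    def __init__(self, in_channels=256, roi_size=7, num_classes=81):
        super().__init__()
        self.fc1 = nn.Linear(in_channels * roi_size * roi_size, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, rois):                  # rois: (N, C, 7, 7)
        x = torch.relu(self.fc1(rois.flatten(1)))
        return self.fc2(x)                    # class logits

branch = ClassificationBranch()
logits = branch(torch.randn(8, 256, 7, 7))
labels = torch.randint(0, 81, (8,))
cls_loss = F.cross_entropy(logits, labels)   # log loss over classes
print(logits.shape, cls_loss.item())
```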
Step 416: train the feature extraction network, region proposal network, localization-segmentation network, and classification network according to the localization, segmentation, and classification loss values, until these loss values converge, to obtain the image processing model.
Fig. 4 shows an example of the image processing model. The classification network in the model contains two fully-connected layers (the two fully-connected layers are merely illustrative and are not a limitation of this embodiment), and the locating-segmentation network contains five convolutional layers (likewise merely illustrative), namely CONV1, CONV2, CONV3, CONV4 and DCONV. The candidate-region size matching the classification network in Fig. 4 is 7*7, and the candidate-region size matching the locating-segmentation network is 14*14. After a candidate region is resized to 7*7, it is input into the classification network and, after processing by the two fully-connected layers, the classification result and classification loss value are output; after a candidate region is resized to 14*14, it is input into the locating-segmentation network and, after processing by the five convolutional layers, the localization region, segmentation region, positioning loss value and segmentation loss value are output. A sketch of the two heads follows.
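A sketch of the two Fig. 4 heads under stated assumptions: a 256-channel region feature, 81 classes, a 1024-unit hidden fully-connected layer, and 3*3 convolutions with a 2x deconvolution for DCONV; none of these widths is fixed by the embodiment:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Two fully-connected layers operating on 7*7 candidate regions."""
    def __init__(self, channels=256, num_classes=81):
        super().__init__()
        self.fc1 = nn.Linear(channels * 7 * 7, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):                      # x: (K, 256, 7, 7)
        return self.fc2(torch.relu(self.fc1(x.flatten(1))))

class LocatingSegmentationHead(nn.Module):
    """CONV1-CONV4 plus the deconvolution DCONV on 14*14 candidate regions."""
    def __init__(self, channels=256, num_classes=81):
        super().__init__()
        self.convs = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(4)])
        self.dconv = nn.ConvTranspose2d(channels, num_classes, 2, stride=2)

    def forward(self, x):                      # x: (K, 256, 14, 14)
        return self.dconv(self.convs(x))       # (K, num_classes, 28, 28) region masks
```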
As shown in Fig. 4, in the image processing model provided by this embodiment, the example-positioning and example-segmentation tasks are realized by the same convolutional network branch, while the classification task is realized by a separate fully-connected network. By contrast, Fig. 5 shows an image processing model of the prior art, which can be realized by the Mask R-CNN network model; unlike the model in Fig. 4, its candidate-region classification and example-positioning tasks are realized by the same fully-connected network, and its example-segmentation task is realized by another convolutional network.
To further verify the performance of the two models of Fig. 4 and Fig. 5, a replication experiment was carried out in this embodiment; Table 1 below gives the experimental results. Here AP denotes mask average precision, and mmAP is an evaluation method of MSCOCO (a database name), where mmAP is the AP result under different classes and different scales. In Table 1, the segmentation mmAP is the mask average precision of the example-segmentation task, and the detection mmAP is the average precision of the example-positioning and classification tasks. Since the classification method does not change between the two models, the comparison of the data in Table 1 shows that the average precision of the example-positioning and example-segmentation tasks of the image processing model in this embodiment is significantly improved relative to the prior-art Mask R-CNN network model.
Table 1

| Model | Segmentation mmAP | Detection mmAP |
| --- | --- | --- |
| Mask R-CNN network model | 34.4 | 37 |
| Image processing model in this embodiment | 35.4 | 38.7 |
In general, a fully-connected network can integrate the global semantic information in a candidate region, but it damages the spatial position information in the region. Because the model in Fig. 5 realizes example positioning and classification through the same fully-connected network, the example-positioning and classification tasks easily conflict with each other, so that the positioning effect is poor and the accuracy is low. Compared with a fully-connected network, a convolutional network is friendlier to example positioning and is better suited to the example-positioning task. On this basis, and considering that example segmentation and example positioning both depend on object-edge features, this embodiment realizes example segmentation and example positioning through the same convolutional network and uses loss functions to supervise the model's example-segmentation and example-positioning performance, enabling the example-segmentation and example-positioning tasks to promote each other and thereby improving the accuracy of example positioning and example segmentation.
In the training method of the image processing model described above, after the candidate region of the target training image is obtained through the preset feature extraction network and region candidate network, example positioning and example segmentation are performed on the candidate region through the convolutional network to obtain the localization region and segmentation region containing the example; the candidate region is then classified through the fully-connected network to obtain the classification result of the candidate region; and the feature extraction network, region candidate network, locating-segmentation network and classification network are trained according to the positioning loss value, segmentation loss value and classification loss value until each loss value converges, yielding the image processing model. In this approach, example positioning and example segmentation are realized by the same branch network, so that example positioning and example segmentation can share feature information and promote each other, which is conducive to improving the accuracy of example positioning and example segmentation, and in turn the overall accuracy of example positioning, segmentation and classification.
Embodiment five:
Corresponding to the training method of the image processing model provided in the above embodiments, this embodiment provides an image processing method; the method is applied to a device configured with an image processing model, the image processing model being one trained in the above embodiments. As shown in Fig. 6, the method comprises the following steps:
Step S602: obtain an image to be processed;
Step S604: input the image to be processed into the image processing model, and output the localization region, segmentation region and classification result of each example in the image to be processed; a minimal inference sketch follows.
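A hedged sketch of steps S602-S604, assuming PyTorch and assuming the trained model returns the three outputs directly; the exact model interface is not specified by the embodiment:

```python
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

def process_image(model, path):
    # Step S602: obtain the image to be processed.
    image = to_tensor(Image.open(path).convert("RGB")).unsqueeze(0)  # (1, 3, H, W)
    # Step S604: run the image processing model.
    model.eval()
    with torch.no_grad():
        localization_regions, segmentation_regions, classification_results = model(image)
    return localization_regions, segmentation_regions, classification_results
```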
Based on the above image processing method, this embodiment also provides a specific application scenario, namely an automatic-driving scenario. In this scenario, the above image to be processed can be collected by the camera device of a vehicle, and the above image processing model can be configured in the central control system of the vehicle. After the camera device collects the image to be processed, the central control system inputs the image into the image processing model and outputs the localization region, segmentation region and classification result of each example in the image, for example lane lines, traffic signs and traffic lights. According to the localization regions, segmentation regions and classification results of these examples, the central control system can analyze the current driving road conditions and generate corresponding driving orders, so that the vehicle carries out automatic driving according to the driving orders.
In the image processing model used by the above image processing method, example positioning and example segmentation are realized by the same branch network, so that example positioning and example segmentation can share feature information and promote each other, which is conducive to improving the accuracy of example positioning and example segmentation, and in turn the overall accuracy of example positioning, segmentation and classification.
Embodiment six:
Corresponding to the above method embodiments, Fig. 7 shows a schematic structural diagram of a training device for an image processing model; the device comprises:
a region obtaining module 70, configured to obtain the candidate region of the target training image through the preset feature extraction network and region candidate network;
a locating-segmentation module 71, configured to perform example positioning and example segmentation on the candidate region through the preset locating-segmentation network and to calculate the loss values of the example positioning and example segmentation, obtaining the localization region containing the example, the segmentation region, the positioning loss value and the segmentation loss value;
a classification module 72, configured to classify the candidate region through the preset classification network and to calculate the loss value of the classification, obtaining the classification result and classification loss value of the candidate region;
a training module 73, configured to train the feature extraction network, region candidate network, locating-segmentation network and classification network according to the positioning loss value, segmentation loss value and classification loss value, until the positioning loss value, segmentation loss value and classification loss value converge, obtaining the image processing model.
In the training device of the image processing model described above, after the candidate region of the target training image is obtained through the preset feature extraction network and region candidate network, example positioning and example segmentation are performed on the candidate region through the locating-segmentation network and the corresponding loss values are calculated, obtaining the localization region and segmentation region containing the example; the candidate region is then classified through the classification network and the corresponding loss value is calculated, obtaining the classification result of the candidate region; and the feature extraction network, region candidate network, locating-segmentation network and classification network are trained according to the positioning loss value, segmentation loss value and classification loss value until each loss value converges, yielding the image processing model. In this approach, example positioning and example segmentation are realized by the same branch network, so that example positioning and example segmentation can share feature information and promote each other, which is conducive to improving the accuracy of example positioning and example segmentation, and in turn the overall accuracy of example positioning, segmentation and classification.
Further, the above locating-segmentation network includes a convolutional network, and the classification network includes a fully-connected network.
Further, the above region obtaining module is also configured to: perform feature extraction processing on the target training image through the preset feature extraction network to obtain the initial feature map of the target training image; perform feature fusion processing on the initial feature map to obtain a fusion feature map; and extract the candidate region from the fusion feature map through the preset region candidate network. A fusion sketch under stated assumptions follows.
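A minimal sketch of the feature-fusion step; the embodiment only states that initial feature maps are fused into a fusion feature map, so the FPN-style top-down upsample-and-add used here is an assumption:

```python
import torch
import torch.nn.functional as F

def fuse_features(initial_maps):
    # initial_maps: list of (N, C, H, W) feature maps, ordered coarse to fine.
    fused = initial_maps[0]
    for finer in initial_maps[1:]:
        # Upsample the coarser result to the finer resolution and add.
        fused = finer + F.interpolate(fused, size=finer.shape[-2:], mode="nearest")
    return fused  # fusion feature map at the finest resolution

maps = [torch.randn(1, 256, 13, 13),
        torch.randn(1, 256, 25, 25),
        torch.randn(1, 256, 50, 50)]
print(fuse_features(maps).shape)  # torch.Size([1, 256, 50, 50])
```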
Further, the above locating-segmentation module is also configured to: adjust the size of the candidate region to a size matching the convolutional network; and perform example detection processing and example segmentation processing on the adjusted candidate region through the convolutional network, obtaining the localization region and segmentation region containing the complete example, where the localization region is identified by a detection box and the segmentation region is identified by color.
Further, the above target training image carries a positioning label and a segmentation label corresponding to each example; and the above locating-segmentation module is also configured to: substitute the localization region and the positioning label corresponding to the example contained in the localization region into a preset positioning loss function to obtain the positioning loss value; and substitute the segmentation region and the segmentation label corresponding to the example contained in the segmentation region into a preset segmentation loss function to obtain the segmentation loss value, as in the sketch below.
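A hedged sketch of these two loss computations; the embodiment only requires preset loss functions, so the smooth-L1 positioning loss over box coordinates and the binary cross-entropy segmentation loss over masks used here are assumptions, with placeholder shapes:

```python
import torch
import torch.nn.functional as F

pred_boxes  = torch.randn(2, 4)                  # predicted localization regions (x1, y1, x2, y2)
label_boxes = torch.randn(2, 4)                  # positioning labels of the contained examples
positioning_loss = F.smooth_l1_loss(pred_boxes, label_boxes)

pred_masks  = torch.rand(2, 28, 28)              # predicted segmentation regions (probabilities)
label_masks = torch.randint(0, 2, (2, 28, 28)).float()  # segmentation labels
segmentation_loss = F.binary_cross_entropy(pred_masks, label_masks)
print(positioning_loss.item(), segmentation_loss.item())
```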
Further, the above classification module is also configured to: adjust the size of the candidate region to a size matching the fully-connected network; and input the adjusted candidate region into the fully-connected network and output the classification result of the candidate region.
Further, the above target training image carries a classification label corresponding to each example; and the above classification module is also configured to: substitute the classification result of the candidate region and the classification label corresponding to the example contained in the candidate region into a preset classification loss function to obtain the classification loss value.
The implementation principle and technical effects of the device provided by this embodiment are the same as those of the foregoing method embodiments; for brevity, where the device embodiment does not mention something, reference can be made to the corresponding content in the foregoing method embodiments.
This embodiment also provides an image processing apparatus, which is set in a device configured with an image processing model; the image processing model is one obtained by training with the above training method of the image processing model. The apparatus comprises:
an image obtaining module, configured to obtain an image to be processed;
an image input module, configured to input the image to be processed into the image processing model and output the localization region, segmentation region and classification result of each example in the image to be processed.
Further, the above image obtaining module is also configured to collect the image to be processed through the camera device of a vehicle; and the above apparatus further comprises an order generation module, configured to generate a driving order according to the localization region, the segmentation region and the classification result, so that the vehicle carries out automatic driving according to the driving order.
In the image processing model used by the above image processing apparatus, example positioning and example segmentation are realized by the same branch network, so that example positioning and example segmentation can share feature information and promote each other, which is conducive to improving the accuracy of example positioning and example segmentation, and in turn the overall accuracy of example positioning, segmentation and classification.
Embodiment seven:
An embodiment of the present invention provides an electronic system, which includes an image capture device, a processing device and a storage device. The image capture device is used to obtain preview video frames or image data. A computer program is stored on the storage device, and when run by the processing device the computer program executes the above training method of the image processing model, or executes the above image processing method.
It is apparent to those skilled in the art that, for convenience and simplicity of description, the specific working process of the electronic system described above can refer to the corresponding process in the foregoing method embodiments and is not repeated here.
Further, this embodiment also provides a computer-readable storage medium on which a computer program is stored; when run by a processing device, the computer program executes the above training method of the image processing model, or executes the above image processing method.
The computer program product of the image processing method and of the training method, device and electronic system of its model provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the methods described in the foregoing method embodiments, and their specific implementation can be found in the method embodiments and is not repeated here.
In addition, in the description of the embodiments of the present invention, unless otherwise specifically defined or limited, the terms "installation", "connected with" and "connection" shall be understood in a broad sense: a connection may, for example, be fixed, detachable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediary, or internal between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to the concrete situation.
If the functions are realized in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions for making a computer device (which may be a personal computer, a server, a network device or the like) perform all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a mobile hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk.
In the description of the present invention, it should be noted that terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" indicate orientations or positional relationships based on those shown in the drawings, merely to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a specific orientation; they are therefore not to be understood as limiting the present invention. In addition, the terms "first", "second" and "third" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, used to illustrate the technical solution of the present invention rather than to limit it, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with this technical field can still, within the technical scope disclosed by the present invention, modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; and such modifications, variations or replacements do not remove the essence of the corresponding technical solution from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (13)
1. A training method of an image processing model, characterized by comprising:
obtaining a candidate region of a target training image through a preset feature extraction network and a preset region candidate network;
performing example positioning and example segmentation on the candidate region through a preset locating-segmentation network, and calculating loss values of the example positioning and the example segmentation, to obtain a localization region containing an example, a segmentation region, a positioning loss value and a segmentation loss value;
classifying the candidate region through a preset classification network, and calculating a loss value of the classification, to obtain a classification result and a classification loss value of the candidate region; and
training the feature extraction network, the region candidate network, the locating-segmentation network and the classification network according to the positioning loss value, the segmentation loss value and the classification loss value, until the positioning loss value, the segmentation loss value and the classification loss value converge, to obtain the image processing model.
2. The method according to claim 1, characterized in that the locating-segmentation network comprises a convolutional network, and the classification network comprises a fully-connected network.
3. The method according to claim 1, characterized in that the step of obtaining the candidate region of the target training image through the preset feature extraction network and the preset region candidate network comprises:
performing feature extraction processing on the target training image through the preset feature extraction network to obtain an initial feature map of the target training image;
performing feature fusion processing on the initial feature map to obtain a fusion feature map; and
extracting the candidate region from the fusion feature map through the preset region candidate network.
4. The method according to claim 2, characterized in that the step of performing example positioning and example segmentation on the candidate region through the preset locating-segmentation network comprises:
adjusting the size of the candidate region to a size matching the convolutional network; and
performing example detection processing and example segmentation processing on the adjusted candidate region through the convolutional network to obtain the localization region and the segmentation region containing the complete example, wherein the localization region is identified by a detection box and the segmentation region is identified by color.
5. The method according to claim 4, characterized in that the target training image carries a positioning label and a segmentation label corresponding to each example; and
the step of calculating the loss values of the example positioning and the example segmentation comprises: substituting the localization region and the positioning label corresponding to the example contained in the localization region into a preset positioning loss function to obtain the positioning loss value; and substituting the segmentation region and the segmentation label corresponding to the example contained in the segmentation region into a preset segmentation loss function to obtain the segmentation loss value.
6. The method according to claim 2, characterized in that the step of classifying the candidate region through the preset classification network comprises:
adjusting the size of the candidate region to a size matching the fully-connected network; and
inputting the adjusted candidate region into the fully-connected network and outputting the classification result of the candidate region.
7. The method according to claim 6, characterized in that the target training image carries a classification label corresponding to each example; and
the step of calculating the loss value of the classification comprises: substituting the classification result of the candidate region and the classification label corresponding to the example contained in the candidate region into a preset classification loss function to obtain the classification loss value.
8. An image processing method, characterized in that the method is applied to a device configured with an image processing model, the image processing model being an image processing model trained by the method of any one of claims 1 to 7; the method comprises:
obtaining an image to be processed; and
inputting the image to be processed into the image processing model, and outputting a localization region, a segmentation region and a classification result of each example in the image to be processed.
9. The method according to claim 8, characterized in that the step of obtaining the image to be processed comprises: collecting the image to be processed through a camera device of a vehicle; and
after the step of outputting the localization region, segmentation region and classification result of each example in the image to be processed, the method further comprises: generating a driving order according to the localization region, the segmentation region and the classification result, so that the vehicle carries out automatic driving according to the driving order.
10. A training device of an image processing model, characterized by comprising:
a region obtaining module, configured to obtain a candidate region of a target training image through a preset feature extraction network and a preset region candidate network;
a locating-segmentation module, configured to perform example positioning and example segmentation on the candidate region through a preset locating-segmentation network, and to calculate loss values of the example positioning and the example segmentation, to obtain a localization region containing an example, a segmentation region, a positioning loss value and a segmentation loss value;
a classification module, configured to classify the candidate region through a preset classification network, and to calculate a loss value of the classification, to obtain a classification result and a classification loss value of the candidate region; and
a training module, configured to train the feature extraction network, the region candidate network, the locating-segmentation network and the classification network according to the positioning loss value, the segmentation loss value and the classification loss value, until the positioning loss value, the segmentation loss value and the classification loss value converge, to obtain the image processing model.
11. An image processing apparatus, characterized in that the apparatus is set in a device configured with an image processing model, the image processing model being an image processing model trained by the method of any one of claims 1 to 7; the apparatus comprises:
an image obtaining module, configured to obtain an image to be processed; and
an image input module, configured to input the image to be processed into the image processing model and output a localization region, a segmentation region and a classification result of each example in the image to be processed.
12. An electronic system, characterized in that the electronic system comprises an image capture device, a processing device and a storage device;
the image capture device is configured to obtain preview video frames or image data; and
a computer program is stored on the storage device, and when run by the processing device the computer program executes the method of any one of claims 1 to 7, or executes the method of claim 8 or 9.
13. A computer-readable storage medium on which a computer program is stored, characterized in that when run by a processing device the computer program executes the method of any one of claims 1 to 7, or executes the steps of the method of claim 8 or 9.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN201811306459.9A | 2018-11-02 | 2018-11-02 | Image processing method, training method and device of model thereof and electronic system |
Publications (2)

| Publication Number | Publication Date |
| --- | --- |
| CN109447169A | 2019-03-08 |
| CN109447169B | 2020-10-27 |

Family ID: 65550456
Legal Events

| Code | Title |
| --- | --- |
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |