CN114202026B - Multi-task model training method and device, multi-task processing method and device
- Publication number
- CN114202026B (application CN202111508235.8A)
- Authority
- CN
- China
- Prior art keywords
- initial
- network
- image
- branch
- task
- Prior art date
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The disclosure provides a multi-task model training method and device, relating to the technical fields of computer vision, deep learning, and the like. The specific implementation scheme is as follows: acquire a training sample set comprising at least one type of initial image; acquire a pre-established multi-task network in which a general feature extractor is connected to each initial branch network through branch nodes; select initial images from the training sample set and input them into the general feature extractor to obtain the feature maps corresponding to the selected initial images; for each obtained feature map, input the feature map into the initial branch network corresponding to its identification elements; collect the gradient values of each initial branch network at the branch nodes and adjust the loss weight value of the corresponding initial branch network based on those gradient values; and, in response to the multi-task network meeting the training completion condition, obtain the multi-task model. This embodiment balances the training effects of multiple tasks.
Description
Technical Field
The present disclosure relates to the field of computer technology, in particular to the technical fields of computer vision, deep learning, and the like, and more particularly to a multi-task model training method and apparatus, a multi-task processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
With the development of AI (artificial intelligence) technology, the hardware performance of mobile terminals has improved while their price has fallen, and visual algorithms can now compensate for limited positioning accuracy on a mobile terminal, for example by realizing lane-level navigation or other functions requiring high-accuracy positioning, which is of great significance for autonomous driving and map data production. Because the computing capacity of a mobile terminal is limited, more identification elements must be recognized while the execution efficiency of the algorithm is guaranteed, making a multi-task model a necessary scheme. Training such a model requires balancing the training strength of the different tasks so that the multi-task model achieves the best effect on each task.
Disclosure of Invention
The present disclosure provides a multitasking model training method and apparatus, a multitasking method and apparatus, an electronic device, a computer readable storage medium, and a computer program product.
According to a first aspect, there is provided a multi-task model training method, the method comprising: acquiring a training sample set comprising at least one type of initial image, wherein each type of initial image is labeled with at least one type of identification elements; acquiring a pre-established multi-task network, wherein the multi-task network comprises a general feature extractor and initial branch networks corresponding one-to-one to the types of identification elements, the general feature extractor being connected to each initial branch network through branch nodes; and performing the following training steps: selecting initial images from the training sample set and inputting the selected initial images into the general feature extractor to obtain feature maps corresponding one-to-one to the selected initial images, the identification elements of the selected initial images covering all the initial branch networks; for each of the obtained feature maps, inputting the feature map into the initial branch network corresponding to its identification elements; collecting the gradient values of each initial branch network at the branch nodes and adjusting the loss weight value of the corresponding initial branch network based on those gradient values; and, in response to the multi-task network meeting the training completion condition, obtaining the multi-task model.
According to a second aspect, there is provided a multitasking method comprising: acquiring an image to be processed; inputting the image to be processed into a multi-task model generated by adopting the method described in any implementation manner of the first aspect, and obtaining a multi-task processing result of the image to be processed.
According to a third aspect, there is provided a multitasking model training apparatus, the apparatus comprising: a sample acquisition unit configured to acquire a training sample set including at least one type of initial image, wherein each type of initial image is labeled with at least one type of identification element; a network acquisition unit configured to acquire a pre-established multi-task network including a general feature extractor and initial branch networks corresponding to the various identification elements one by one, the general feature extractor being connected to the respective initial branch networks through branch nodes, respectively; the image selecting unit is configured to select initial images from the training sample set, input the selected initial images into the universal feature extractor to obtain feature images corresponding to the selected initial images one by one, and the identification elements of the selected initial images correspond to all the initial branch networks; a feature input unit configured to input, for each of the obtained feature graphs, the feature graph into an initial branch network corresponding to an identification element of the feature graph; the gradient adjusting unit is configured to acquire gradient values of all initial branch networks in the branch nodes and adjust loss weight values of the corresponding initial branch networks based on the gradient values of all the initial branch networks; and the model obtaining unit is configured to obtain the multi-task model when the multi-task network meets the training completion condition.
According to a fourth aspect there is provided a multitasking apparatus comprising: an acquisition unit configured to acquire an image to be processed; an input unit configured to input an image to be processed into a multitasking model generated by the apparatus described in any implementation manner of the third aspect, to obtain a multitasking result of the image to be processed.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first or second aspect.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method as described in any implementation of the first or second aspect.
According to a seventh aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first or second aspects.
The embodiments of the disclosure provide a multi-task model training method and device. First, a training sample set including at least one type of initial image is acquired, each type of initial image being labeled with at least one type of identification elements; second, a pre-established multi-task network is acquired, the multi-task network comprising a general feature extractor and initial branch networks corresponding one-to-one to the types of identification elements, the general feature extractor being connected to each initial branch network through branch nodes; third, initial images are selected from the training sample set and input into the general feature extractor to obtain feature maps corresponding one-to-one to the selected initial images, the identification elements of the selected initial images covering all the initial branch networks; next, each of the obtained feature maps is input into the initial branch network corresponding to its identification elements; finally, the gradient values of each initial branch network are collected at the branch nodes, the loss weight values of the corresponding initial branch networks are adjusted based on those gradient values, and the multi-task model is obtained in response to the multi-task network meeting the training completion condition. By setting up a multi-task network with a general feature extractor and several initial branch networks, and using the gradient of each initial branch network to adjust its loss weight value during multi-task training, a multi-task model based on gradient balancing is obtained, the training effects of the multiple tasks are balanced, and the multi-task model is ensured the best effect on each task.
The embodiments of the disclosure provide a multi-task processing method and device: an image to be processed is acquired and input into a multi-task model generated by the multi-task model training method of the above embodiment, obtaining the multi-task processing results of the image to be processed. A multi-task model generated with a general feature extractor and several initial branch networks can thus balance the task processing effects of the individual tasks and improve the efficiency of multi-task processing.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of one embodiment of a method of multitasking model training according to the present disclosure;
FIG. 2 is a schematic diagram of an architecture for training a multi-task network in an embodiment of the present disclosure;
FIG. 3 is a flow chart of one embodiment of a method of multitasking in accordance with the present disclosure;
FIG. 4 is a schematic diagram of the architecture of one embodiment of a multitasking model training apparatus according to the present disclosure;
FIG. 5 is a schematic diagram of an embodiment of a multitasking device according to the present disclosure;
fig. 6 is a block diagram of an electronic device for implementing a multitasking model training method or multitasking method of an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Because a traditional multi-task model covering multiple recognition elements requires a fully labeled dataset, or else several separate single-task models must be adopted, the disclosure proposes a multi-task model training method based on gradient balancing. FIG. 1 shows a flow 100 according to one embodiment of the multi-task model training method of the disclosure, where the method includes the following steps:
step 101, a training sample set including at least one type of initial image is obtained.
In this embodiment, the execution body on which the multi-task model training method runs may acquire the training sample set in various ways. For example, the execution body may acquire a stored training sample set from a database server through a wired or wireless connection. For another example, the execution body may obtain a training sample set collected by a terminal by communicating with that terminal.
Here, the training sample set may include at least one type of initial image. The initial images are divided into different types according to the different tasks that the samples implement when the multi-task model is trained, and each type of initial image carries sample labels for the identification elements of at least one task. For example, one type of initial image in the training sample set is used for a target detection task, another type is used for semantic segmentation and key point detection, and the initial images in the current training sample set may also be used for other visual tasks, which are not described in detail here.
In this embodiment, each type of initial image is labeled with at least one type of identification elements. An identification element is a detection target that the multi-task model needs to process; the detection target can be a person, an object, a scene, or the like in the image, and a multi-task model generally has more than two detection targets. Labeling the identification elements in the initial images allows the multi-task network to determine the identification elements accurately and provides ground-truth information about the detection targets for multi-task model training. When an initial image is used to implement one task, one type of identification elements is labeled on it; when an initial image is used to implement more than two tasks, more than two types of identification elements are labeled on it. As shown in FIG. 2, identification element 1 (not shown in the figure) and identification element 2 (not shown in the figure) are labeled on the class-A images, which can thus implement two tasks; identification element 3 (not shown in the figure) is labeled on the class-B images, which can implement one task.
In this embodiment, obtaining a training sample set including at least one type of initial image includes: adopting an independent data reading module for the images of each identification element, realizing independent data preprocessing according to the task requirements, and processing information in different formats. For example, tasks such as semantic segmentation, target detection, and key point detection require different preprocessing of the images of each type of identification elements, so the images are completely decoupled according to their identification elements, yielding the training sample set.
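For illustration only, the following is a minimal sketch of this decoupled data pipeline, assuming a PyTorch implementation (the disclosure does not specify a framework); the dataset class, transforms, and batch sizes are hypothetical:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TaskDataset(Dataset):
    """One independent data reading module per type of identification element."""
    def __init__(self, samples, transform):
        self.samples = samples        # list of (image, label) pairs
        self.transform = transform    # task-specific preprocessing

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, label = self.samples[idx]
        return self.transform(image), label

# Dummy samples standing in for labeled detection / segmentation images.
det_samples = [(torch.rand(3, 64, 64), torch.tensor(0)) for _ in range(16)]
seg_samples = [(torch.rand(3, 64, 64),
                torch.zeros(64, 64, dtype=torch.long)) for _ in range(16)]

# Each task keeps its own preprocessing, fully decoupled from the others.
det_loader = DataLoader(TaskDataset(det_samples, lambda img: img), batch_size=8)
seg_loader = DataLoader(TaskDataset(seg_samples, lambda img: img * 2 - 1), batch_size=8)
```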
In the technical scheme of the disclosure, the related processes of collecting, storing, using, processing, transmitting, providing, disclosing and the like of the initial image and the identification element are performed after being authorized, and accord with related laws and regulations.
Step 102, obtaining a pre-established multi-task network.
The multi-task network comprises a general feature extractor and initial branch networks corresponding to various identification elements one by one, and the general feature extractor is respectively connected with each initial branch network through branch nodes.
In this embodiment, the general feature extractor maps the input image into a high-dimensional feature space to obtain a feature map of the input image, where the feature map contains the features of all identification elements of the input image; the general feature extractor is a network shared by all initial branch networks. The initial branch networks correspond respectively to the tasks and identification elements of the multi-task model: the number of initial branch networks equals the number of tasks of the multi-task model and likewise equals the number of types of identification elements. Based on the feature map of the input image, each initial branch network performs task processing on its corresponding identification elements in that feature map to obtain the task processing results for those elements.
In this embodiment, the general feature extractor extracts all features of the input initial image to obtain a feature map of the input image, splits the feature map into feature maps corresponding to different recognition elements based on the difference of the recognition elements, and inputs the feature map of each recognition element into an initial branch network corresponding to the recognition element to obtain a task processing result of each initial branch network for performing task processing on the respective feature map.
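For illustration only, a minimal sketch of such a multi-task network, assuming PyTorch; the layer sizes, branch names, and class counts are hypothetical:

```python
import torch
import torch.nn as nn

class MultiTaskNetwork(nn.Module):
    """General feature extractor connected to per-element branch networks."""
    def __init__(self, num_det_classes=2, num_seg_classes=3):
        super().__init__()
        # General feature extractor shared by all initial branch networks.
        self.extractor = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # One initial branch network per type of identification element.
        self.branches = nn.ModuleDict({
            "detection": nn.Conv2d(64, num_det_classes, 1),
            "segmentation": nn.Conv2d(64, num_seg_classes, 1),
        })

    def forward(self, images, task):
        features = self.extractor(images)     # output at the branch node
        return self.branches[task](features)  # task-specific processing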
Based on the different scenes to which the multi-task model is adapted, the network structures of the initial branch networks in the multi-task network can also differ. For example, in an automatic driving scene the multi-task model needs lane line segmentation, pedestrian detection, and the like, so the multi-task network needs a network structure comprising a semantic segmentation network and a target detection network.
In some optional implementations of this embodiment, the initial branching network may include: any two of semantic segmentation network, target detection network and key point detection network.
In this embodiment, the semantic segmentation network takes the feature map of some original data (for example, a planar image) as input and converts it into a mask in which the regions of interest are highlighted, each pixel in the image being assigned to the class of the object of interest it belongs to. Whereas in a traditional semantic segmentation network feature extraction is only an intermediate link operating on the raw data, the semantic segmentation network of this embodiment directly takes the feature map of the initial image as input.

In this embodiment, the target detection network takes the feature map of some original data as input, finds all objects of interest in the feature map, and determines their positions and sizes. Whereas in a traditional target detection network feature extraction is only an intermediate link, the target detection network of this embodiment directly takes the feature map of the initial image as input.

In this embodiment, the key point detection network takes the feature map of some original data as input, finds all key points of interest in the feature map, and determines the positional relationships between them. Whereas in a traditional key point detection network feature extraction is only an intermediate link, the key point detection network of this embodiment directly takes the feature map of the initial image as input.
In the alternative implementation mode, based on the task of the multi-task model, a plurality of initial branch networks are set, a plurality of alternative modes are provided for the representation of the initial branch networks, and the diversity of the initial branch network setting is improved.
Step 103, initial images are selected from the training sample set, and the selected initial images are input into the general feature extractor to obtain feature maps corresponding one-to-one to the selected initial images.
Wherein the identification elements of the selected initial image correspond to all the initial branch networks.
In this embodiment, the executing body may select initial images from the training sample set acquired in step 101 and perform the training steps from step 103 to step 106. The selection manner and the number of selected initial images are not limited in this application. For example, in one iteration of training, one type of initial image labeled with at least two types of identification elements is randomly selected; or two types of initial images, each labeled with one type of identification elements, are randomly selected, the loss value of the multi-task network is calculated from the labeling information of the identification elements of the selected initial images, and the parameters of the multi-task network are adjusted.
In this embodiment, the general feature extractor is mainly used to map the selected initial image to the high-dimensional feature space, so as to obtain the high-dimensional feature. The generic feature extractor may be an encoder, e.g. the feature extractor is made up of two layers of DNN, each layer of DNN being 512 dimensions.
In this embodiment, the feature maps correspond to the types of initial images: if one type of initial image is input, the general feature extractor outputs the feature map of that type; if multiple types of initial images are input, the general feature extractor outputs the feature maps of those types. As shown in FIG. 2, inputting class-A images yields the class-A feature map, and inputting class-B images yields the class-B feature map.
The manner of obtaining the feature maps corresponding one-to-one to the selected initial images differs with the number of types selected. In some optional implementations of this embodiment, inputting the selected initial images into the general feature extractor to obtain feature maps corresponding one-to-one to the selected initial images includes: in response to the selected initial images being of multiple types, superposing the multiple types of initial images and inputting them into the general feature extractor to obtain the feature map output by the general feature extractor; and splitting the feature map output by the general feature extractor according to the types of the multiple types of initial images to obtain the feature maps corresponding to each type of initial image.
In this alternative implementation, when the selected initial images span multiple types, the types are superposed and input into the general feature extractor simultaneously, and the extractor's output is split according to the types of the input images. The feature maps corresponding to each type of initial image can thus be obtained effectively, guaranteeing the effectiveness of the subsequent initial branch network training; a sketch follows.
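For illustration only, a sketch of this superpose-then-split step, reusing the hypothetical MultiTaskNetwork above; the per-type batch sizes are illustrative:

```python
import torch

net = MultiTaskNetwork()               # from the earlier sketch
images_a = torch.rand(4, 3, 64, 64)    # class-A images (elements 1 and 2)
images_b = torch.rand(2, 3, 64, 64)    # class-B images (element 3)

batch = torch.cat([images_a, images_b], dim=0)         # superpose into one batch
features = net.extractor(batch)                        # one shared forward pass
feat_a, feat_b = torch.split(features, [4, 2], dim=0)  # split back by image type

# Each split feature map goes to the branch networks matching the
# identification elements labeled on its image type.
out_1 = net.branches["detection"](feat_a)
out_2 = net.branches["segmentation"](feat_a)
# feat_b would likewise be routed to the branch for element 3.
```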
Optionally, inputting the selected initial images into the general feature extractor to obtain feature maps corresponding one-to-one to the selected initial images includes: in response to the selected initial images being of one type, inputting the initial images into the general feature extractor to obtain the feature map output by the general feature extractor, this feature map serving as the input feature map of all initial branch networks.
In this optional implementation manner, the feature map corresponding to the initial image may be used as input of each initial branch network, so that each initial branch network may process each identification element in the feature map corresponding to the initial image.
Step 104, for each of the obtained feature maps, the feature map is input into the initial branch network corresponding to its identification elements.
In this embodiment, during each iteration of training of the multi-task network, a feature map is obtained from the general feature extractor, the feature map is split according to the type of the initial image, and each feature map is input into the initial branch network corresponding to the identification elements of the initial image. As in FIG. 2, if identification element 1 and identification element 2 are labeled in the class-A images, the class-A feature map corresponding to the class-A images is obtained from the general feature extractor and input into initial branch network 1 and initial branch network 2 respectively. If identification element 3 is labeled in the class-B images, the class-B feature map corresponding to the class-B images is obtained from the general feature extractor and input into initial branch network 3.
As shown in FIG. 2, the initial images corresponding to the different identification elements are combined and input into the general feature extractor of the multi-task network, the general feature extractor extracts features from the input initial images, and the extracted feature maps are split according to the identification elements and passed into the independent initial branch networks, each of which performs independent identification element processing. This processing may cover multiple kinds of target detection tasks, or multiple kinds of target recognition tasks such as detection, semantic segmentation, and key point detection. Because the different initial branch networks have independent loss functions, when data is labeled for an identification element only that element needs to be labeled, without considering the predictions of the other initial branch networks.
Step 105, the gradient values of each initial branch network are collected at the branch nodes, and the loss weight values of the corresponding initial branch networks are adjusted based on those gradient values.
In this embodiment, based on the structure of the multi-task network, independent loss functions may be set for the different initial branch networks, and the loss function of each initial branch network is calculated in each iteration of training of the multi-task network. As shown in FIG. 2, for identification element 1 processed by initial branch network 1, the loss function of initial branch network 1 can be calculated independently to obtain its loss value, and the parameters of the multi-task network are adjusted based on that loss value.
By definition, the gradient is a vector indicating the direction along which the directional derivative of a loss function at a point is maximal; that is, the loss function changes fastest, with the greatest rate of change, along that direction at that point. In deep learning, the main task of the neural network during learning is to find the optimal network parameters (weights and biases), i.e. the parameters at which the loss function is minimal. In general, however, the loss function is complex and has too many parameters to locate its minimum analytically. Finding the minimum (or a value as small as possible) by following the gradient is the gradient descent method. To make the loss function of an initial branch network decrease fastest, a gradient descent algorithm can likewise be used to update the parameters of the multi-task network along the negative direction of the gradient.
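For illustration only, a minimal sketch of one gradient-descent update, assuming PyTorch; the toy loss and learning rate are hypothetical:

```python
import torch

theta = torch.tensor([1.0, -2.0], requires_grad=True)  # toy parameters
loss = (theta ** 2).sum()                               # toy loss function
loss.backward()                                         # compute the gradient

lr = 0.1
with torch.no_grad():
    theta -= lr * theta.grad   # move along the negative gradient direction
```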
Since the multi-task network shares most of its structure (the general feature extractor in FIG. 2), the different initial branch networks all influence the feature activations of the general feature extractor, so conflicts may arise between them. To address this problem, the loss weights of the different initial branch networks may be adjusted according to the gradient magnitude of each initial branch network at the general feature extractor. The training of the multi-task network comprises multiple iterations; in each iteration, the gradient values at the branch nodes connecting the general feature extractor with the initial branch networks can be collected through a monitoring tool, and from the collected gradient values the gradient corresponding to the loss function of each initial branch network can be distinguished.
In this embodiment, in each iteration of training the loss value of the multi-task network's loss function is calculated once: the loss value of the multi-task network is the sum over all initial branch networks of the loss value of each branch multiplied by its respective loss weight value. Adjusting the loss weight value of an initial branch network therefore adjusts its proportion in the multi-task network; the larger its loss weight value, the larger its share of the multi-task training.
In some optional implementations of this embodiment, adjusting the loss weight value of the corresponding initial branch network based on the gradient values of the initial branch networks includes: in response to the gradient value of a current initial branch network being larger than the gradient values of the other initial branch networks in the current iteration training period, setting the loss weight value of the current initial branch network smaller than the loss weight values of the other initial branch networks in the next iteration training period.
In this alternative implementation, the iterative training period refers to a time period during which the loss function calculation and parameter adjustment are completed by all the initial branch networks in the multi-task network in the current iterative training.
In this alternative implementation, the gradient of each initial branch network in the current iteration training period is monitored and the loss weight value of the current initial branch network in the next iteration training period is adjusted accordingly, providing an adjustment means for training the multi-task network and ensuring the reliability of multi-task network training; a sketch follows.
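For illustration only, a hedged sketch of this gradient-balancing step, reusing the hypothetical MultiTaskNetwork above. The gradient norms are measured with respect to the shared extractor (the branch node), and the exact down-weighting rule below is an illustrative choice, not the patent's formula:

```python
import torch
import torch.nn.functional as F

net = MultiTaskNetwork()
shared_params = list(net.extractor.parameters())
loss_weights = {"detection": 1.0, "segmentation": 1.0}

images = torch.rand(4, 3, 64, 64)
features = net.extractor(images)                    # one pass through the extractor
targets = torch.zeros(4, 64, 64, dtype=torch.long)  # dummy ground truth
losses = {name: F.cross_entropy(head(features), targets)
          for name, head in net.branches.items()}

def branch_grad_norm(branch_loss):
    # Gradient of one branch's loss with respect to the shared extractor,
    # i.e. the gradient value collected at the branch node.
    grads = torch.autograd.grad(branch_loss, shared_params, retain_graph=True)
    return torch.sqrt(sum(g.pow(2).sum() for g in grads))

norms = {name: branch_grad_norm(loss) for name, loss in losses.items()}
mean_norm = sum(norms.values()) / len(norms)
for name, norm in norms.items():
    # A branch with an above-average gradient gets a smaller loss weight
    # in the next iteration training period.
    loss_weights[name] *= float(mean_norm / (norm + 1e-8))

# Multi-task loss: sum of each branch loss times its loss weight value.
total_loss = sum(loss_weights[name] * losses[name] for name in losses)
total_loss.backward()
```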
Optionally, adjusting the loss weight value of the corresponding initial branch network based on the gradient values of the initial branch networks further includes: in response to the gradient value of a current initial branch network being larger than the gradient values of the other initial branch networks over several consecutive iteration training periods, setting the loss weight value of the current initial branch network smaller than the loss weight values of the other initial branch networks after those iteration training periods.
In the alternative implementation mode, after the gradient value of the current initial branch network is monitored for a plurality of iteration cycles, the loss weight value of the current initial branch network is adjusted, so that a reliable basis is provided for stable training of the multi-task network.
Step 106, in response to the multi-task network meeting the training completion condition, the multi-task model is obtained.
In this embodiment, whether the multi-task network meets the training completion condition can be detected through the loss value of the multi-task network, and after the multi-task network meets the training completion condition, a multi-task model with the training completed is obtained.
In this embodiment, the training completion condition includes at least one of: the number of training iterations of the multi-task network reaches a predetermined iteration threshold, and the loss value of the multi-task network falls below a predetermined loss value threshold. The predetermined iteration threshold is an empirical value derived from the loss value of the multi-task network; for example, the predetermined iteration threshold of the multi-task network is 10,000 iterations, and the predetermined loss value threshold of the multi-task network is 0.02. A sketch of the check follows.
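For illustration only, a minimal sketch of this completion check; the thresholds follow the examples above:

```python
MAX_ITERATIONS = 10_000   # predetermined iteration threshold (example value)
LOSS_THRESHOLD = 0.02     # predetermined loss value threshold (example value)

def training_complete(iteration: int, loss_value: float) -> bool:
    """Training completes when either threshold condition is met."""
    return iteration >= MAX_ITERATIONS or loss_value < LOSS_THRESHOLD
```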
Optionally, in this embodiment, in response to the multi-task network not meeting the training completion condition, the relevant parameters of the multi-task network are adjusted so that the loss value of the multi-task network converges, and the training steps 103 to 106 continue to be performed on the adjusted multi-task network.
In this optional implementation, when the multi-task network does not meet the training completion condition, the relevant parameters of the multi-task network are adjusted, which helps the loss value of the multi-task network converge.
In this embodiment, if training is not completed, the loss value of the multi-task network may be made to converge by adjusting the parameters of the multi-task network. Specifically, adjusting the relevant parameters of the multi-task network so that its loss value converges comprises: repeatedly adjusting the parameters of any initial branch network, or the loss weight value of any initial branch network, in the multi-task network by executing steps 103 to 106 until the loss value of the multi-task network converges.
Optionally, in each iteration process, parameters of more than two initial branch networks can be adjusted simultaneously, so as to ensure that the loss value of the multi-task network becomes smaller gradually until the loss value is stable.
The multi-task model training method proceeds as follows. First, a training sample set including at least one type of initial image is obtained, each type of initial image being labeled with at least one type of identification elements; second, a pre-established multi-task network is acquired, comprising a general feature extractor and initial branch networks corresponding one-to-one to the types of identification elements, the general feature extractor being connected to each initial branch network through branch nodes; third, initial images are selected from the training sample set and input into the general feature extractor to obtain feature maps corresponding one-to-one to the selected initial images, the identification elements of the selected initial images covering all the initial branch networks; next, each of the obtained feature maps is input into the initial branch network corresponding to its identification elements; finally, the gradient values of each initial branch network are collected at the branch nodes, the loss weight values of the corresponding initial branch networks are adjusted based on those gradient values, and the multi-task model is obtained in response to the multi-task network meeting the training completion condition. By setting up a multi-task network with a general feature extractor and several initial branch networks, and using the gradient of each initial branch network to adjust its loss weight value during multi-task training, a multi-task model based on gradient balancing is obtained, the training effects of the multiple tasks are balanced, and the multi-task model is ensured the best effect on each task.
In another embodiment of the present disclosure, the above-mentioned multi-task model training method further includes: acquiring a newly added image, the newly added image being labeled with at least one type of newly added elements; adding, in the multi-task model, a newly added branch network corresponding to all the newly added elements, so that the general feature extractor is also connected to the newly added branch network through the branch nodes; and performing the following new training steps: selecting a newly added image and an initial image, and inputting them into the general feature extractor simultaneously to obtain a new feature map; inputting the feature maps split from the new feature map that correspond to the selected initial image into the respective initial branch networks in turn; inputting the feature map split from the new feature map that corresponds to the selected newly added image into the newly added branch network; and collecting the gradient values of each initial branch network and the newly added branch network at the branch nodes, and adjusting the loss weight values of the corresponding initial branch networks and/or newly added branch network based on those gradient values.
In this embodiment, the newly added image is an image different from the initial images, and it is labeled with the labeling information of the newly added elements, providing ground-truth information for the newly added elements and facilitating training of the newly added branch network. The newly added branch network is a network different from each of the initial branch networks and can implement tasks different from theirs.
In this embodiment, after the new image is input to the general feature extractor, a feature map corresponding to the new image is generated correspondingly, and the feature map corresponding to the new image is input to the new branch network, so that training of the new branch network is facilitated.
In this embodiment, when the multi-task model with the newly added branch network meets the training completion condition, a new multi-task model is obtained, and the new multi-task model has the newly added branch network compared with the multi-task model, so that new task processing can be realized.
According to the multi-task model training method provided by the embodiment of the disclosure, new elements can be arbitrarily added on top of the generated multi-task model: only the newly added dataset corresponding to the newly added elements needs to be labeled and a newly added branch network added, realizing arbitrary expansion over different elements, as sketched below.
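For illustration only, a sketch of such an expansion under the hypothetical MultiTaskNetwork above; the branch name and class count are assumptions:

```python
import torch.nn as nn

net = MultiTaskNetwork()   # stands in for the trained model from earlier sketches

# Add one newly added branch network for the newly added element; the shared
# general feature extractor and the existing branches are reused unchanged.
net.branches["new_element"] = nn.Conv2d(64, 4, 1)  # 4 hypothetical classes

# The new training step then feeds newly added images and initial images
# through the extractor together, splits the new feature map, routes each
# part to its branch, and adjusts loss weights from the collected gradients.
```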
In another embodiment of the present disclosure, the above-mentioned multi-task model training method may further include: one or more initial branch networks in the multitasking model are removed.
According to the multi-task model training method provided by the embodiment of the disclosure, one or more initial branch networks can be removed from the trained multi-task model, and after the initial branch networks are removed, the performance of a new multi-task model is not affected, so that the expandability of the multi-task model is ensured.
Optionally, after the new multi-task model adds the new added branch network, in another optional implementation manner of this embodiment, the multi-task model training method may further include: one or more newly added branch networks in the new multitasking model are removed.
Further, based on the multi-task model training method provided by the above embodiment, the present disclosure further provides an embodiment of a multi-task processing method, and the multi-task processing method of the present disclosure combines the artificial intelligence fields of computer vision, deep learning, and the like.
Referring to fig. 3, a flow 300 is shown according to one embodiment of the disclosed multitasking method, which includes the steps of:
Step 301, an image to be processed is acquired.
In this embodiment, the image to be processed may contain information such as people, objects, and scenes, and processing it with the multi-task model yields the results of the different tasks. The execution subject of the multi-task processing method may acquire the image to be processed in various ways; for example, it may obtain a stored image from a database server through a wired or wireless connection, or it may receive in real time an image collected by a terminal or another device.
In this embodiment, the acquired image to be processed may or may not contain identification elements. When it does, the identification elements may belong to one type or several types; each type of identification elements corresponds to one task, and the elements can be effectively recognized by the general feature extractor in the multi-task model together with the initial branch networks corresponding to those elements, yielding the task processing result for each element.
When the image to be processed does not have the identification element, the multitasking model can directly give the task processing result of the identification element which is not detected.
In this embodiment, the identification element corresponds to a task, for example, in a target detection task, the identification element is a target corresponding to the target detection task, and the target may be a person, an object, or the like in an image to be processed; in the semantic segmentation task, the identification elements are pixel categories of different objects in the image to be processed to be marked in the semantic segmentation task.
Step 302, inputting the image to be processed into the multi-task model to obtain a multi-task processing result of the image to be processed.
In this embodiment, the execution subject may input the image to be processed acquired in step 301 into the multitasking model, so as to obtain a multitasking result of the acquired image to be processed. It should be noted that the multitasking result is a result obtained by performing various task processing on the image to be processed, and based on the structure of the multitasking model, the obtained multitasking result can improve the efficiency of all task processing.
In this embodiment, the multitasking model may be trained by using the method described in the embodiment of fig. 1, and the specific training process may be described in the embodiment of fig. 1, which is not described herein.
In this embodiment, the multi-task processing results of the image to be processed are determined by the initial branch networks and/or newly added branch networks of the multi-task model, and the number of initial and newly added branch networks in the model equals the number of multi-task processing results. For example, if the multi-task model has only initial branch networks and there are two of them, the image to be processed yields two task processing results; if the model comprises two initial branch networks and three newly added branch networks, it yields five.
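For illustration only, a usage sketch of this inference step under the hypothetical MultiTaskNetwork above:

```python
import torch

net = MultiTaskNetwork()          # stands in for the trained multi-task model
image = torch.rand(1, 3, 64, 64)  # the image to be processed

net.eval()
with torch.no_grad():
    # One processing result per branch network, e.g. a detection map
    # and a segmentation map.
    results = {task: net(image, task) for task in net.branches}
```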
In some alternative implementations of the present embodiment, the initial branching network of the multitasking model includes: any two or more of the semantic segmentation network, the target detection network and the key point detection network, the multi-task processing result comprises: at least two of semantic segmentation results, target detection results and key point detection results of targets in the image to be processed.
In this alternative implementation, the obtained multi-task processing results differ according to the tasks of the multi-task model, and their presentation forms may also differ after the model processes the image to be processed.

In this alternative implementation, defining the form of the multi-task processing results according to the tasks of the multi-task model provides several alternative ways of representing those results and increases the diversity with which the multi-task model processes the image to be processed.
The multi-task processing method provided by the embodiment of the disclosure acquires an image to be processed and inputs it into a multi-task model generated by the multi-task model training method of the above embodiment, obtaining the multi-task processing results of the image to be processed. A multi-task model generated with a general feature extractor and several initial branch networks can thus balance the task processing effects of the individual tasks and improve the efficiency of multi-task processing.
With further reference to fig. 4, as an implementation of the method illustrated in the above figures, the present disclosure provides an embodiment of a multitasking model training apparatus, which corresponds to the method embodiment illustrated in fig. 1, and which is particularly applicable in various electronic devices.
As shown in fig. 4, the multi-task model training device 400 provided in this embodiment includes: a sample acquisition unit 401, a network acquisition unit 402, an image selection unit 403, a feature input unit 404, a gradient adjustment unit 405, and a model acquisition unit 406. The sample acquiring unit 401 may be configured to acquire a training sample set including at least one type of initial image, where each type of initial image is labeled with at least one type of identification element. The network acquisition unit 402 may be configured to acquire a pre-established multi-tasking network, where the multi-tasking network includes a generic feature extractor and initial branch networks corresponding to each type of identification element one to one, and the generic feature extractor is connected to each of the initial branch networks through a branch node. The image selection unit 403 may be configured to select an initial image from the training sample set, input the selected initial image into the universal feature extractor, and obtain a feature map corresponding to the selected initial image one by one, where the identification elements of the selected initial image correspond to all the initial branch networks. The feature input unit 404 may be configured to input, for each of the obtained feature maps, the feature map into an initial branch network corresponding to the identification element of the feature map. The gradient adjustment unit 405 may be configured to collect gradient values of each initial branch network in the branch nodes, and adjust the loss weight value of the corresponding initial branch network based on the gradient values of each initial branch network. The model obtaining unit 406 may be configured to obtain the multitasking model when the multitasking network satisfies the training completion condition.
In this embodiment, in the multitasking model training apparatus 400: the specific processing and the technical effects of the sample acquiring unit 401, the network acquiring unit 402, the image selecting unit 403, the feature input unit 404, the gradient adjusting unit 405, and the model obtaining unit 406 may refer to the relevant descriptions of the steps 101, 102, 103, 104, 105, and 106 in the corresponding embodiment of fig. 1, and are not described herein again.
In some optional implementations of the present embodiment, the image selecting unit 403 includes: a superimposing module (not shown in the figure) and an obtaining module (not shown in the figure). The superimposing module may be configured to, in response to the selected initial images being of multiple types, superimpose the multiple types of initial images and input them into the general feature extractor to obtain the feature map output by the general feature extractor. The obtaining module may be configured to split the feature map output by the general feature extractor according to the types of the multiple types of initial images, obtaining the feature maps corresponding to each type of initial image.
In some optional implementations of this embodiment, the apparatus 400 further includes: a newly added acquisition unit (not shown in the figure), a network adding unit (not shown in the figure), a newly added selecting unit (not shown in the figure), an initial input unit (not shown in the figure), a newly added input unit (not shown in the figure), and a newly added adjustment unit (not shown in the figure). The newly added acquisition unit may be configured to acquire a newly added image labeled with at least one type of newly added elements. The network adding unit may be configured to add, in the multi-task model, a newly added branch network corresponding to all the newly added elements, so that the general feature extractor is also connected to the newly added branch network through the branch nodes. The newly added selecting unit may be configured to select a newly added image and an initial image and input them into the general feature extractor simultaneously to obtain a new feature map. The initial input unit may be configured to input the feature maps split from the new feature map that correspond to the selected initial image into the respective initial branch networks in turn. The newly added input unit may be configured to input the feature map split from the new feature map that corresponds to the selected newly added image into the newly added branch network. The newly added adjustment unit may be configured to collect the gradient values of each initial branch network and the newly added branch network at the branch nodes, and to adjust the loss weight values of the corresponding initial branch networks and/or newly added branch network based on those gradient values.
In some optional implementations of this embodiment, the initial branching network includes: any two of semantic segmentation network, target detection network and key point detection network.
In some optional implementations of this embodiment, the gradient adjustment unit 405 is further configured to: in response to the gradient value of a current initial branch network being larger than the gradient values of the other initial branch networks in the current iteration training period, set the loss weight value of the current initial branch network smaller than the loss weight values of the other initial branch networks in the next iteration training period.
In some optional implementations of this embodiment, the apparatus 400 further includes: a removal unit (not shown in the figure). The removal unit may be configured to remove one or more initial branch networks from the multi-task model.
The embodiment of the present disclosure provides a multi-task model training device. First, the sample acquisition unit 401 acquires a training sample set including at least one type of initial image, each type of initial image being labeled with at least one type of identification elements; second, the network acquisition unit 402 acquires a pre-established multi-task network comprising a general feature extractor and initial branch networks corresponding one-to-one to the types of identification elements, the general feature extractor being connected to each initial branch network through branch nodes; third, the image selecting unit 403 selects initial images from the training sample set and inputs them into the general feature extractor to obtain feature maps corresponding one-to-one to the selected initial images, the identification elements of the selected initial images covering all the initial branch networks; next, the feature input unit 404 inputs each of the obtained feature maps into the initial branch network corresponding to its identification elements; then, the gradient adjustment unit 405 collects the gradient values of each initial branch network at the branch nodes and adjusts the loss weight values of the corresponding initial branch networks based on those gradient values; finally, the model obtaining unit 406 obtains the multi-task model in response to the multi-task network meeting the training completion condition. By setting up a multi-task network with a general feature extractor and several initial branch networks, and using the gradient of each initial branch network to adjust its loss weight value during multi-task training, a multi-task model based on gradient balancing is obtained, the training effects of the multiple tasks are balanced, and the multi-task model is ensured the best effect on each task.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of a multi-task processing device, which corresponds to the method embodiment shown in fig. 3 and is particularly applicable to various electronic apparatuses.
As shown in fig. 5, the multi-task processing device 500 provided in this embodiment includes: an acquisition unit 501 and an input unit 502. The acquisition unit 501 may be configured to acquire an image to be processed. The input unit 502 may be configured to input the image to be processed into a multi-task model generated by the apparatus as described in the embodiment of fig. 3, to obtain a multi-task processing result of the image to be processed.
In this embodiment, in the multi-task processing device 500, the specific processing of the acquisition unit 501 and the input unit 502, and the technical effects thereof, may refer to the related descriptions of step 301 and step 302 in the embodiment corresponding to fig. 3, and will not be repeated here.
In some optional implementations of this embodiment, the multi-task processing result includes: at least two of the semantic segmentation result, the target detection result, and the key point detection result of the target in the image to be processed.
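Inference with the resulting multi-task model is then a single forward pass that yields all task results at once; this small sketch reuses the hypothetical MultiTaskModel from the training sketches above, and the output keys named in the comment are assumptions.

```python
import torch

@torch.no_grad()
def process_image(model, image: torch.Tensor) -> dict:
    model.eval()
    outputs = model(image.unsqueeze(0))  # add batch dimension; one pass, all tasks
    # e.g. outputs["segmentation"], outputs["detection"], outputs["keypoint"]
    return outputs
```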
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of users' personal information comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. The RAM 603 may also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the methods and processes described above, such as the multi-task model training method or the multi-task processing method. For example, in some embodiments, the multi-task model training method or the multi-task processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the multi-task model training method or the multi-task processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the multi-task model training method or the multi-task processing method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: being implemented in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable multi-task model training device or multi-task processing device, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted in the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (17)
1. A multi-task model training method, the method comprising:
acquiring a training sample set comprising at least one category of initial images, wherein each category of initial images is labeled with at least one type of identification element;
acquiring a pre-established multi-task network, wherein the multi-task network comprises a general feature extractor and initial branch networks corresponding one-to-one to the types of identification elements, and the general feature extractor is connected to each initial branch network through a branch node; and
performing the following training steps:
selecting initial images from the training sample set and inputting the selected initial images into the general feature extractor to obtain feature maps corresponding one-to-one to the selected initial images, wherein the identification elements of the selected initial images cover all of the initial branch networks;
for each feature map among the obtained feature maps, inputting the feature map into the initial branch network corresponding to the identification element of the feature map; and
collecting the gradient value of each initial branch network at the branch node, adjusting the loss weight value of the corresponding initial branch network based on the gradient value of each initial branch network, and obtaining a multi-task model in response to the multi-task network satisfying a training completion condition, wherein the adjusting the loss weight value of the corresponding initial branch network based on the gradient value of each initial branch network comprises: in response to the gradient value of a current initial branch network among all the initial branch networks being greater than the gradient values of the other initial branch networks in a current iteration training period, setting the loss weight value of the current initial branch network to be smaller than the loss weight values of the other initial branch networks in the next iteration training period.
2. The method of claim 1, wherein the inputting the selected initial images into the general feature extractor to obtain feature maps corresponding one-to-one to the selected initial images comprises:
in response to the selected initial images belonging to multiple categories, superposing the multiple categories of initial images and inputting them into the general feature extractor to obtain a feature map output by the general feature extractor; and
splitting the feature map output by the general feature extractor according to the categories of the multiple categories of initial images to obtain the feature maps corresponding to the multiple categories of initial images.
3. The method of claim 1, further comprising:
acquiring a newly added image, wherein the newly added image is labeled with at least one type of newly added element;
adding, in the multi-task model, a newly added branch network corresponding to all of the newly added elements, so that the general feature extractor is also connected to the newly added branch network through the branch node; and
performing the following newly-added training steps:
selecting a newly added image and an initial image, and inputting the selected newly added image and the selected initial image into the general feature extractor at the same time to obtain a new feature map;
sequentially inputting the feature maps corresponding to the selected initial image, split from the new feature map, into the respective initial branch networks;
inputting the feature map corresponding to the selected newly added image, split from the new feature map, into the newly added branch network; and
collecting the gradient value of each initial branch network and of the newly added branch network at the branch node, and adjusting the loss weight values of the corresponding initial branch networks and/or the newly added branch network based on the gradient value of each initial branch network and the gradient value of the newly added branch network.
4. The method of claim 1, wherein the initial branch networks comprise: any two of a semantic segmentation network, a target detection network, and a key point detection network.
5. The method of claim 1, further comprising:
removing one or more initial branch networks from the multi-task model.
6. A multi-task processing method, the method comprising:
acquiring an image to be processed;
inputting the image to be processed into a multi-task model generated by the method according to any one of claims 1-5, and outputting a multi-task processing result of the image to be processed.
7. The method of claim 6, wherein the multi-task processing result comprises:
at least two of a semantic segmentation result, a target detection result, and a key point detection result of a target in the image to be processed.
8. A multi-task model training apparatus, the apparatus comprising:
a sample acquisition unit configured to acquire a training sample set comprising at least one category of initial images, wherein each category of initial images is labeled with at least one type of identification element;
a network acquisition unit configured to acquire a pre-established multi-task network, wherein the multi-task network comprises a general feature extractor and initial branch networks corresponding one-to-one to the types of identification elements, and the general feature extractor is connected to each initial branch network through a branch node;
an image selecting unit configured to select initial images from the training sample set and input the selected initial images into the general feature extractor to obtain feature maps corresponding one-to-one to the selected initial images, wherein the identification elements of the selected initial images cover all of the initial branch networks;
a feature input unit configured to, for each feature map among the obtained feature maps, input the feature map into the initial branch network corresponding to the identification element of the feature map;
a gradient adjustment unit configured to collect the gradient value of each initial branch network at the branch node and adjust the loss weight value of the corresponding initial branch network based on the gradient value of each initial branch network, wherein the adjusting the loss weight value of the corresponding initial branch network based on the gradient value of each initial branch network comprises: in response to the gradient value of a current initial branch network among all the initial branch networks being greater than the gradient values of the other initial branch networks in a current iteration training period, setting the loss weight value of the current initial branch network to be smaller than the loss weight values of the other initial branch networks in the next iteration training period; and
a model obtaining unit configured to obtain a multi-task model in response to the multi-task network satisfying a training completion condition.
9. The apparatus of claim 8, wherein the image selecting unit comprises:
a superposing module configured to, in response to the selected initial images belonging to multiple categories, superpose the multiple categories of initial images and input them into the general feature extractor to obtain a feature map output by the general feature extractor; and
an obtaining module configured to split the feature map output by the general feature extractor according to the categories of the multiple categories of initial images to obtain the feature maps corresponding to the multiple categories of initial images.
10. The apparatus of claim 8, further comprising:
a newly-added acquisition unit configured to acquire a newly added image, wherein the newly added image is labeled with at least one type of newly added element;
a network adding unit configured to add, in the multi-task model, a newly added branch network corresponding to all of the newly added elements, so that the general feature extractor is also connected to the newly added branch network through the branch node;
a newly-added selecting unit configured to select a newly added image and an initial image, and input the selected newly added image and the selected initial image into the general feature extractor at the same time to obtain a new feature map;
an initial input unit configured to sequentially input the feature maps corresponding to the selected initial image, split from the new feature map, into the respective initial branch networks;
a newly-added input unit configured to input the feature map corresponding to the selected newly added image, split from the new feature map, into the newly added branch network; and
a newly-added adjustment unit configured to collect the gradient value of each initial branch network and of the newly added branch network at the branch node, and adjust the loss weight values of the corresponding initial branch networks and/or the newly added branch network based on the gradient value of each initial branch network and the gradient value of the newly added branch network.
11. The apparatus of claim 8, wherein the initial branch networks comprise: any two of a semantic segmentation network, a target detection network, and a key point detection network.
12. The apparatus of claim 8, further comprising:
a removal unit configured to remove one or more initial branch networks from the multi-task model.
13. A multi-task processing apparatus, the apparatus comprising:
an acquisition unit configured to acquire an image to be processed;
an input unit configured to input the image to be processed into a multi-task model generated using the apparatus according to any one of claims 8 to 12, and output a multi-task processing result of the image to be processed.
14. The apparatus of claim 13, wherein the multi-task processing result comprises:
at least two of a semantic segmentation result, a target detection result, and a key point detection result of a target in the image to be processed.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111508235.8A CN114202026B (en) | 2021-12-10 | 2021-12-10 | Multi-task model training method and device, multi-task processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114202026A (en) | 2022-03-18
CN114202026B (en) | 2024-10-01
Family
ID=80652203
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111508235.8A Active CN114202026B (en) | 2021-12-10 | 2021-12-10 | Multi-task model training method and device, multi-task processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114202026B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114723966B (en) * | 2022-03-30 | 2023-04-07 | 北京百度网讯科技有限公司 | Multi-task recognition method, training method, device, electronic equipment and storage medium |
CN114913371B (en) * | 2022-05-10 | 2024-10-22 | 平安科技(深圳)有限公司 | Multi-task learning model training method and device, electronic equipment and storage medium |
CN114897162A (en) * | 2022-05-18 | 2022-08-12 | Oppo广东移动通信有限公司 | Training method, selection method and device of object selection model and electronic equipment |
CN115114439B (en) * | 2022-08-30 | 2022-11-18 | 北京百度网讯科技有限公司 | Method and device for multi-task model reasoning and multi-task information processing |
CN117556322A (en) * | 2023-11-15 | 2024-02-13 | 北京航迹科技有限公司 | Multitasking data processing method and multitasking model training method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111931929A (en) * | 2020-07-29 | 2020-11-13 | 深圳地平线机器人科技有限公司 | Training method and device of multi-task model and storage medium |
CN112016576A (en) * | 2019-05-30 | 2020-12-01 | 浙江商汤科技开发有限公司 | Method for training neural network, image processing method, apparatus, device, and medium |
CN113344048A (en) * | 2021-05-25 | 2021-09-03 | 上海商汤智能科技有限公司 | Multi-task behavior recognition model training method, device, equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10467500B1 (en) * | 2018-12-31 | 2019-11-05 | Didi Research America, Llc | Method and system for semantic segmentation involving multi-task convolutional neural network |
CN113392866A (en) * | 2020-11-19 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Image processing method and device based on artificial intelligence and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||