CN111626098B - Method, device, equipment and medium for updating parameter values of model - Google Patents
- Publication number: CN111626098B
- Application number: CN202010275896.XA
- Authority: CN (China)
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/12—Fingerprints or palmprints
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/30—Scenes; Scene-specific elements in albums, collections or shared content, e.g. social network photos or video
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
Embodiments of the invention provide a method, an apparatus, a device and a medium for updating parameter values of a model. The method includes: obtaining a sample image carrying a label, and inputting the sample image into a preset model to be trained, wherein the preset model includes a plurality of submodels and each submodel is used for recognizing the sample image; obtaining the recognition result output by each of the plurality of submodels after recognizing the sample image; weighting the recognition results output by the plurality of submodels according to the weight corresponding to each submodel, to obtain a processed recognition result; determining the loss difference between the processed recognition result and the recognition result output by each of the plurality of submodels; determining an overall loss value of the preset model according to each loss difference, the processed recognition result, the label and the recognition result output by each of the plurality of submodels; and updating the parameter values of each of the plurality of submodels according to the overall loss value.
Description
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to a method, an apparatus, a device, and a medium for updating a parameter value of a model.
Background
Neural networks can learn in a variety of ways. In competitive learning, all units in a group of network units compete with one another for the right to respond to an external stimulus pattern. The connection weights of the winning unit are then adjusted in a direction that makes it more competitive for that stimulus pattern.
For image recognition problems, competitive learning is commonly used to build the model. In this case, competitive learning includes inter-class competition during model parameter learning, competition between the output results of the submodels when multiple models are learned together, and so on.
In the related art, a model to be trained in a competitive learning process includes n submodels that learn competitively with one another. To improve the real-time performance of inference (the process of feeding images not seen during training into the trained model for testing), usually only the best-performing of the n submodels is kept when the model is finally deployed, and the other submodels are discarded. Although this improves the real-time performance of inference, the actual performance of the selected submodel is often unsatisfactory, and the accuracy and efficiency of image recognition with it fall short of expectations.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present invention provide a method, an apparatus, a device, and a storage medium for updating parameter values of a model, so as to overcome or at least partially solve the above problems.
In order to solve the above problem, a first aspect of the present invention discloses a method for updating parameter values of a model, including:
obtaining a sample image carrying a label, and inputting the sample image into a preset model to be trained, wherein the preset model comprises a plurality of submodels, and each submodel is used for identifying the sample image;
obtaining a recognition result output after the sample image is recognized by each of the plurality of sub-models;
weighting the recognition results output by the sub models according to the weights corresponding to the sub models respectively to obtain processed recognition results;
determining loss differences between the processed recognition results and recognition results output by the sub models respectively;
determining an overall loss value of the preset model according to each loss difference, the processed identification result, the label and the identification result output by each of the plurality of sub-models;
and updating the parameter values of the sub-models respectively according to the overall loss value.
Optionally, after updating the parameter values of the plurality of submodels respectively according to the overall loss value, the method further includes:
determining the parameter average value of each of the plurality of sub-models in a plurality of rounds of training before the round of training;
and updating the updated parameter values of the sub-models again according to a preset coefficient, the updated parameter values of the sub-models and the average parameter values of the sub-models in multiple rounds of training before the round of training to obtain new parameter values of the sub-models after the round of training is finished.
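The second update described above can be sketched as follows. The text does not give the exact formula, so the convex combination used here (preset coefficient times the updated value plus the remainder times the historical average) is an assumption, and the names `mean_of_history` and `blend_parameters` are hypothetical.

```python
def mean_of_history(history):
    """Element-wise mean of the parameter vectors from earlier training rounds."""
    rounds = len(history)
    return [sum(values) / rounds for values in zip(*history)]


def blend_parameters(updated, history, coefficient):
    """Blend freshly updated parameters with their historical average.

    ASSUMPTION: new = coefficient * updated + (1 - coefficient) * average.
    The source only says the three quantities are combined.
    """
    average = mean_of_history(history)
    return [
        coefficient * u + (1.0 - coefficient) * a
        for u, a in zip(updated, average)
    ]


# Hypothetical parameters of one submodel over two earlier rounds.
history = [[1.0, 2.0], [3.0, 4.0]]   # average is [2.0, 3.0]
updated = [4.0, 1.0]                 # values after the gradient update
new_parameters = blend_parameters(updated, history, coefficient=0.5)
```

With a coefficient of 0.5 the new parameter values sit halfway between the updated values and the historical average; other blending rules would fit the description equally well.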
Optionally, determining a loss difference between each of the processed recognition results and the recognition result output by each of the plurality of submodels includes:
determining cosine distances between the processed recognition results and recognition results output by the sub models respectively, and taking the cosine distances as the loss difference;
or determining relative entropies between the processed recognition results and recognition results output by the sub models respectively, and taking the relative entropies as the loss difference.
Optionally, determining an overall loss value of the preset model according to each loss difference, the processed recognition result, the label, and the recognition result output by each of the plurality of submodels, includes:
determining a first loss value corresponding to each of the plurality of submodels according to the label and the identification result output by each of the plurality of submodels;
determining a second loss value corresponding to the processed identification result according to the label and the processed identification result;
and determining the second loss value, each loss difference and the sum of the first loss values corresponding to the sub models as the integral loss value of the preset model.
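As a minimal sketch of the sum described above, the overall loss can be written as one function of the three groups of terms. The function name and the numeric values are hypothetical and only illustrate the arithmetic.

```python
def overall_loss(first_losses, second_loss, loss_differences):
    """Overall loss = second loss + all loss differences + all first losses.

    first_losses:      one loss per submodel (submodel output vs. label)
    second_loss:       loss of the fused (processed) result vs. label
    loss_differences:  one difference per submodel (submodel vs. fused result)
    """
    return second_loss + sum(loss_differences) + sum(first_losses)


# Hypothetical values for a preset model with two submodels.
total = overall_loss(
    first_losses=[0.30, 0.50],
    second_loss=0.35,
    loss_differences=[0.03, 0.02],
)
```

Because every term enters the same scalar, a gradient step on `total` updates each submodel with respect to both its own error and its disagreement with the fused result.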
Optionally, each sample image carries a plurality of attribute labels, and each submodel is used for recognizing a plurality of attributes of the sample image; determining the overall loss value of the preset model according to each loss difference, the processed recognition result, the label and the recognition result output by each of the plurality of submodels includes:
for each attribute, determining an overall loss value corresponding to the attribute according to each loss difference corresponding to the attribute, the processed identification result corresponding to the attribute, the attribute label of the attribute and the identification result corresponding to the attribute output by each of the plurality of submodels;
and determining the sum of the overall loss values corresponding to the attributes as the overall loss value of the preset model.
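The multi-attribute case can be sketched by computing an overall loss per attribute and summing across attributes. The attribute names, dictionary layout and numbers below are hypothetical, and the per-attribute loss is assumed to be the plain sum of its terms.

```python
def attribute_loss(terms):
    """Overall loss of a single attribute (assumed to be a plain sum of its terms)."""
    return (
        terms["second_loss"]
        + sum(terms["loss_differences"])
        + sum(terms["first_losses"])
    )


def model_overall_loss(per_attribute_terms):
    """Sum the per-attribute overall losses into the loss of the preset model."""
    return sum(attribute_loss(terms) for terms in per_attribute_terms.values())


# Hypothetical loss terms for two attributes and two submodels.
per_attribute_terms = {
    "attribute_a": {
        "first_losses": [0.20, 0.30],
        "second_loss": 0.25,
        "loss_differences": [0.01, 0.02],
    },
    "attribute_b": {
        "first_losses": [0.40, 0.10],
        "second_loss": 0.20,
        "loss_differences": [0.03, 0.04],
    },
}
total = model_overall_loss(per_attribute_terms)
```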
Optionally, the preset model further includes a weight processing branch; the method further comprises the following steps:
obtaining a weight distribution proportion of the weight processing branch output, wherein the weight distribution proportion represents the ratio of weights corresponding to the identification results output by the sub models;
weighting the recognition results output by the plurality of submodels according to the weights corresponding to the plurality of submodels, to obtain the processed recognition result, includes:
performing, according to the weight distribution proportion, a weighted summation of the recognition results output by the plurality of submodels to obtain the processed recognition result;
updating the parameter values of the plurality of submodels according to the overall loss value includes:
updating the parameter values of the weight processing branch and the parameter values of each of the plurality of submodels according to the overall loss value.
Optionally, the weight processing branch includes: a plurality of primary fully connected layers respectively connected to the convolutional layers of the plurality of submodels, and a secondary fully connected layer connected to the plurality of primary fully connected layers; wherein the weight distribution proportion is obtained according to the following steps:
obtaining the feature map output by the convolutional layer of each of the plurality of submodels, wherein the feature map is obtained by the convolutional layer of that submodel performing feature extraction on the sample image;
inputting the feature map output by the convolutional layer of each submodel into the primary fully connected layer connected to that convolutional layer, to obtain the result output by the primary fully connected layer;
and inputting the results output by the plurality of primary fully connected layers into the secondary fully connected layer, to obtain the weight distribution proportion output by the secondary fully connected layer.
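A minimal sketch of such a weight processing branch is given below, using plain lists in place of tensors. The layer sizes, the flattened feature maps, and in particular the softmax used to normalize the output of the secondary layer are assumptions; the text only states that the secondary fully connected layer outputs the weight distribution proportion.

```python
import math


def fully_connected(inputs, weights, biases):
    """Dense layer: one output value per row of `weights`."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(values):
    """Normalize scores into positive proportions that sum to 1 (assumed step)."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weight_distribution(feature_maps, primary_layers, secondary_layer):
    """Feed each submodel's flattened feature map through its own primary layer,
    concatenate the results, and pass them through the shared secondary layer."""
    primary_outputs = []
    for features, (weights, biases) in zip(feature_maps, primary_layers):
        primary_outputs.extend(fully_connected(features, weights, biases))
    sec_weights, sec_biases = secondary_layer
    return softmax(fully_connected(primary_outputs, sec_weights, sec_biases))


# Hypothetical flattened feature maps of two submodels.
feature_maps = [[0.2, 0.5, 0.1], [0.4, 0.3, 0.9]]
primary_layers = [
    ([[0.1, 0.2, 0.3]], [0.0]),   # primary layer attached to submodel 1
    ([[0.3, 0.1, 0.2]], [0.0]),   # primary layer attached to submodel 2
]
secondary_layer = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
ratio = weight_distribution(feature_maps, primary_layers, secondary_layer)
```

Since the branch parameters are ordinary layer weights, they can be updated by the same overall loss value as the submodels.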
Optionally, after updating the parameter values of the sub models respectively according to the overall loss value, the method further includes:
taking the test images in the test set as input, testing the preset model at the end of training to obtain test results corresponding to a plurality of sub models in the preset model at the end of training;
and screening the submodels with the test results meeting the preset test conditions from the preset model at the end of the training to obtain an image recognition model for image recognition.
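The screening step can be sketched as a filter over per-submodel test results. The preset test condition is not specified in the text; a minimum test accuracy is assumed here, and the submodel names and accuracy values are hypothetical.

```python
def screen_submodels(test_accuracy, min_accuracy):
    """Keep the submodels whose test accuracy meets the threshold, best first.

    ASSUMPTION: the preset test condition is "accuracy >= min_accuracy".
    """
    kept = [name for name, acc in test_accuracy.items() if acc >= min_accuracy]
    return sorted(kept, key=lambda name: test_accuracy[name], reverse=True)


# Hypothetical test results of three submodels at the end of training.
test_accuracy = {
    "submodel_101": 0.91,
    "submodel_102": 0.95,
    "submodel_103": 0.88,
}
selected = screen_submodels(test_accuracy, min_accuracy=0.90)
```

Taking only the first entry of `selected` corresponds to keeping the single best-performing submodel as the image recognition model.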
In a second aspect of the embodiments of the present invention, there is provided a device for updating parameter values of a model, including:
an input module, configured to obtain a sample image carrying a label and input the sample image into a preset model to be trained, wherein the preset model includes a plurality of submodels, and each submodel is used for recognizing the sample image;
an output result obtaining module, configured to obtain an identification result that is output after the sample image is identified by each of the plurality of submodels;
the weight processing module is used for weighting the recognition results output by the sub models according to the weights corresponding to the sub models respectively to obtain the processed recognition results;
a loss difference determining module, configured to determine loss differences between the processed recognition results and recognition results output by the multiple submodels, respectively;
the overall loss determining module is used for determining an overall loss value of the preset model according to each loss difference, the processed recognition result, the label and the recognition result output by each of the plurality of submodels;
and the parameter updating module is used for respectively updating the parameter values of the sub models according to the overall loss value.
In a third aspect of the embodiments of the present invention, an electronic device is further disclosed, including:
one or more processors; and
one or more machine-readable media having instructions stored thereon which, when executed by the one or more processors, cause the device to perform the method for updating parameter values of a model as described in the embodiments of the first aspect of the invention.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is further disclosed, which stores a computer program for causing a processor to execute the method for updating parameter values of a model according to the embodiments of the first aspect of the present invention.
The embodiment of the invention has the following advantages:
in the embodiment of the invention, a sample image carrying a label is input into a preset model, the identification results output by a plurality of submodels are weighted according to the weights corresponding to the submodels in the preset model to obtain the processed identification results, then the loss difference between the processed identification results and the identification results output by the submodels is determined, the overall loss value is determined according to the loss differences, the processed identification results, the label and the identification results output by the submodels, and the parameter values of the submodels are updated according to the overall loss value.
According to the embodiments of the invention, the recognition results output by the submodels are weighted according to the weight corresponding to each submodel to obtain the processed recognition result, so that the recognition results output by the submodels are fused and a stronger association is established among the competitively learning submodels. The loss difference between the processed recognition result and the recognition result output by each submodel is then determined, and the overall loss value is determined according to the loss differences, the processed recognition result, the recognition result output by each submodel and so on, so that the overall loss value represents both the loss of each submodel and the loss of the fused recognition result. As a result, when the parameters of each submodel are updated according to the overall loss value, the submodels with weaker learning ability can assist the parameter update of the submodels with stronger learning ability among the more strongly associated submodels, so that the finally retained submodel performs better and its accuracy and efficiency in recognizing images are improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic structural diagram of a preset model according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of a method for updating parameter values of a model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another preset model according to an embodiment of the present invention;
FIG. 4 is a block diagram of an apparatus for updating parameter values of a model according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below to clearly and completely describe the technical solutions in the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, a schematic structural diagram of a preset model according to an embodiment of the present invention is shown, where the preset model may include a plurality of submodels. Only two submodels, 101 and 102, are shown in FIG. 1. In practice, the model structures of the submodels may be the same or different, and the submodels may have the same or different initial parameters. In one possible embodiment, the shallow network structures of the submodels and the corresponding initial parameters may be the same or different (for example, in FIG. 1, Conv2-Conv3 are shallow network structures, whereas Conv4-Conv5 are deep network structures). Here Conv denotes a convolutional layer.
The input of the multiple submodels is the same, and the same image recognition task can be executed on the same input, that is, the multiple submodels can all perform face recognition on the same input a. In practice, the multiple sub-models in the preset model can be applied to various image recognition tasks, such as face recognition, pedestrian attribute in a video structuring task, vehicle attribute recognition, clothing fine-grained attribute recognition, fingerprint recognition and the like.
A method for updating parameter values of a model according to an embodiment of the present invention is described with reference to the preset model shown in fig. 1.
Referring to fig. 2, a flowchart illustrating steps of a method for updating parameter values of a model in an embodiment is shown, and as shown in fig. 2, the method may specifically include the following steps:
Step S201: obtaining a sample image carrying a label, and inputting the sample image into a preset model to be trained.
The preset model comprises a plurality of submodels, wherein each submodel is used for identifying the sample image.
In the embodiment of the present invention, the number of the sub-models may be set according to actual requirements, and may be, for example, 3 or 5. The label carried by each sample image can be set according to the recognition task of the preset model.
For example, if the recognition task of the preset model is face recognition, the carried tag may be an ID of a face in the sample image, the ID is used for uniquely characterizing the real identity of the face, and the ID may be a number. For another example, if the recognition task of the preset model is pedestrian attribute recognition, the carried tag can represent which human pedestrian is in the sample image. For another example, if the identification task of the preset model is fingerprint identification, the carried tag may be a fingerprint ID of a fingerprint in the sample image, and the fingerprint ID may be used to uniquely characterize a real finger corresponding to the fingerprint.
As shown in fig. 1, a sample image with a tag may be used as an input, a convolutional layer Conv1 of a preset model performs convolution processing on the sample image to obtain a feature map after convolution processing, the feature map after convolution processing is respectively used as an input of a sub-model 101 and an input of a sub-model 102, and the sub-model 101 and the sub-model 102 respectively perform image recognition on the feature map after convolution processing, for example, both perform face recognition or both perform pedestrian attribute recognition.
Step S202: obtaining the recognition result output by each of the plurality of submodels after recognizing the sample image.
In the embodiment of the invention, the plurality of sub-models can respectively perform image recognition on the feature map after convolution processing to obtain the recognition result output by each sub-model.
As shown in fig. 1, each of the submodel 101 and the submodel 102 may include a plurality of convolution kernels at different levels, and the convolution kernels may perform convolution processing at different levels on the feature map after the convolution processing, and then the feature map obtained by the convolution processing at different levels of the submodel 101 is used as an input of FC1, and FC1 outputs an identification result P1, the feature map obtained by the convolution processing at different levels of the submodel 102 is used as an input of FC2, and FC2 outputs an identification result P2.
In this embodiment, the recognition result may be represented differently for different recognition tasks. For example, if the recognition task is fingerprint recognition, the recognition result may be a matching probability, that is, the probability that the fingerprint in the sample image and the fingerprint in a pre-stored image in the base library are the same fingerprint. For another example, if the recognition task is an attribute recognition task such as pedestrian attribute recognition, the recognition result may be a 1 × 2 vector whose two values respectively represent a pedestrian and a non-pedestrian. A 1 × 3 vector is also possible, whose three values respectively represent a pedestrian, a non-pedestrian and unknown.
Step S203: weighting the recognition results output by the plurality of submodels according to the weight corresponding to each submodel, to obtain the processed recognition result.
In one embodiment, the weight corresponding to each of the plurality of submodels may be preset. The weight of a submodel reflects the proportion that the recognition result output by that submodel takes up among the recognition results output by all submodels. Each weight may be a positive number smaller than 1, and the sum of the weights corresponding to the submodels may be smaller than or equal to 1.
The weighting may be performed as a weighted summation of the recognition results output by the plurality of submodels according to their corresponding weights, with the result of the weighted summation taken as the processed recognition result. This weighted summation can be understood as a fusion of the recognition results output by the plurality of submodels, so the processed recognition result obtained after the fusion can be regarded as the result of the preset model as a whole recognizing the sample image.
For example, as shown in fig. 1, taking the weight corresponding to the sub-model 101 as 0.4 and the weight corresponding to the sub-model 102 as 0.6 as an example, the recognition result P1 output by the sub-model 101 and the recognition result P2 output by the sub-model 102 are weighted and summed to obtain P3, where P3=0.4 × P1+0.6 × P2. The P3 can be regarded as a recognition result of the preset model for image recognition.
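The weighted summation in this example can be sketched as follows. The weights 0.4 and 0.6 come from the example above; the values of P1 and P2 and the function name `fuse` are hypothetical.

```python
def fuse(results, weights):
    """Element-wise weighted sum of the recognition results of all submodels."""
    if len(results) != len(weights):
        raise ValueError("one weight per submodel is required")
    length = len(results[0])
    return [
        sum(w * r[i] for w, r in zip(weights, results))
        for i in range(length)
    ]


# Hypothetical 1 x 2 recognition results of submodels 101 and 102.
p1 = [0.9, 0.1]
p2 = [0.6, 0.4]
p3 = fuse([p1, p2], [0.4, 0.6])  # P3 = 0.4 * P1 + 0.6 * P2
```

For these inputs the fused result is approximately [0.72, 0.28], which lies between the two submodel outputs and closer to the more heavily weighted P2.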
Step S204: determining the loss differences between the processed recognition result and the recognition results output by the plurality of submodels.
In this embodiment, since the processed recognition result is a result obtained by performing weighted summation on the recognition results output by each of the multiple submodels, the difference between the recognition result output by each submodel and the processed recognition result may be further determined, and the difference is taken as the loss difference.
For example, as shown in fig. 1, the difference between P1 and P3 and the difference between P2 and P3 may be determined, so that the loss difference L1 corresponding to the sub model 101 and the loss difference L2 corresponding to the sub model 102 may be obtained.
In one embodiment, the loss difference may be determined by step S2041 or step S2042 as follows:
Step S2041: determining the cosine distances between the processed recognition result and the recognition results output by the plurality of submodels, and taking the cosine distances as the loss differences.
The cosine distance, also referred to here as cosine similarity, measures the difference between two individuals by using the cosine of the angle between two vectors in a vector space. In some attribute recognition tasks, the recognition result may be a 1 × n vector, in which case the processed recognition result is also a 1 × n vector; the cosine distance in the vector space between the recognition result output by each submodel and the processed recognition result can then be calculated and used as the loss difference. The cosine distance may lie in the range [0, 1].
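A minimal sketch of a cosine-based loss difference is shown below. It assumes the common convention that the cosine distance equals 1 minus the cosine of the angle between the two vectors, which stays within [0, 1] when all vector entries are non-negative; the text does not spell out this formula, and the vectors are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cos(angle between a and b); an assumed definition of cosine distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical submodel output P1 and fused result P3 (1 x 2 vectors).
p1 = [0.9, 0.1]
p3 = [0.72, 0.28]
loss_difference_1 = cosine_distance(p1, p3)
```

The closer a submodel's output is in direction to the fused result, the smaller its loss difference.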
Step S2042: determining the relative entropies between the processed recognition result and the recognition results output by the plurality of submodels, and taking the relative entropies as the loss differences.
The relative entropy, also known as Kullback-Leibler divergence or information divergence, is an asymmetric measure of the difference between two probability distributions, which is equivalent to the difference in information entropy (Shannon entropy) of the two probability distributions.
In this embodiment, in some recognition tasks of face recognition or fingerprint recognition, the recognition result may be a matching probability, and the processed recognition result may also be a matching probability, and then a difference value of information entropy between the recognition result output by each sub-model and the processed recognition result may be calculated, and the difference value may be used as a loss difference.
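The relative entropy between two discrete distributions can be sketched as follows. Treating a matching probability p as the two-point distribution [p, 1 - p], and measuring the divergence from the fused result to the submodel output, are both assumptions; the text does not fix the direction, and the values are hypothetical.

```python
import math


def relative_entropy(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) of two discrete distributions."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0  # terms with pi == 0 contribute nothing
    )


def as_distribution(match_probability):
    """Turn a single matching probability into a two-point distribution."""
    return [match_probability, 1.0 - match_probability]


# Hypothetical matching probabilities: fused result 0.72, submodel output 0.9.
fused = as_distribution(0.72)
submodel = as_distribution(0.9)
loss_difference = relative_entropy(fused, submodel)
reverse = relative_entropy(submodel, fused)
```

The two directions give different values, which illustrates the asymmetry mentioned above; whichever direction is chosen should be used consistently for all submodels.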
Step S205: determining the overall loss value of the preset model according to each loss difference, the processed recognition result, the label and the recognition result output by each of the plurality of submodels.
Step S206: updating the parameter values of each of the plurality of submodels according to the overall loss value.
In this embodiment, since the overall loss value may be obtained from the processed recognition result, each loss difference, the label, and the recognition result output by each of the multiple submodels, the overall loss value may represent the loss of the multiple submodels for recognizing the sample image as a whole, that is, may reflect the capability of the multiple submodels for recognizing the sample as a whole, and further may establish a stronger association between the multiple submodels in the competitive learning, thereby fully integrating the performance of each submodel. When the parameter values of each submodel are updated, the submodel with the weaker learning ability can assist the learning of the submodel with the stronger learning ability, so that the updating direction of the parameter values of each submodel approaches to the global optimum, and the image identification accuracy of each submodel (particularly the submodel with the better learning ability) can be improved.
The overall loss value may include a loss value corresponding to the processed recognition result, a loss value corresponding to the recognition result of each of the plurality of submodels, and a loss difference corresponding to each submodel.
In one embodiment, the overall loss value of the preset model may be determined by the following steps:
Step S2051: determine a first loss value corresponding to each of the plurality of submodels according to the label and the recognition result output by each of the plurality of submodels.
The sample image input to the preset model carries the label, and the label corresponds to different identification tasks, so that the real situation of the sample image under the identification tasks can be reflected. For example, if the recognition task is pedestrian attribute recognition, then the tag may characterize whether the person in the sample image is a real pedestrian.
In practice, a loss function in the related art may be adopted to determine the first loss value corresponding to each of the plurality of submodels according to the label and the recognition result output by each of the plurality of submodels. The first loss value may characterize the difference between the recognition result output by the submodel and the real situation characterized by the label.
Step S2052: determine a second loss value corresponding to the processed recognition result according to the label and the processed recognition result.
Similarly, after the recognition results output by the plurality of submodels are weighted, the resulting processed recognition result represents the recognition result of the preset model as a whole on the sample image. A loss function in the related art may then be adopted to determine the second loss value according to the label and the processed recognition result; the second loss value may characterize the difference between the combined recognition result of the plurality of submodels and the real situation characterized by the label.
Step S2053: determine the sum of the second loss value, each loss difference, and the first loss values corresponding to the plurality of submodels as the overall loss value of the preset model.
In the embodiment of the invention, the overall loss value can be the sum of the second loss value, the loss difference corresponding to each submodel and the first loss value corresponding to each of the submodels, so that stronger association among the submodels is established through the determination of the overall loss value.
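Steps S2051 to S2053 can be sketched as follows. This is only an illustration under stated assumptions: cross-entropy stands in for the unspecified "loss function in the related art", relative entropy is used as the loss difference, and all function names are hypothetical.

```python
import math

def cross_entropy(pred, label_index, eps=1e-12):
    """Negative log-probability assigned to the true class (assumed loss)."""
    return -math.log(max(pred[label_index], eps))

def relative_entropy(p, q, eps=1e-12):
    """D(p || q), used here as the loss difference (assumed choice)."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def overall_loss(outputs, weights, label_index):
    """outputs: one probability vector per sub-model; weights: one per sub-model."""
    n_classes = len(outputs[0])
    # Processed recognition result: weighted sum of the sub-model outputs.
    processed = [sum(w * out[c] for w, out in zip(weights, outputs))
                 for c in range(n_classes)]
    first_losses = [cross_entropy(out, label_index) for out in outputs]  # S2051
    second_loss = cross_entropy(processed, label_index)                  # S2052
    loss_diffs = [relative_entropy(processed, out) for out in outputs]
    return second_loss + sum(loss_diffs) + sum(first_losses)             # S2053

# Hypothetical example: two sub-models, true class index 0.
loss = overall_loss([[0.8, 0.2], [0.6, 0.4]], [0.5, 0.5], 0)
```

When every sub-model produces the same output, the loss differences vanish and the overall loss reduces to the second loss plus the first losses.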
The above embodiment explains the updating of the parameter values of the model by taking one input sample image as an example. In practice, there may be a plurality of sample images for training, and one or more sample images may be input to the preset model in each training round. A loss value may therefore be determined for each sample image input in a training round according to the method described above, and at the end of each training round, the parameter values of each of the plurality of submodels may be updated according to the overall loss value of that round.
In practice, in one specific implementation, after updating the parameter values of multiple submodels in each round, parameter value correction may be performed on each submodel to achieve faster and more accurate convergence. After updating the parameter values of each sub-model in each round of training, the method may further include the following steps:
Step S207: determine the parameter average value of each of the plurality of sub-models over the rounds of training before the current round.
In this embodiment, when the parameter values of the sub-models are updated in each round, the parameter value of each sub-model may be updated once according to the overall loss value determined in that round; after the update, the parameter value of each sub-model at the end of the round (hereinafter referred to as the parameter value to be corrected) is obtained.
Example #1: as shown in fig. 1, if the current update is the nth round, then after the update the parameter value of the submodel 101 is m1 and the parameter value of the submodel 102 is m2.
In this embodiment, the updated parameter value of each sub-model at the end of each round of training can be recorded, so that the parameter value of each sub-model at the end of each round of training before the current round of training can be obtained, and the average value of the parameter value of each sub-model in multiple rounds of training before the current round of training can be determined.
Example #2: as shown in fig. 1, the parameter average value mean1 of the submodel 101 over the n-1 rounds of updates before the nth round, and the parameter average value mean2 of the submodel 102 over the same n-1 rounds, can be determined.
Step S208: update the updated parameter values of the sub-models again according to a preset coefficient, the updated parameter values of the sub-models, and the parameter average values of the sub-models over the rounds of training before the current round, to obtain new parameter values of the sub-models at the end of the current round.
In this embodiment, the parameter value to be corrected of the current round of each sub-model may be corrected according to the preset coefficient and the parameter average value of each sub-model, so as to obtain a corrected parameter value, and the corrected parameter value is used as a new parameter value of the sub-model after the training of the current round is finished (hereinafter, the new parameter value is referred to as the corrected parameter value).
In practice, after obtaining a new parameter value for each submodel, the new parameter value may be updated in a further round of training thereafter.
Specifically, the new parameter value of each sub-model after the training round is finished can be determined by the following formula:
$y^{(m,n)} = \alpha \cdot x^{(m,n)} + (1 - \alpha) \cdot x_{mean}$
where $y^{(m,n)}$ is the corrected parameter value of the mth sub-model after the nth training round is finished, $\alpha$ is the preset coefficient, $x^{(m,n)}$ is the parameter value to be corrected of the mth sub-model at the end of the nth training round, and $x_{mean}$ is the average of the parameter values of the mth sub-model obtained in the n-1 rounds of updates before the nth training round.
As shown in fig. 1, in example #1 and example #2 above, if the preset coefficient is set to 0.99, then after correction the corrected parameter value of the sub-model 101 is m_101 = 0.99 × m1 + (1 - 0.99) × mean1, and the corrected parameter value of the sub-model 102 is m_102 = 0.99 × m2 + (1 - 0.99) × mean2.
With this embodiment, the updated parameter values of each submodel are corrected once in each round according to the record of historical parameter updates made during training, so the update direction of the parameter values can be more accurate and the performance of the submodels better.
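The correction of steps S207 and S208 can be sketched as below, following the worked example with a preset coefficient of 0.99. The function name `corrected_parameters`, the flat-list representation of parameters, and the behaviour when no earlier round exists (values returned unchanged) are assumptions for illustration.

```python
def corrected_parameters(history, current, coefficient=0.99):
    """Blend the current parameter values with their historical mean.

    history: parameter vectors recorded at the end of earlier rounds.
    current: parameter vector to be corrected at the end of this round.
    """
    if not history:
        # Assumed behaviour for the first round: nothing to average yet.
        return list(current)
    n_rounds = len(history)
    # Element-wise mean over the earlier rounds (step S207).
    mean = [sum(values) / n_rounds for values in zip(*history)]
    # coefficient * x + (1 - coefficient) * mean (step S208).
    return [coefficient * x + (1.0 - coefficient) * m
            for x, m in zip(current, mean)]

# Hypothetical single-parameter sub-model with two earlier rounds recorded.
history_101 = [[1.0], [3.0]]
m_101 = corrected_parameters(history_101, [4.0], coefficient=0.99)
```

With a coefficient close to 1, the correction only nudges the freshly updated value toward the historical mean, which resembles an exponential-moving-average style of smoothing.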
In practical applications, the recognition task performed by the preset model may be an attribute recognition task, and each sub-model in the preset model may be used to recognize the attribute of the image. For example, whether a person in the image wears a hat or not is identified, and in this case, the tag carried by the sample image may be an attribute tag.
In some specific application scenarios, it may be desirable to identify multiple attributes in an image at the same time, for example, whether the person in the image is wearing a hat and whether the person is wearing a skirt. In this case, each sample image may carry a plurality of attribute tags, and each attribute tag may characterize one attribute of the sample image. Accordingly, each sub-model may be used to separately identify the plurality of attributes of the sample image.
For example, the sample image carries 2 attribute tags. One attribute tag indicates whether the person in the sample image wears a hat: A1 means the person wears a hat, and A0 means the person does not. The other attribute tag indicates whether the person in the sample image wears a skirt: B1 means the person wears a skirt, and B0 means the person does not. The sub-model 101 identifies both whether the person in the sample image wears a hat and whether the person wears a skirt, and accordingly outputs one attribute recognition result for the hat and one for the skirt.
In practice, when the sample image carries a plurality of attribute tags, each sub-model outputs an identification result corresponding to each attribute. If 3 attribute tags are carried, each sub-model outputs 3 identification results, wherein each identification result corresponds to one attribute.
In this application scenario, since each sub-model outputs a plurality of recognition results with different attributes, when determining the overall loss value of the preset model, the method may include the following steps:
Step S2061': for each attribute, determine the overall loss value corresponding to the attribute according to each loss difference corresponding to the attribute, the processed recognition result corresponding to the attribute, the attribute tag of the attribute, and the recognition result corresponding to the attribute output by each of the plurality of submodels.
Step S2062': determine the sum of the overall loss values corresponding to the attributes as the overall loss value of the preset model.
In this embodiment, the overall loss value corresponding to each attribute may represent the accuracy of the preset model for identifying the attribute of the sample image.
In a specific implementation, the overall loss value corresponding to each attribute may be determined through the processes of step S202 to step S206. Specifically, the loss difference corresponding to each sub-model for each attribute may be determined. For each submodel, a loss value corresponding to each attribute may be determined according to the attribute tag and the recognition result corresponding to that attribute tag. Similarly, the recognition results that the submodels output for a given attribute may be weighted and summed to obtain the processed recognition result corresponding to that attribute, and the loss corresponding to the attribute may be determined according to that processed recognition result and the attribute tag.
For example, as shown in fig. 1, assume there are 2 attribute tags, attribute tag A and attribute tag B, where attribute tag A indicates whether a hat is worn and attribute tag B indicates whether a skirt is worn. The recognition results output by the submodel 101 are P_a1 and P_b1, where P_a1 is the recognition result for whether a hat is worn and P_b1 is the recognition result for whether a skirt is worn. Similarly, the recognition results output by the submodel 102 are P_a2 and P_b2. According to the weights corresponding to the submodel 101 and the submodel 102, P_a1 and P_a2 may be weighted and summed to obtain P_a3, and P_b1 and P_b2 may be weighted and summed to obtain P_b3.
Further, the loss difference L_a1 between P_a1 and P_a3, the loss difference L_a2 between P_a2 and P_a3, the loss difference L_b1 between P_b1 and P_b3, and the loss difference L_b2 between P_b2 and P_b3 can be obtained. A loss value L_a1' is obtained according to P_a1 and attribute tag A, and a loss value L_b1' is obtained according to P_b1 and attribute tag B; L_a1' and L_b1' are the losses corresponding to the submodel 101. In the same way, a loss value L_a2' is obtained according to P_a2 and attribute tag A, and a loss value L_b2' is obtained according to P_b2 and attribute tag B; L_a2' and L_b2' are the losses corresponding to the submodel 102.
Further, a loss value L_a3' is obtained according to attribute tag A and P_a3, and a loss value L_b3' is obtained according to attribute tag B and P_b3. The loss values corresponding to attribute tag A are L_a1, L_a1', L_a2, L_a2' and L_a3', and the loss values corresponding to attribute tag B are L_b1, L_b1', L_b2, L_b2' and L_b3'.
The overall loss value of the preset model is the sum of the loss value corresponding to the attribute label a and the loss value corresponding to the attribute label B.
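The hat-and-skirt example can be sketched as a per-attribute loss followed by a sum over attributes. As assumptions for illustration, cross-entropy is used for the label losses, relative entropy for the loss differences, and the dictionary layout, weights, and probabilities are hypothetical.

```python
import math

def _label_loss(pred, label, eps=1e-12):
    """Cross-entropy against the attribute tag (assumed loss function)."""
    return -math.log(max(pred[label], eps))

def _loss_difference(p, q, eps=1e-12):
    """Relative entropy D(p || q) (assumed loss difference)."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def attribute_loss(outputs, weights, label):
    """Overall loss of one attribute (step S2061')."""
    processed = [sum(w * o[c] for w, o in zip(weights, outputs))
                 for c in range(len(outputs[0]))]
    return (_label_loss(processed, label)
            + sum(_loss_difference(processed, o) for o in outputs)
            + sum(_label_loss(o, label) for o in outputs))

def multi_attribute_loss(outputs_by_attr, weights, labels_by_attr):
    """Sum of the per-attribute overall losses (step S2062')."""
    return sum(attribute_loss(outputs_by_attr[a], weights, labels_by_attr[a])
               for a in labels_by_attr)

# Hypothetical outputs of sub-models 101 and 102 for each attribute.
outputs_by_attr = {
    "hat": [[0.8, 0.2], [0.9, 0.1]],    # P_a1, P_a2
    "skirt": [[0.3, 0.7], [0.4, 0.6]],  # P_b1, P_b2
}
labels_by_attr = {"hat": 0, "skirt": 1}  # index of the true class
weights = [0.4, 0.6]

total = multi_attribute_loss(outputs_by_attr, weights, labels_by_attr)
```

Each call to `attribute_loss` yields the five terms listed for one attribute tag (two loss differences, two sub-model losses, and the loss of the processed result), already summed.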
In practice, to improve the autonomous learning ability of the preset model and to avoid the poor generalization and unsatisfactory performance caused by manually setting the weights corresponding to the sub-models, in an embodiment the weights corresponding to the sub-models may also be learned during training.
Specifically, in a specific implementation, the preset model may further include a weight processing branch, where an input of the weight processing branch is a feature obtained by performing convolution processing on the sample image in each sub-model, and in the actual processing, the method may further include the following steps:
Step S2020: obtain the weight distribution proportion output by the weight processing branch, where the weight distribution proportion represents the ratio of the weights corresponding to the recognition results output by the plurality of sub-models respectively.
The weight distribution ratio may be obtained while obtaining the recognition results output by each of the plurality of submodels.
In one embodiment, the weight processing branch comprises a plurality of primary fully-connected layers and a secondary fully-connected layer, where the input of the secondary fully-connected layer is connected to the outputs of all of the primary fully-connected layers, and the input of each primary fully-connected layer is connected to the output of the convolution layer of a different sub-model.
Referring to fig. 3, a schematic structural diagram of the preset model of fig. 1 after a weight processing branch is added is shown. As shown in fig. 3, the weight processing branch may include two primary fully-connected layers, FC3 and FC4, and one secondary fully-connected layer, FC5. The input of the primary fully-connected layer FC3 is connected to the output of the convolution layer in the sub-model 101, the input of the primary fully-connected layer FC4 is connected to the output of the convolution layer in the sub-model 102, and the input of the secondary fully-connected layer FC5 is connected to the outputs of both FC3 and FC4.
Accordingly, how the weight processing branch outputs the weight distribution ratio will be described with reference to fig. 3. Specifically, the weight distribution ratio is obtained according to the following steps:
Step S20201: obtain the feature map output by the convolution layer of each of the plurality of sub-models, where the feature map is obtained by the convolution layer of the sub-model performing feature extraction on the sample image.
Step S20202: input the feature map output by the convolution layer of each sub-model into the primary fully-connected layer connected to that convolution layer, to obtain the result output by the primary fully-connected layer.
In this embodiment, the feature map output by the convolution layer of each sub-model may be input to the primary fully-connected layer connected to the output of that convolution layer, and the result output by the primary fully-connected layer is obtained through its processing.
Step S20203: input the results output by the plurality of primary fully-connected layers into the secondary fully-connected layer, to obtain the weight proportion output by the secondary fully-connected layer.
In this embodiment, the result output by each primary fully-connected layer may be input to the secondary fully-connected layer, which processes these results to obtain the weight of each submodel in competitive learning and outputs them as a weight proportion. In this way, the preset model can automatically correlate the outputs of all the primary fully-connected layers and learn a weight proportion, where the sum of the weights in the weight proportion is less than or equal to 1.
For example, as shown in fig. 1, if the weight ratio is 0.4 to 0.6, it may indicate that the weight of the sub-model 101 is 0.4 and the weight of the sub-model 102 is 0.6.
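The data flow through FC3, FC4 and FC5 can be sketched in plain Python. Several details here are assumptions rather than part of the description: the feature maps are taken as already flattened vectors, the two primary outputs are concatenated before FC5, a softmax normalizes the FC5 output so that the weights sum to 1, and all layer sizes and numeric values are hypothetical.

```python
import math

def linear(x, weight, bias):
    """Fully-connected layer y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def softmax(z):
    """Numerically stable softmax (assumed normalization of the weights)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def weight_branch(feat_101, feat_102, fc3, fc4, fc5):
    h3 = linear(feat_101, *fc3)     # primary layer on sub-model 101 features
    h4 = linear(feat_102, *fc4)     # primary layer on sub-model 102 features
    logits = linear(h3 + h4, *fc5)  # secondary layer sees both primary outputs
    return softmax(logits)          # one weight per sub-model

# Hypothetical flattened feature maps and (weight, bias) pairs per layer.
feat_101 = [0.5, 1.0, -0.5]
feat_102 = [1.0, 0.0, 0.5]
fc3 = ([[0.1, 0.2, 0.3], [0.0, -0.1, 0.1]], [0.0, 0.0])
fc4 = ([[0.2, 0.1, 0.0], [0.3, 0.0, -0.2]], [0.0, 0.0])
fc5 = ([[0.5, -0.5, 0.5, -0.5], [-0.5, 0.5, -0.5, 0.5]], [0.0, 0.0])

ratio = weight_branch(feat_101, feat_102, fc3, fc4, fc5)
```

In a real training setup the layer parameters would be learned by backpropagating the overall loss value rather than fixed by hand as in this sketch.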
Accordingly, the processed recognition result can be obtained by the following step:
Step S203': according to the weight distribution proportion, perform weighted summation on the recognition results output by the plurality of sub-models respectively, to obtain the processed recognition result.
In this embodiment, the weight corresponding to each sub-model may be obtained from the weight distribution proportion output by the secondary fully-connected layer, and the recognition results output by the plurality of sub-models are then weighted and summed to obtain the processed recognition result.
In this embodiment, when the parameter values of each submodel are updated, the parameter values of the weight processing branch may also be updated according to the overall loss value, so that the weight processing branch is trained together with the submodels.
In this embodiment, after the preset model is trained by using a plurality of sample images as training samples, the image recognition accuracy of each submodel in the preset model may be determined, and the submodel with the highest accuracy, that is, the best performance, is retained, thereby obtaining the image recognition model. In one embodiment, after multiple rounds of updates to the parameter values of each sub-model, the image recognition model finally used for image recognition may be obtained through the following steps:
Step S207: take the test images in the test set as input and test the preset model at the end of training, to obtain the test results corresponding to the plurality of sub-models in the preset model at the end of training.
For example, if the recognition task is a fingerprint recognition task, the test image is a fingerprint image for test, if the recognition task is a pedestrian attribute recognition task, the test image is a pedestrian image for test, and if the recognition task is a clothing fine-grained attribute recognition task, the test image is a person clothing image for test.
In practice, the preset model after training includes a plurality of trained submodels, and the trained submodels can respectively identify the test image, so as to obtain identification results respectively output by the plurality of submodels, where the identification results are test results.
The way the test result is characterized may differ according to the recognition task. For example, for the clothing fine-grained attribute recognition task, if the person in the test clothing image is wearing a hat, the test result is a 1 × 2 vector that the sub-model outputs after recognizing the test image and that is used to judge whether the person is wearing a hat. For example, a test result of (0.8, 0.2) indicates that the probability of wearing a hat is 0.8.
Step S208: screen, from the preset model at the end of training, the sub-models whose test results meet the preset test conditions, to obtain the image recognition model for image recognition.
In this embodiment, the accuracy rate of the recognition of the test image by each of the plurality of submodels may be determined according to the test result, and then the submodel corresponding to the highest accuracy rate may be determined as the submodel satisfying the preset test condition according to the order from the highest accuracy rate to the lowest accuracy rate. Of course, in practice, the submodel corresponding to the accuracy reaching the preset accuracy may also be determined as the submodel meeting the preset test condition.
In specific implementation, the submodels with the test results meeting the preset test conditions can be reserved, and the rest submodels can be discarded, so that the image recognition model is obtained.
For example, as shown in fig. 1, for the clothing fine-grained attribute recognition task, a test result of (0.8, 0.2) output by the sub-model 101 indicates that the probability of wearing a hat is 0.8, and a test result of (0.9, 0.1) output by the sub-model 102 indicates that the probability of wearing a hat is 0.9. If the person in the test image is in fact wearing a hat, the accuracy of the sub-model 102 is higher, so the sub-model 102 may be retained and the sub-model 101 discarded, and the resulting image recognition model may include only the sub-model 102.
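The screening step can be sketched over a small test set. The accuracy definition (fraction of test images whose most probable class matches the label), the function names, and the test data are assumptions for illustration; both screening conditions mentioned above (highest accuracy, or accuracy reaching a preset threshold) are covered.

```python
def accuracy(predictions, labels):
    """Fraction of test images whose most probable class equals the label."""
    correct = 0
    for probs, label in zip(predictions, labels):
        predicted = max(range(len(probs)), key=lambda c: probs[c])
        if predicted == label:
            correct += 1
    return correct / len(labels)

def screen_submodels(test_results, labels, min_accuracy=None):
    """Return the names of the sub-models to keep, plus all accuracies."""
    scores = {name: accuracy(preds, labels)
              for name, preds in test_results.items()}
    if min_accuracy is not None:
        # Keep every sub-model that reaches the preset accuracy.
        kept = [name for name, s in scores.items() if s >= min_accuracy]
    else:
        # Keep only the sub-model with the highest accuracy.
        kept = [max(scores, key=scores.get)]
    return kept, scores

# Hypothetical test results for three test images (class 0 = wearing a hat).
labels = [0, 1, 0]
test_results = {
    "submodel_101": [[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]],
    "submodel_102": [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
}
kept, scores = screen_submodels(test_results, labels)
```

Here `submodel_102` classifies all three hypothetical images correctly while `submodel_101` misses one, so only `submodel_102` would be kept under the highest-accuracy condition.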
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments and that the acts involved are not necessarily required by the invention.
Based on the same inventive concept, referring to fig. 4, a schematic diagram of a framework of a parameter value updating apparatus of a model according to an embodiment of the present invention is shown, and as shown in fig. 4, the apparatus may specifically include the following modules:
an input module 401, configured to obtain a sample image carrying a label, and input the sample image into a preset model to be trained, where the preset model includes multiple submodels, and each submodel is used to identify the sample image;
an output result obtaining module 402, configured to obtain an identification result that is output after the sample image is identified by each of the multiple sub-models;
a weight processing module 403, configured to perform weighting processing on the recognition results output by the multiple submodels according to weights corresponding to the multiple submodels, to obtain processed recognition results;
a loss difference determining module 404, configured to determine loss differences between the processed recognition results and recognition results output by the sub models respectively;
an overall loss determining module 405, configured to determine an overall loss value of the preset model according to each loss difference, the processed identification result, the tag, and an identification result output by each of the multiple submodels;
and a parameter updating module 406, configured to update the parameter values of the multiple submodels respectively according to the overall loss value.
Optionally, the apparatus may further include a parameter modification module, where the parameter modification module specifically includes the following units:
a parameter average value determining unit, which can be used for determining the parameter average value of each of the plurality of sub models in a plurality of rounds of training before the round of training;
the parameter correction unit may be configured to update the updated parameter values of the sub models again according to a preset coefficient, the updated parameter values of the sub models, and an average parameter value of the sub models in multiple rounds of training before the round of training, so as to obtain new parameter values of the sub models after the round of training is completed.
Optionally, the loss difference determining module 404 may be configured to determine the cosine distances between the processed recognition result and the recognition results output by the plurality of submodels respectively, and use the cosine distances as the loss differences; or,
the loss difference determining module 404 may be configured to determine the relative entropies between the processed recognition result and the recognition results output by the plurality of submodels respectively, and use the relative entropies as the loss differences.
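The cosine-distance option of module 404 can be sketched as follows. The common convention of cosine distance as one minus cosine similarity is assumed here, and the function name and example vectors are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity of two vectors (assumed definition)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical processed result and sub-model outputs.
processed = [0.86, 0.14]
submodel_outputs = [[0.8, 0.2], [0.9, 0.1]]

# One loss difference per sub-model.
loss_differences = [cosine_distance(processed, out) for out in submodel_outputs]
```

Unlike relative entropy, this measure is symmetric and does not require its inputs to be probability distributions, so it also suits recognition results expressed as feature vectors.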
Optionally, the overall loss determining module 405 may include the following units:
a first determining unit, configured to determine, according to the tag and an identification result output by each of the plurality of submodels, a first loss value corresponding to each of the plurality of submodels;
a second determining unit, configured to determine, according to the tag and the processed identification result, a second loss value corresponding to the processed identification result;
and a third determining unit, configured to determine the sum of the second loss value, each loss difference, and the first loss values corresponding to the plurality of sub-models as the overall loss value of the preset model.
Optionally, each sample image carries a plurality of attribute tags, and each sub-model is used for identifying a plurality of attributes of the sample image; the overall loss determination module may include the following units:
a fourth determining unit, configured to determine, for each attribute, an overall loss value corresponding to the attribute based on each loss difference corresponding to the attribute, the processed identification result corresponding to the attribute, the attribute tag of the attribute, and the identification result corresponding to the attribute output by each of the plurality of submodels;
a fifth determining unit, configured to determine a sum of overall loss values corresponding to the multiple attributes as an overall loss value of the preset model.
Optionally, the preset model further includes a weight processing branch; the apparatus may further include the following modules:
a weight distribution proportion obtaining module, configured to obtain a weight distribution proportion output by the weight processing branch, where the weight distribution proportion represents a ratio of weights corresponding to identification results output by each of the multiple submodels;
the weight processing module 403 may be specifically configured to perform weighted summation on the recognition results output by each of the multiple submodels according to the weight distribution ratio, so as to obtain a processed recognition result;
the parameter updating module 406 may be specifically configured to update the parameter values of the weight processing branch and the parameter values of the multiple submodels respectively according to the overall loss value.
Optionally, the weight processing branch comprises: a plurality of primary fully-connected layers respectively connected to the convolution layers of the plurality of submodels, and a secondary fully-connected layer connected to the plurality of primary fully-connected layers; wherein:
each primary fully-connected layer is used for processing the feature map output by the convolution layer of the corresponding sub-model and outputting a result; the feature map is obtained by the convolution layer of the sub-model performing feature extraction on the sample image;
and the secondary fully-connected layer is used for processing the results output by the plurality of primary fully-connected layers to obtain a weight proportion.
Optionally, the apparatus may further include the following modules:
the test module is used for testing the preset model after training is finished by taking the test images in the test set as input, and obtaining test results respectively corresponding to a plurality of sub models in the preset model after training is finished;
and the screening module is used for screening the submodels with the test results meeting the preset test conditions from the preset model at the end of the training to obtain the image recognition model for image recognition.
An embodiment of the present invention further provides an electronic device, which may be used to execute a parameter value updating method for a model, and may include a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor is configured to execute the parameter value updating method for the model.
Embodiments of the present invention further provide a computer-readable storage medium storing a computer program, which causes a processor to execute the method for updating the parameter value of the model according to the embodiments of the present invention.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar, the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the true scope of the embodiments of the present invention.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal that comprises the element.
The method, apparatus, device and storage medium for updating parameter values of a model provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (9)
1. A method for updating parameter values of a model, comprising:
obtaining a sample image carrying a label, and inputting the sample image into a preset model to be trained, wherein the preset model comprises a plurality of sub-models and a weight processing branch, and each sub-model is used for recognizing the sample image;
obtaining the recognition results respectively output by the plurality of sub-models after recognizing the sample image, and obtaining the weight distribution ratio output by the weight processing branch, wherein the weight distribution ratio represents the ratio among the weights corresponding to the recognition results output by the plurality of sub-models;
performing, according to the weight distribution ratio, a weighted summation of the recognition results respectively output by the plurality of sub-models to obtain a processed recognition result;
determining a loss difference between the processed recognition result and the recognition result output by each of the plurality of sub-models;
determining an overall loss value of the preset model according to each loss difference, the processed recognition result, the label and the recognition result output by each of the plurality of sub-models;
updating, according to the overall loss value, the parameter values of the weight processing branch and the respective parameter values of the plurality of sub-models;
wherein the weight processing branch comprises: a plurality of primary fully connected layers respectively connected to the convolution layers of the plurality of sub-models, and a secondary fully connected layer connected to the plurality of primary fully connected layers; and the weight distribution ratio is obtained according to the following steps:
obtaining a feature map output by the convolution layer of each of the plurality of sub-models, wherein the feature map is obtained by the convolution layer of that sub-model performing feature extraction on the sample image;
inputting the feature map output by the convolution layer of each sub-model into the primary fully connected layer connected to that convolution layer to obtain a result output by the primary fully connected layer;
and inputting the respective output results of the plurality of primary fully connected layers into the secondary fully connected layer to obtain the weight distribution ratio output by the secondary fully connected layer.
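The weight branch and weighted summation recited in claim 1 can be illustrated with a minimal sketch in plain Python. Several details are assumptions that the claim does not fix: the feature maps are treated as already flattened vectors, the secondary layer output is normalized with a softmax so that the weights sum to 1, and every name and toy value below is hypothetical.

```python
import math

def linear(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(values):
    """Normalize scores into positive weights summing to 1 (assumed normalization)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def weight_distribution_ratio(feature_maps, primary_layers, secondary_layer):
    """One primary FC layer per sub-model, followed by a shared secondary FC layer."""
    primary_out = []
    for fmap, (w, b) in zip(feature_maps, primary_layers):
        primary_out.extend(linear(fmap, w, b))  # flattened feature map -> primary FC
    w2, b2 = secondary_layer
    return softmax(linear(primary_out, w2, b2))  # one weight per sub-model

def fuse(recognition_results, ratio):
    """Weighted summation of the recognition results of all sub-models."""
    n_classes = len(recognition_results[0])
    return [sum(r * res[c] for r, res in zip(ratio, recognition_results))
            for c in range(n_classes)]

# Toy values (hypothetical): two sub-models, 3-element feature maps, 2 classes.
feature_maps = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0]]
primary_layers = [([[0.1, 0.2, 0.3]], [0.0]), ([[0.3, 0.1, 0.2]], [0.0])]
secondary_layer = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
results = [[0.9, 0.1], [0.6, 0.4]]

ratio = weight_distribution_ratio(feature_maps, primary_layers, secondary_layer)
fused = fuse(results, ratio)
```

Because each toy recognition result is a probability vector and the assumed softmax weights sum to 1, the fused result is again a probability vector.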
2. The method of claim 1, wherein after updating the respective parameter values of the plurality of sub-models according to the overall loss value, the method further comprises:
determining respective average parameter values of the plurality of sub-models over multiple rounds of training preceding the current round of training;
and updating the updated parameter values of the plurality of sub-models again according to a preset coefficient, the updated parameter values of the plurality of sub-models, and their respective average parameter values over the multiple rounds of training preceding the current round, to obtain new parameter values of the plurality of sub-models at the end of the current round of training.
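One possible reading of the second update in claim 2 is sketched below. The claim names only the three inputs (the preset coefficient, the updated parameter values and the historical average); the linear interpolation between them is an assumption, and the function names and toy values are hypothetical.

```python
def average_parameters(history):
    """Element-wise mean of the parameter vectors saved in earlier rounds."""
    n = len(history)
    return [sum(column) / n for column in zip(*history)]

def re_update(updated, history, coefficient):
    """Blend freshly updated parameters with their historical average.

    The interpolation form is an assumed choice, not taken from the claim.
    """
    avg = average_parameters(history)
    return [coefficient * u + (1.0 - coefficient) * a for u, a in zip(updated, avg)]

# Toy values (hypothetical): two earlier rounds, two parameters per sub-model.
history = [[1.0, 2.0], [3.0, 4.0]]
updated = [4.0, 6.0]
new_params = re_update(updated, history, coefficient=0.5)
```

With a coefficient of 1.0 this sketch leaves the updated values unchanged, and with 0.0 it falls back entirely to the historical average.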
3. The method of claim 1, wherein determining a loss difference between the processed recognition result and the recognition result output by each of the plurality of sub-models comprises:
determining a cosine distance between the processed recognition result and the recognition result output by each of the plurality of sub-models, and taking the cosine distance as the loss difference;
or determining a relative entropy between the processed recognition result and the recognition result output by each of the plurality of sub-models, and taking the relative entropy as the loss difference.
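The two loss-difference options in claim 3 correspond to standard formulas, sketched here in plain Python. The direction of the relative entropy (processed result first, sub-model result second) and the small `eps` guard are assumptions, since the claim does not specify them.

```python
import math

def cosine_distance(p, q):
    """1 minus the cosine similarity of two recognition-result vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)

def relative_entropy(p, q, eps=1e-12):
    """Relative entropy (KL divergence) of p with respect to q.

    `eps` avoids log(0); the argument order is an assumed convention.
    """
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)

# Toy values (hypothetical): a processed result and one sub-model result.
processed = [0.8, 0.2]
sub_result = [0.6, 0.4]
d_cos = cosine_distance(processed, sub_result)
d_kl = relative_entropy(processed, sub_result)
```

Both quantities are zero when the sub-model output equals the processed result and grow as the two diverge, which is what makes either usable as a loss difference.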
4. The method of claim 1, wherein determining the overall loss value of the preset model according to each loss difference, the processed recognition result, the label, and the recognition result output by each of the plurality of sub-models comprises:
determining a first loss value corresponding to each of the plurality of sub-models according to the label and the recognition result output by each of the plurality of sub-models;
determining a second loss value corresponding to the processed recognition result according to the label and the processed recognition result;
and determining the sum of the second loss value, each loss difference and the first loss values corresponding to the plurality of sub-models as the overall loss value of the preset model.
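The summation in claim 4 can be sketched as follows. Cross-entropy is assumed for the first and second loss values because the claim does not name a loss function; the loss differences are passed in precomputed, and all names and toy values are hypothetical.

```python
import math

def cross_entropy(probs, label, eps=1e-12):
    """Negative log-probability of the labeled class (assumed loss function)."""
    return -math.log(probs[label] + eps)

def overall_loss(sub_results, processed, label, loss_differences):
    """Second loss + all loss differences + all first losses, as in claim 4."""
    first_losses = [cross_entropy(r, label) for r in sub_results]
    second_loss = cross_entropy(processed, label)
    return second_loss + sum(loss_differences) + sum(first_losses)

# Toy values (hypothetical): two sub-models, two classes, label index 0.
sub_results = [[0.5, 0.5], [0.25, 0.75]]
processed = [0.5, 0.5]
loss_differences = [0.1, 0.2]
total = overall_loss(sub_results, processed, 0, loss_differences)
```

Here the first losses are ln 2 and ln 4, the second loss is ln 2, and the loss differences add 0.3, so the total is 4 ln 2 + 0.3.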
5. The method of claim 1, wherein each sample image carries a plurality of attribute labels, and each sub-model is used for recognizing a plurality of attributes of the sample image; and determining the overall loss value of the preset model according to each loss difference, the processed recognition result, the label and the recognition result output by each of the plurality of sub-models comprises:
for each attribute, determining an overall loss value corresponding to the attribute according to each loss difference corresponding to the attribute, the processed recognition result corresponding to the attribute, the attribute label of the attribute and the recognition result corresponding to the attribute output by each of the plurality of sub-models;
and determining the sum of the overall loss values corresponding to the attributes as the overall loss value of the preset model.
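For the multi-attribute case of claim 5, a rough sketch is to compute one loss per attribute and add the results. The per-attribute loss below reuses the structure of claim 4 with an assumed cross-entropy; the attribute names and every number are hypothetical placeholders.

```python
import math

def attribute_loss(sub_probs, processed_probs, label, loss_differences):
    """Overall loss for one attribute (cross-entropy is an assumed choice)."""
    first = sum(-math.log(p[label]) for p in sub_probs)
    second = -math.log(processed_probs[label])
    return second + sum(loss_differences) + first

def multi_attribute_loss(per_attribute):
    """Sum of the per-attribute overall losses, as recited in claim 5."""
    return sum(attribute_loss(**item) for item in per_attribute.values())

# Toy values (hypothetical): two attributes, two sub-models each.
per_attribute = {
    "gender": dict(sub_probs=[[0.5, 0.5], [0.5, 0.5]], processed_probs=[0.5, 0.5],
                   label=0, loss_differences=[0.0, 0.0]),
    "glasses": dict(sub_probs=[[0.25, 0.75], [0.5, 0.5]], processed_probs=[0.5, 0.5],
                    label=1, loss_differences=[0.1, 0.1]),
}
total = multi_attribute_loss(per_attribute)
```

Summing per attribute lets a single backward pass update every sub-model for all attributes at once.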
6. The method according to any one of claims 1-5, wherein after updating the respective parameter values of the plurality of sub-models according to the overall loss value, the method further comprises:
testing the preset model at the end of training with the test images in a test set as input, to obtain test results respectively corresponding to the plurality of sub-models in the preset model at the end of training;
and screening out, from the preset model at the end of training, the sub-models whose test results meet a preset test condition, to obtain an image recognition model for image recognition.
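The screening step of claim 6 can be pictured with a short sketch. The claim leaves the preset test condition open; a minimum test accuracy is assumed here, and the sub-model names and accuracy figures are illustrative placeholders rather than reported results.

```python
def screen_sub_models(test_accuracy, min_accuracy):
    """Keep the sub-models whose test accuracy meets the assumed threshold."""
    return [name for name, acc in test_accuracy.items() if acc >= min_accuracy]

# Placeholder figures (hypothetical), one test accuracy per trained sub-model.
test_accuracy = {"sub_model_0": 0.91, "sub_model_1": 0.84, "sub_model_2": 0.93}
kept = screen_sub_models(test_accuracy, min_accuracy=0.90)
```

The retained sub-models would then make up the image recognition model used at inference time.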
7. An apparatus for updating parameter values of a model, comprising:
an input module, configured to obtain a sample image carrying a label and input the sample image into a preset model to be trained, wherein the preset model comprises a plurality of sub-models and a weight processing branch, and each sub-model is used for recognizing the sample image;
an output result obtaining module, configured to obtain the recognition results respectively output by the plurality of sub-models after recognizing the sample image;
a weight distribution ratio obtaining module, configured to obtain the weight distribution ratio output by the weight processing branch, wherein the weight distribution ratio represents the ratio among the weights corresponding to the recognition results output by the plurality of sub-models;
a weight processing module, configured to perform, according to the weight distribution ratio, weighting processing on the recognition results respectively output by the plurality of sub-models to obtain a processed recognition result;
a loss difference determining module, configured to determine a loss difference between the processed recognition result and the recognition result output by each of the plurality of sub-models;
an overall loss determining module, configured to determine an overall loss value of the preset model according to each loss difference, the processed recognition result, the label and the recognition result output by each of the plurality of sub-models;
a parameter updating module, configured to update, according to the overall loss value, the parameter values of the weight processing branch and the respective parameter values of the plurality of sub-models;
wherein the weight processing branch comprises: a plurality of primary fully connected layers respectively connected to the convolution layers of the plurality of sub-models, and a secondary fully connected layer connected to the plurality of primary fully connected layers; wherein:
each primary fully connected layer is used for processing the feature map output by the convolution layer of the corresponding sub-model and outputting a result, the feature map being obtained by the convolution layer of that sub-model performing feature extraction on the sample image;
and the secondary fully connected layer is used for processing the results output by the plurality of primary fully connected layers to obtain the weight distribution ratio.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for updating parameter values of a model according to any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program for causing a processor to execute the method for updating parameter values of a model according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010275896.XA CN111626098B (en) | 2020-04-09 | 2020-04-09 | Method, device, equipment and medium for updating parameter values of model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111626098A (en) | 2020-09-04 |
CN111626098B (en) | 2023-04-18 |
Family
ID=72273006
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010275896.XA Active CN111626098B (en) | 2020-04-09 | 2020-04-09 | Method, device, equipment and medium for updating parameter values of model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111626098B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113935400B (en) * | 2021-09-10 | 2024-11-01 | 东风商用车有限公司 | Vehicle fault diagnosis method, device, system and storage medium |
CN115524615A (en) * | 2022-10-08 | 2022-12-27 | 深圳先进技术研究院 | Method for predicting battery performance based on material parameter combination of battery pulping process |
CN118411381B (en) * | 2024-07-02 | 2024-09-24 | 杭州百子尖科技股份有限公司 | Boundary coordinate detection method, device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110363138A (en) * | 2019-07-12 | 2019-10-22 | 腾讯科技(深圳)有限公司 | Model training method, image processing method, device, terminal and storage medium |
CN110363302A (en) * | 2019-06-13 | 2019-10-22 | 阿里巴巴集团控股有限公司 | Training method, prediction technique and the device of disaggregated model |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107844781A (en) * | 2017-11-28 | 2018-03-27 | 腾讯科技(深圳)有限公司 | Face character recognition methods and device, electronic equipment and storage medium |
CN108491720B (en) * | 2018-03-20 | 2023-07-14 | 腾讯科技(深圳)有限公司 | Application identification method, system and related equipment |
CN109934249A (en) * | 2018-12-14 | 2019-06-25 | 网易(杭州)网络有限公司 | Data processing method, device, medium and calculating equipment |
US10510002B1 (en) * | 2019-02-14 | 2019-12-17 | Capital One Services, Llc | Stochastic gradient boosting for deep neural networks |
CN109886343B (en) * | 2019-02-26 | 2024-01-05 | 深圳市商汤科技有限公司 | Image classification method and device, equipment and storage medium |
CN110399895A (en) * | 2019-03-27 | 2019-11-01 | 上海灏领科技有限公司 | The method and apparatus of image recognition |
CN110309922A (en) * | 2019-06-18 | 2019-10-08 | 北京奇艺世纪科技有限公司 | A kind of network model training method and device |
CN110598210B (en) * | 2019-08-29 | 2023-08-04 | 深圳市优必选科技股份有限公司 | Entity recognition model training, entity recognition method, entity recognition device, entity recognition equipment and medium |
CN110909784B (en) * | 2019-11-15 | 2022-09-02 | 北京奇艺世纪科技有限公司 | Training method and device of image recognition model and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN111626098A (en) | 2020-09-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111523621B (en) | Image recognition method and device, computer equipment and storage medium | |
KR102564855B1 (en) | Device and method to recognize object and face expression, and device and method to train obejct and face expression robust to facial change | |
CN111626098B (en) | Method, device, equipment and medium for updating parameter values of model | |
CN108230291B (en) | Object recognition system training method, object recognition method, device and electronic equipment | |
CN109840530A (en) | The method and apparatus of training multi-tag disaggregated model | |
CN108629326A (en) | The action behavior recognition methods of objective body and device | |
CN110824587B (en) | Image prediction method, image prediction device, computer equipment and storage medium | |
CN113095370A (en) | Image recognition method and device, electronic equipment and storage medium | |
CN111401339A (en) | Method and device for identifying age of person in face image and electronic equipment | |
CN112597984A (en) | Image data processing method, image data processing device, computer equipment and storage medium | |
CN112560823B (en) | Adaptive variance and weight face age estimation method based on distribution learning | |
CN111950633A (en) | Neural network training method, neural network target detection method, neural network training device, neural network target detection device and storage medium | |
CN111523604A (en) | User classification method and related device | |
CN113379045A (en) | Data enhancement method and device | |
CN111783936B (en) | Convolutional neural network construction method, device, equipment and medium | |
CN114385846A (en) | Image classification method, electronic device, storage medium and program product | |
CN116561562B (en) | Sound source depth optimization acquisition method based on waveguide singular points | |
CN113486804B (en) | Object identification method, device, equipment and storage medium | |
CN116563597A (en) | Image recognition model training method, recognition method, device, medium and product | |
CN112183631B (en) | Method and terminal for establishing intention classification model | |
CN114863485A (en) | Cross-domain pedestrian re-identification method and system based on deep mutual learning | |
CN116702918A (en) | Federal learning method and related equipment | |
CN113627591A (en) | Dynamic graph data processing method and device, electronic equipment and storage medium | |
WO2021053815A1 (en) | Learning device, learning method, reasoning device, reasoning method, and recording medium | |
CN114972772B (en) | Method, device, equipment and storage medium for customizing graphic neural network architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||