
WO2020155939A1 - Image recognition method, device, storage medium, and processor - Google Patents

Image recognition method, device, storage medium, and processor

Info

Publication number
WO2020155939A1
WO2020155939A1 PCT/CN2019/127817 CN2019127817W WO2020155939A1 WO 2020155939 A1 WO2020155939 A1 WO 2020155939A1 CN 2019127817 W CN2019127817 W CN 2019127817W WO 2020155939 A1 WO2020155939 A1 WO 2020155939A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
model
training
sets
accuracy
Prior art date
Application number
PCT/CN2019/127817
Other languages
English (en)
French (fr)
Inventor
张玉兵
Original Assignee
广州视源电子科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州视源电子科技股份有限公司
Publication of WO2020155939A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition

Definitions

  • This application relates to the field of image recognition, for example, to an image recognition method, device, storage medium, and processor.
  • image recognition models are all trained based on deep learning algorithm models.
  • the quality of deep learning model training has an impact on recognition accuracy.
  • the data set used for training is the top priority, which will have a decisive impact on the final algorithm performance of the deep learning model.
  • Deep learning models are typically trained on a single training data set.
  • the training data set can be face data collected in a certain scene or a public face database downloaded from the Internet. Because different data sets may cover the same person, and because the naming rules of different data sets are not uniform, it is difficult to merge face pictures of the same person according to their file names. In face recognition classification training, face pictures of the same person must share the same label category number, so multiple face data sets that may overlap cannot be used at the same time.
  • a deep learning model trained only on a single training data set has low accuracy in image recognition and cannot meet the needs of different applications.
  • the embodiments of the present application provide an image recognition method, device, storage medium, and processor to at least solve the problem of low recognition accuracy of the image recognition method in the related art.
  • an image recognition method is provided, which includes: obtaining an image to be recognized; obtaining a pre-established image recognition model, wherein the image recognition model is obtained by training an initial model with multiple training sets, the initial model is a recognition model established based on a branch training algorithm, each training set is extracted from a single data set, and different training sets are extracted from different data sets; and using the image recognition model to recognize the image to be recognized to obtain a recognition result.
  • the above method further includes: acquiring multiple data sets; classifying each image in the multiple data sets to obtain a label of each image, wherein the label is used to represent the classification result of each image, and the labels of at least two images contained in each data set are the same; and extracting sample images from each classified data set to obtain multiple training sets.
  • before extracting sample images from each classified data set to obtain multiple training sets, the above method further includes: extracting preset features of each image in each classified data set; performing an alignment operation on each image based on the preset features of that image; and extracting sample images from each data set after the operation to obtain multiple training sets.
  • when each image is a face image, the preset feature includes at least one of the following: eyes, eyebrows, nose tip, and mouth corners.
  • extracting sample images from each data set after the operation to obtain multiple training sets includes: randomly extracting sample images from each data set after the operation; and obtaining the storage paths and labels of the sample images to obtain multiple training sets.
  • acquiring multiple data sets includes: acquiring a video image and a preset data set collected by an acquisition device; and detecting the video image and the preset data set to obtain multiple data sets.
  • the above method further includes: establishing an initial model based on a branch training algorithm, where the initial model at least includes multiple loss functions in one-to-one correspondence with the multiple training sets; inputting the multiple training sets into the initial model in parallel to train the initial model; determining whether the trained model meets a preset condition; and, if the trained model meets the preset condition, determining that the trained model is the image recognition model.
  • inputting multiple training sets into the initial model in parallel to train the initial model includes: inputting the multiple training sets into the initial model in parallel to obtain function values of the multiple loss functions; obtaining the gradient value of each parameter in the initial model according to the function values of the multiple loss functions and the chain rule of differentiation; and updating each parameter from its gradient value according to the stochastic gradient descent algorithm to obtain the trained model.
  • judging whether the trained model satisfies the preset condition includes: obtaining a verification set; verifying the trained model with the verification set to obtain the accuracy of the trained model; judging whether the accuracy of the trained model is the same as the historical accuracy, where the historical accuracy is the accuracy obtained by the trained model in the previous verification; and, if the accuracy of the trained model is the same as the historical accuracy, determining that the trained model meets the preset condition.
  • if the accuracy of the trained model differs from the historical accuracy, the accuracy of the trained model is recorded as the historical accuracy, and training of the initial model continues.
  • accuracy is used to characterize the ratio of the sum of the verification results of all verification samples in the verification set to the total number of all verification samples.
  • obtaining the verification set includes: obtaining images other than the sample images in the multiple data sets; randomly extracting image verification pairs from other images to obtain the verification set.
  • the image verification pair includes: a positive sample pair and a negative sample pair, the positive sample pair includes two images with the same label, and the negative sample pair includes two images with different labels.
  • the loss function is a square loss function.
  • an image recognition device is provided, including: a first acquisition module for acquiring an image to be recognized; a second acquisition module for acquiring a pre-established image recognition model, wherein the image recognition model is obtained by training an initial model with multiple training sets.
  • the initial model is a recognition model established based on the branch training algorithm.
  • each training set is extracted from a single data set, and different training sets are extracted from different data sets; a recognition module is used to recognize the image to be recognized by using the image recognition model to obtain the recognition result.
  • a storage medium includes a stored program, wherein the device where the storage medium is located is controlled to execute the above-mentioned image recognition method when the program runs.
  • a processor is also provided, which is configured to run a program, wherein the image recognition method described above is executed when the program is running.
  • an initial model can be established based on a branch training algorithm, and the initial model can be trained through multiple training sets generated from different data sets to obtain an image recognition model.
  • the image recognition model is used to recognize the image input by the user to obtain the final recognition result.
  • the image recognition model that combines branch training with multiple data sets has a higher accuracy rate than an image recognition model trained on a single data set, which achieves the technical effect of improving recognition accuracy and thereby solves the technical problem of the low recognition accuracy of image recognition methods in the related art.
  • Fig. 1 is a flowchart of an image recognition method according to an embodiment of the present application
  • Fig. 2 is a schematic diagram of an optional face picture according to an embodiment of the present application.
  • Fig. 3 is a schematic diagram of an optional aligned face picture according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an optional face recognition deep neural network model based on a single data set input according to an embodiment of the present application
  • Fig. 5 is a schematic diagram of an optional deep neural network model for face recognition based on input of multiple data sets according to an embodiment of the present application
  • Fig. 6 is a flowchart of an optional image recognition method according to an embodiment of the present application.
  • Fig. 7 is a schematic diagram of an image recognition device according to an embodiment of the present application.
  • an embodiment of an image recognition method is provided.
  • the steps shown in the flowchart of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described can be performed in a different order.
  • Fig. 1 is a flowchart of an image recognition method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
  • Step S102 Obtain an image to be recognized.
  • the above-mentioned image to be recognized may be an image that needs to be recognized.
  • a face image is taken as an example for description.
  • Step S104 Obtain a pre-established image recognition model.
  • the image recognition model is obtained by training the initial model through multiple training sets.
  • the initial model is a recognition model established based on the branch training algorithm.
  • each training set is extracted from a single data set, and different training sets are extracted from different data sets.
  • multiple training sets may be constructed through multiple different data sets in advance, and the initial model may be trained through the training sets, so as to obtain the final image recognition model.
  • the branch training method can be combined to build a deep neural network model to obtain the initial model. By separating different data sets for branch training, a trained image recognition model can be obtained, and the trained image recognition model can be deployed to application scenarios.
  • Step S106 using the image recognition model to recognize the image to be recognized, and obtain the recognition result.
  • the face recognition process can be performed by comparing the facial feature feat-ID (using Euclidean distance).
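  • The feature comparison mentioned above could be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names and the decision threshold are hypothetical, and the source only states that facial features are compared using Euclidean distance.

```python
import math


def euclidean_distance(feat_a, feat_b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))


def is_same_person(feat_a, feat_b, threshold=1.0):
    """Hypothetical decision rule: same identity if the distance is below
    a threshold. The threshold value is an assumption, not from the source."""
    return euclidean_distance(feat_a, feat_b) < threshold
```

  In practice the threshold would have to be tuned on held-out verification pairs; the value used here is only a placeholder.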
  • an initial model can be established based on a branch training algorithm, and the initial model can be trained through multiple training sets generated from different data sets to obtain an image recognition model.
  • the image recognition model is used to recognize the image input by the user to obtain the final recognition result.
  • the image recognition model that combines branch training with multiple data sets has a higher accuracy rate than an image recognition model trained on a single data set, which achieves the technical effect of improving recognition accuracy and thereby solves the technical problem of the low recognition accuracy of image recognition methods in the related art.
  • the method further includes: acquiring multiple data sets; classifying each image in the multiple data sets to obtain a label of each image, wherein the label is used to characterize the classification result of each image, and the labels of at least two images contained in the multiple data sets are the same; and extracting sample images from each classified data set to obtain multiple training sets.
  • face pictures in different application scenarios may be obtained in advance to obtain multiple data sets. Public face data sets downloaded from the Internet are generally already labeled. For unlabeled data sets, face images can be manually detected and extracted, then classified and labeled: face images belonging to the same person are put together and given the same label, so that every picture has a label. Suppose the total number of people is N, and each person has M face pictures. A certain number of face images can be randomly selected from each labeled data set to obtain each training set.
  • before extracting sample images from each classified data set to obtain multiple training sets, the method further includes: extracting preset features of each image in each classified data set; performing an alignment operation on each image based on the preset features of that image; and extracting sample images from each data set after the operation to obtain multiple training sets.
  • the preset feature includes at least one of the following: eyes, eyebrows, nose tip, and mouth corners.
  • the angle of the face and the position of the face in the face picture are inconsistent.
  • the image is aligned to remove the influence of face angle on face recognition.
  • the key points include the positions of the eyes, nose tip, and mouth corners, as shown in Figure 2.
  • the aligned face is shown in Figure 3.
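  • One possible form of the alignment step is sketched below. The source only says that key points such as the eyes, nose tip, and mouth corners are used to remove the influence of face angle; rotating the key points so that the two eyes lie on a horizontal line is an assumption made for illustration, and `align_by_eyes` and the key names are hypothetical.

```python
import math


def align_by_eyes(keypoints):
    """Rotate key points about the midpoint between the eyes so that the
    line joining the eyes becomes horizontal.

    keypoints: dict mapping a name to an (x, y) tuple; it must contain
    'left_eye' and 'right_eye' (hypothetical key names).
    """
    lx, ly = keypoints["left_eye"]
    rx, ry = keypoints["right_eye"]
    angle = math.atan2(ry - ly, rx - lx)       # in-plane tilt of the eye line
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0  # rotation center
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    aligned = {}
    for name, (x, y) in keypoints.items():
        dx, dy = x - cx, y - cy
        aligned[name] = (cx + dx * cos_a - dy * sin_a,
                         cy + dx * sin_a + dy * cos_a)
    return aligned
```

  The same rotation would then be applied to the image pixels; that part is omitted here because it depends on the image library in use.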
  • extracting sample images from each data set after the operation to obtain multiple training sets includes: randomly extracting sample images from each data set after the operation; and obtaining the storage paths and labels of the sample images to obtain multiple training sets.
  • a face image containing both face identity information and verification information may be randomly selected from face images that have been annotated and face aligned to obtain sample images.
  • Each training sample extracted is as follows: Face picture img_1, identity information (category number) of img_1, ..., face picture img_N, identity information (category number) of img_N.
  • the face picture img_1 refers to the storage path of the first face picture
  • the category number refers to the pre-labeled label for the person
  • the category number generally starts from 0.
  • Different labels are the numerical codes of different people in the same data set. For example, if there are 100 people in the first data set, the category numbers are 1-0, 1-1, 1-2,..., 1-99; if the second data set or scene covers 50 people, the category numbers are 2-0, 2-1, 2-2,..., 2-49.
  • the two groups of category numbers do not overlap because they come from different data sets.
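  • The construction of one training set per data set, with category numbers kept separate between data sets, might look like the sketch below. The function name, the record layout, and the fixed random seed are illustrative assumptions; only the "storage path plus category number" content and the `1-0`, `2-0` numbering pattern come from the text.

```python
import random


def build_training_set(dataset_id, labeled_images, sample_size, seed=0):
    """Randomly draw training records from one labeled data set.

    labeled_images: list of (storage_path, person_index) tuples for a
    single data set, where person_index starts from 0.
    Returns a list of (storage_path, category_number) records, with
    category_number formatted as "<dataset_id>-<person_index>" so that
    labels from different data sets never collide.
    """
    rng = random.Random(seed)  # fixed seed only to keep the sketch repeatable
    count = min(sample_size, len(labeled_images))
    chosen = rng.sample(labeled_images, count)
    return [(path, f"{dataset_id}-{person}") for path, person in chosen]
```

  Because the data set identifier is part of every category number, the same person appearing in two data sets simply receives two unrelated labels, one per branch.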
  • acquiring multiple data sets includes: acquiring a video image and a preset data set collected by an acquisition device; and detecting the video image and the preset data set to obtain multiple data sets.
  • in the field of face recognition, the collection device can be a camera installed in different application scenarios.
  • the camera collects video pictures, which are stored in a computer system via network transmission and data lines.
  • the application scenario can be the usage scenario of an engineering project, such as bank remote teller machine (Video Teller Machine, VTM) verification, jewelry store VIP identification, etc.
  • the aforementioned preset data set may be a public face data set downloaded from the Internet.
  • the face data sets obtained by the above methods may cover the same people.
  • the photos of customers captured by cameras in banks and jewelry stores may also appear on the Internet and be sorted into public face data sets.
  • the public face data sets A and B on the Internet may also contain face pictures of the same person.
  • face detection is performed on the collected video pictures, and the face pictures are extracted and stored in the hard disk of the computer system.
  • the method further includes: establishing an initial model based on a branch training algorithm, where the initial model at least includes multiple loss functions in one-to-one correspondence with the multiple training sets; inputting the multiple training sets into the initial model in parallel to train the initial model; determining whether the trained model meets a preset condition; and, if the trained model meets the preset condition, determining that the trained model is the image recognition model.
  • Softmax loss function SoftmaxLoss 1.
  • Different data sets can be divided for branch training and input into the same image recognition model in parallel.
  • after the aligned face images in the i-th data set are forward-propagated to obtain features, the features are connected to the corresponding loss function SoftmaxLoss i, which is optimized as an independent objective function.
  • the image recognition models shown in FIG. 4 and FIG. 5 are drawn as schematic diagrams of a simplified general residual network.
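  • The branch structure described above can be illustrated with a toy model in which every data set passes through the same shared layer and then through its own classifier and its own SoftmaxLoss i. This is only a sketch under stated assumptions: the residual network is replaced by a single linear layer, and `linear`, `softmax_loss`, and `branch_losses` are hypothetical helper names.

```python
import math


def softmax_loss(logits, target):
    """Cross-entropy of softmax(logits) against the integer class `target`."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def branch_losses(shared_w, head_ws, batches):
    """Compute one loss value per data set.

    batches[i] is a list of (input_vector, class_index) samples from data
    set i. All branches share `shared_w`; branch i has its own classifier
    head_ws[i], so class indices are local to each data set.
    """
    losses = []
    for head_w, batch in zip(head_ws, batches):
        total = 0.0
        for x, target in batch:
            feat = linear(shared_w, x)     # shared feature extractor
            logits = linear(head_w, feat)  # branch-specific classifier
            total += softmax_loss(logits, target)
        losses.append(total / len(batch))
    return losses
```

  Each entry of the returned list plays the role of one independent objective; the shared layer is the only place where information from different data sets meets.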
  • the loss function is a square loss function.
  • multiple loss functions in the initial model may be square loss functions.
  • the aforementioned preset condition may be a training end judgment condition.
  • if the model obtained by training satisfies the preset condition, it is determined that training has ended, and the resulting model is the trained image recognition model.
  • inputting multiple training sets into the initial model in parallel to train the initial model includes: inputting the multiple training sets into the initial model in parallel to obtain function values of the multiple loss functions; obtaining the gradient value of each parameter in the initial model according to the function values of the multiple loss functions and the chain rule of differentiation; and updating each parameter from its gradient value according to the stochastic gradient descent algorithm to obtain the trained model.
  • the function value Loss of each loss function can be obtained through branch training, and the gradient value of each parameter in the image recognition model shown in Figure 5 can be obtained according to Loss and the chain rule of differentiation.
  • the model parameters are then updated from these gradient values according to the stochastic gradient descent algorithm to obtain a trained model. After the trained model meets the training end judgment condition, it can be determined as the final image recognition model.
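  • The update described above can be sketched on a deliberately tiny model: one shared scalar weight and one scalar head per data set, scored with a square loss. The model shape, the learning rate, and the name `sgd_step` are assumptions chosen so the chain rule can be written out by hand; the patent's network is a deep residual network.

```python
def sgd_step(w, heads, batches, lr=0.1):
    """One gradient-descent update of the shared weight and the branch heads.

    batches[i] holds (x, y) pairs from data set i. Branch i predicts
    heads[i] * w * x and is scored with a square loss, averaged per batch.
    Returns the new shared weight, the new heads, and the per-branch losses
    measured before the update.
    """
    grad_w = 0.0
    grad_heads = [0.0] * len(heads)
    losses = [0.0] * len(heads)
    for i, batch in enumerate(batches):
        for x, y in batch:
            feat = w * x                    # shared layer
            residual = heads[i] * feat - y  # branch output minus target
            losses[i] += residual ** 2 / len(batch)
            # Chain rule: each head sees only its own loss, while the
            # shared weight accumulates gradients from every branch.
            grad_heads[i] += 2 * residual * feat / len(batch)
            grad_w += 2 * residual * heads[i] * x / len(batch)
    new_w = w - lr * grad_w
    new_heads = [h - lr * g for h, g in zip(heads, grad_heads)]
    return new_w, new_heads, losses
```

  The point of the sketch is the accumulation into `grad_w`: all data sets shape the shared parameters in the same step, rather than one data set after another.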
  • judging whether the model obtained by training satisfies the preset condition includes: obtaining a verification set; verifying the trained model with the verification set to obtain the accuracy of the trained model; judging whether the accuracy of the trained model is the same as the historical accuracy, where the historical accuracy is the accuracy of the trained model in the previous verification; and, if the accuracy of the trained model is the same as the historical accuracy, determining that the trained model meets the preset condition.
  • the currently trained model can be tested on the validation set every fixed number of iterations. As training proceeds, the accuracy of the trained model on the validation set keeps improving; however, once the model tends to converge or overfitting occurs, the accuracy on the validation set no longer increases steadily, indicating that model training can be stopped.
  • accuracy is used to characterize the ratio of the sum of the verification results of all verification samples in the verification set to the total number of all verification samples.
  • the verification set is composed of a verification pair of randomly selected face pictures.
  • the number of face image verification pairs in the verification set is 6000 pairs.
  • the aforementioned historical accuracy may be the accuracy of the trained model obtained when the trained model was verified last time. If during this verification process, the accuracy of the trained model is the same as the historical accuracy, that is, the accuracy of the trained model is no longer steadily improving, the training can be determined to end, and the trained model will be used as the final image recognition model.
  • if the accuracy of the trained model differs from the historical accuracy, the accuracy of the trained model is recorded as the historical accuracy, and training of the initial model continues.
  • that is, if the accuracy of the trained model is different from the historical accuracy, the trained model does not meet the preset condition, so it is determined that training has not ended and needs to continue, and the current accuracy serves as the historical accuracy for the next model verification. At that verification it is judged again whether the accuracy of the trained model is the same as the historical accuracy, so as to determine whether the trained model meets the preset condition.
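  • The stopping rule described above could be sketched as the loop below. The callables `train_step` and `evaluate`, the evaluation interval, and the iteration cap are hypothetical; the comparison against the accuracy from the previous verification follows the text.

```python
def train_until_accuracy_stalls(train_step, evaluate, eval_every=100, max_iters=10000):
    """Train and verify periodically, stopping once accuracy stops changing.

    train_step: callable performing one training iteration.
    evaluate:   callable returning the accuracy on the verification set.
    Returns (iteration_at_stop, accuracy_at_stop).
    """
    historical_accuracy = None
    for iteration in range(1, max_iters + 1):
        train_step()
        if iteration % eval_every == 0:
            accuracy = evaluate()
            # Preset condition: same accuracy as in the previous verification.
            if historical_accuracy is not None and accuracy == historical_accuracy:
                return iteration, accuracy
            # Otherwise record it as the historical accuracy and keep training.
            historical_accuracy = accuracy
    return max_iters, historical_accuracy
```

  The exact equality test mirrors the wording of the text; an implementation working with floating-point accuracies might prefer a small tolerance, which would be a design choice beyond what the source states.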
  • obtaining the verification set includes: obtaining images other than the sample images in the multiple data sets; and randomly extracting image verification pairs from other images to obtain the verification set.
  • the image verification pair includes: a positive sample pair and a negative sample pair, the positive sample pair includes two images with the same label, and the negative sample pair includes two images with different labels.
  • the face pictures of the remaining N-K individuals can be used for the preparation of the verification set.
  • the verification set consists of randomly selected face picture verification pairs. Positive sample pairs and negative sample pairs are drawn in equal numbers: for a verification set containing 6000 face image verification pairs, 3000 positive pairs and 3000 negative pairs are taken. A positive sample pair is the a-th picture of the n-th person and the b-th picture of the n-th person; a negative sample pair is the c-th picture of the i-th person and the d-th picture of the j-th person.
  • when the image recognition model judges the two face pictures in a positive sample pair to be the same person, the verification result can be determined to be correct; when the image recognition model judges the two face pictures in a negative sample pair not to be the same person, the verification result can also be determined to be correct; otherwise, the verification result is wrong.
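  • Drawing the verification pairs and scoring them could be sketched as follows. The data layout (a dictionary from person label to image paths), the function names, and the fixed seed are assumptions; the equal split between positive and negative pairs and the definition of accuracy as correct results over total pairs follow the text.

```python
import random


def make_verification_pairs(images_by_person, pairs_per_kind, seed=0):
    """Draw equal numbers of positive and negative verification pairs.

    images_by_person: dict mapping a person label to a non-empty list of
    image paths, taken from people not used for training.
    Returns a list of (image_a, image_b, is_same) tuples.
    """
    rng = random.Random(seed)
    people = list(images_by_person)
    multi = [p for p in people if len(images_by_person[p]) >= 2]
    if not multi or len(people) < 2:
        raise ValueError("need two people and one person with two images")
    pairs = []
    for _ in range(pairs_per_kind):
        # Positive pair: two different pictures of the same person.
        person = rng.choice(multi)
        img_a, img_b = rng.sample(images_by_person[person], 2)
        pairs.append((img_a, img_b, True))
        # Negative pair: one picture each from two different people.
        p1, p2 = rng.sample(people, 2)
        pairs.append((rng.choice(images_by_person[p1]),
                      rng.choice(images_by_person[p2]), False))
    return pairs


def verification_accuracy(pairs, predict_same):
    """Share of pairs on which predict_same(a, b) matches the ground truth."""
    correct = sum(1 for a, b, is_same in pairs if predict_same(a, b) == is_same)
    return correct / len(pairs)
```

  With `pairs_per_kind=3000` this would yield the 6000-pair verification set mentioned above; `predict_same` would wrap the trained model together with a distance threshold.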
  • Fig. 6 is a flowchart of an optional image recognition method according to an embodiment of the present application, taking the field of face recognition as an example for description.
  • the method includes: collecting face pictures in multiple scenes; performing face detection on the collected pictures, extracting the face pictures, and storing them on the computer hard disk; manually classifying and labeling the detected and extracted face pictures, putting the face pictures that belong to the same person together and giving them the same label; performing key point alignment on the face images to remove the impact of face angle on face recognition; randomly selecting, from the labeled and aligned pictures, face image pairs that contain both face identity information and verification information for training, that is, extracting the face identity-verification training set; building a face recognition deep neural network model with the branch training algorithm, where the model contains multiple loss functions; training the face recognition deep neural network model on the multiple data sets to obtain a trained network model; and determining whether the test accuracy of the trained network model on the verification set is still improving, that is, whether the training end condition is reached; if it is not met, training continues.
  • the solution provided by the above embodiments can be used in a bank VIP recognition project: collect face pictures in the real application scenario and, at the same time, download some public face data sets from the Internet; perform detection and alignment on the face pictures in these data sets and build the corresponding face identity-verification training sets; then train the face recognition model with the method described above, so as to obtain a face recognition algorithm with a high recognition rate in the bank VIP recognition scene. This method can better combine the face data in multiple data sets and thus obtain a face recognition model with a better recognition effect.
  • the branch training facial deep neural network model that combines multiple data sets has a higher accuracy than the general deep learning network based on a single data set training (including successive fine-tuning on multiple data sets).
  • an embodiment of an image recognition device is provided.
  • Fig. 7 is a schematic diagram of an image recognition device according to an embodiment of the present application. As shown in Fig. 7, the device includes:
  • the first obtaining module 72 is used to obtain the image to be recognized.
  • the above-mentioned image to be recognized may be an image that needs to be recognized.
  • a face image is taken as an example for description.
  • the second acquisition module 74 is used to acquire a pre-built image recognition model.
  • the image recognition model is obtained by training the initial model through multiple training sets.
  • the initial model is a recognition model established based on the branch training algorithm.
  • each training set is extracted from a single data set, and different training sets are extracted from different data sets.
  • multiple training sets may be constructed through multiple different data sets in advance, and the initial model may be trained through the training sets, so as to obtain the final image recognition model.
  • the branch training method can be combined to build a deep neural network model to obtain the initial model. By separating different data sets for branch training, a trained image recognition model can be obtained, and the trained image recognition model can be deployed to application scenarios.
  • the recognition module 76 is configured to recognize the image to be recognized by using the image recognition model to obtain the recognition result.
  • the face recognition process can be performed by comparing the facial feature feat-ID (using Euclidean distance).
  • an initial model can be established based on a branch training algorithm, and the initial model can be trained through multiple training sets generated from different data sets to obtain an image recognition model.
  • the image recognition model is used to recognize the image to obtain the final recognition result.
  • the image recognition model that combines branch training with multiple data sets has a higher accuracy rate than an image recognition model trained on a single data set, which achieves the technical effect of improving recognition accuracy and thereby solves the technical problem of the low recognition accuracy of image recognition methods in the related art.
  • the device further includes: a third acquisition module for acquiring multiple data sets; a classification module for classifying each image in the multiple data sets to obtain a label of each image, wherein the label is used to characterize the classification result of each image, and the labels of at least two images contained in the multiple data sets are the same; and a first extraction module for extracting sample images from each classified data set to obtain multiple training sets.
  • the device further includes: a second extraction module, configured to extract preset features of each image in each classified data set; an alignment module, configured to perform an alignment operation on each image based on the preset features of that image; and a third extraction module, configured to extract sample images from each data set after the operation to obtain multiple training sets.
  • the preset feature includes at least one of the following: eyes, eyebrows, nose tip, and mouth corners.
  • the third extraction module includes: an extraction unit for randomly extracting sample images from each data set after the operation; and a first acquisition unit for acquiring the storage paths and labels of the sample images to obtain multiple training sets.
  • the third acquisition module includes: a second acquisition unit, configured to acquire a video image and a preset data set collected by the collection device; and a detection unit, configured to perform detection on the video image and the preset data set to obtain multiple data sets.
  • the device further includes: a building module for building an initial model based on a branch training algorithm, where the initial model at least includes multiple loss functions in one-to-one correspondence with the multiple training sets; a training module for inputting the multiple training sets into the initial model in parallel to train the initial model; a judgment module for judging whether the trained model meets the preset condition; and a determination module for determining, if the trained model meets the preset condition, that the trained model is the image recognition model.
  • the loss function is a square loss function.
  • the training module includes: an input unit for inputting multiple training sets into the initial model in parallel to obtain function values of the multiple loss functions; a unit for obtaining the gradient value of each parameter in the initial model according to the function values of the multiple loss functions and the chain rule of differentiation; and an update unit for updating each parameter from its gradient value according to the stochastic gradient descent algorithm to obtain the trained model.
  • the judgment module includes: a third acquisition unit, configured to acquire a verification set; and a verification unit, configured to verify the trained model with the verification set to obtain the accuracy of the trained model;
  • a judging unit, used to judge whether the accuracy of the trained model is the same as the historical accuracy, where the historical accuracy is the accuracy of the trained model in the previous verification;
  • and a determining unit, used to determine that the trained model meets the preset condition if the accuracy of the trained model is the same as the historical accuracy.
  • accuracy is used to characterize the ratio of the sum of the verification results of all verification samples in the verification set to the total number of all verification samples.
  • the training module is further configured to, if the accuracy of the trained model differs from the historical accuracy, take the accuracy of the trained model as the historical accuracy and continue to train the initial model.
  • the third acquiring unit is configured to acquire images other than the sample images in the multiple data sets, and randomly extract image verification pairs from the other images to obtain a verification set.
  • the image verification pair includes: a positive sample pair and a negative sample pair, the positive sample pair includes two images with the same label, and the negative sample pair includes two images with different labels.
  • an embodiment of a storage medium is provided; the storage medium includes a stored program, wherein, when the program runs, the device where the storage medium is located is controlled to execute the image recognition method in Embodiment 1.
  • an embodiment of a processor is provided; the processor is used to run a program, wherein the image recognition method in Embodiment 1 is executed when the program runs.
  • the disclosed technical content can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units may be a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual couplings or direct couplings or communication connections may be indirect couplings or communication connections through some interfaces, units or modules, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • part of the technical solution of this application, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), removable hard disk, magnetic disk, optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

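This application discloses an image recognition method, device, storage medium, and processor. The method includes: acquiring an image to be recognized; acquiring a pre-established image recognition model, where the image recognition model is obtained by training an initial model with multiple training sets, the initial model is a recognition model built based on a branch training algorithm, each training set is extracted from a single data set, and different training sets are extracted from different data sets; and recognizing the image to be recognized with the image recognition model to obtain a recognition result.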

Description

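Image recognition method, device, storage medium, and processor
This application claims priority to Chinese patent application No. 201910101257.9, filed with the Chinese Patent Office on January 31, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of image recognition, for example, to an image recognition method, device, storage medium, and processor.
Background
In the field of image recognition, and especially in mainstream face recognition, recognition is performed by an image recognition model. Such models are obtained by training deep learning models, and the quality of that training is critical to recognition accuracy. Within the training process, the data set used for training is the most important factor and has a decisive effect on the final performance of the deep learning model.
Deep learning models are generally trained on a single training data set. In face recognition, for example, the training data set may be face data collected in a particular scene or a public face database downloaded from the Internet. Different data sets may cover the same people, and because naming rules are not unified across data sets, it is difficult to merge the face pictures of the same person by file name. Face recognition classification training requires that all face pictures of the same person share the same label class number, so multiple face data sets whose identities may overlap cannot be used at the same time. A deep learning model trained on only a single training data set has low accuracy in image recognition and cannot meet the needs of different applications.
No effective solution has yet been proposed for the low recognition accuracy of image recognition methods in the related art.
Summary
Embodiments of this application provide an image recognition method, device, storage medium, and processor, to at least solve the problem of low recognition accuracy of image recognition methods in the related art.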
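According to one aspect of the embodiments of this application, an image recognition method is provided, including: acquiring an image to be recognized; acquiring a pre-established image recognition model, where the image recognition model is obtained by training an initial model with multiple training sets, the initial model is a recognition model built based on a branch training algorithm, each training set is extracted from a single data set, and different training sets are extracted from different data sets; and recognizing the image to be recognized with the image recognition model to obtain a recognition result.
In an embodiment, the method further includes: acquiring multiple data sets; classifying each image in the multiple data sets to obtain a label for each image, where the label represents the classification result of that image, and at least two images contained in the multiple data sets have the same label; and extracting sample images from each classified data set to obtain the multiple training sets.
In an embodiment, before extracting sample images from each classified data set to obtain the multiple training sets, the method further includes: extracting the preset features of each image in each classified data set; performing an alignment operation on each image based on its preset features; and extracting sample images from each data set after the operation to obtain the multiple training sets.
In an embodiment, where each image is a face image, the preset features include at least one of the following: eyes, eyebrows, nose tip, and mouth corners.
In an embodiment, extracting sample images from each data set after the operation to obtain the multiple training sets includes: randomly extracting sample images from each data set after the operation; and acquiring the storage paths and labels of the sample images to obtain the multiple training sets.
In an embodiment, acquiring multiple data sets includes: acquiring video images collected by a collection device and a preset data set; and performing detection on the video images and the preset data set to obtain the multiple data sets.
In an embodiment, the method further includes: building the initial model based on the branch training algorithm, where the initial model includes at least multiple loss functions, and the multiple loss functions correspond one-to-one to the multiple training sets; inputting the multiple training sets into the initial model in parallel to train the initial model; judging whether the trained model meets a preset condition; and if the trained model meets the preset condition, determining that the trained model is the image recognition model.
In an embodiment, inputting the multiple training sets into the initial model in parallel to train the initial model includes: inputting the multiple training sets into the initial model in parallel to obtain the function values of the multiple loss functions; obtaining the gradient value of each parameter in the initial model according to the function values of the multiple loss functions and the chain-rule differentiation algorithm; and updating the gradient value of each parameter according to the stochastic gradient descent algorithm to obtain the trained model.
In an embodiment, judging whether the trained model meets the preset condition includes: acquiring a verification set; verifying the trained model with the verification set to obtain the accuracy of the trained model; judging whether the accuracy of the trained model is the same as a historical accuracy, where the historical accuracy is the accuracy obtained for the trained model in the previous verification; and if the accuracy of the trained model is the same as the historical accuracy, determining that the trained model meets the preset condition.
In an embodiment, if the accuracy of the trained model differs from the historical accuracy, the accuracy of the trained model is taken as the historical accuracy, and training of the initial model continues.
In an embodiment, the accuracy represents the ratio of the sum of the verification results of all verification samples in the verification set to the total number of verification samples.
In an embodiment, acquiring a verification set includes: acquiring the images in the multiple data sets other than the sample images; and randomly extracting image verification pairs from those other images to obtain the verification set.
In an embodiment, the image verification pairs include positive sample pairs and negative sample pairs; a positive sample pair contains two images with the same label, and a negative sample pair contains two images with different labels.
In an embodiment, the loss function is a square loss function.
According to another aspect of the embodiments of this application, an image recognition device is also provided, including: a first acquisition module, configured to acquire an image to be recognized; a second acquisition module, configured to acquire a pre-established image recognition model, where the image recognition model is obtained by training an initial model with multiple training sets, the initial model is a recognition model built based on a branch training algorithm, each training set is extracted from a single data set, and different training sets are extracted from different data sets; and a recognition module, configured to recognize the image to be recognized with the image recognition model to obtain a recognition result.
According to another aspect of the embodiments of this application, a storage medium is also provided. The storage medium includes a stored program, where, when the program runs, the device on which the storage medium resides is controlled to execute the above image recognition method.
According to another aspect of the embodiments of this application, a processor is also provided. The processor is used to run a program, where the above image recognition method is executed when the program runs.
In the embodiments of this application, an initial model may be built based on the branch training algorithm and trained with multiple training sets generated from different data sets to obtain the image recognition model; the image to be recognized that is input by the user is then recognized with the image recognition model to obtain the final recognition result. Compared with the related art, an image recognition model branch-trained on multiple data sets is more accurate than one trained on a single data set. This achieves the technical effect of improving recognition accuracy and thereby solves the problem of low recognition accuracy of image recognition methods in the related art.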
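Brief Description of the Drawings
Figure 1 is a flowchart of an image recognition method according to an embodiment of this application;
Figure 2 is a schematic diagram of an optional face picture according to an embodiment of this application;
Figure 3 is a schematic diagram of an optional aligned face picture according to an embodiment of this application;
Figure 4 is a schematic diagram of an optional face recognition deep neural network model with a single input data set according to an embodiment of this application;
Figure 5 is a schematic diagram of an optional face recognition deep neural network model with multiple input data sets according to an embodiment of this application;
Figure 6 is a flowchart of an optional image recognition method according to an embodiment of this application; and
Figure 7 is a schematic diagram of an image recognition device according to an embodiment of this application.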
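Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application.
The terms "first", "second", and so on in the specification, claims, and drawings of this application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. Data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units explicitly listed, and may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
Embodiment 1
According to an embodiment of this application, an embodiment of an image recognition method is provided. The steps shown in the flowcharts of the drawings may be executed in a computer system, such as one running a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
Figure 1 is a flowchart of an image recognition method according to an embodiment of this application. As shown in Figure 1, the method includes the following steps: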
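Step S102: acquire an image to be recognized.
In an embodiment, the image to be recognized may be any image that needs to be recognized; the embodiments of this application use a face image as an example.
Step S104: acquire a pre-established image recognition model, where the image recognition model is obtained by training an initial model with multiple training sets, the initial model is a recognition model built based on a branch training algorithm, each training set is extracted from a single data set, and different training sets are extracted from different data sets.
In an embodiment, to improve image recognition accuracy, multiple training sets may be built in advance from multiple different data sets, and the initial model is trained with these training sets to obtain the final image recognition model.
In face recognition, different data sets may contain face pictures of the same person, and the user cannot determine which people the data sets have in common, so the data sets cannot simply be merged into a single data set. Instead, a deep neural network model may be built with a branch training method to obtain the initial model, and the different data sets are trained separately as branches. This yields a trained image recognition model, which is then deployed in the application scenario.
Step S106: recognize the image to be recognized with the image recognition model to obtain a recognition result.
In an embodiment, in face recognition, the recognition process may be carried out by comparing the face feature feat-ID (using Euclidean distance).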
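The comparison step can be illustrated with a minimal Python sketch. It assumes that features are plain lists of floats, that enrolled identities are kept in a dictionary, and that a distance threshold decides whether a match is accepted; the function names, the gallery structure, and the threshold are illustrative assumptions and are not specified by the application.

```python
import math

def euclidean_distance(feat_a, feat_b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))

def identify(query_feat, gallery, threshold):
    """Return the ID of the closest enrolled feature, or None if it is too far.

    gallery maps an identity ID to its enrolled feature vector (hypothetical layout).
    """
    best_id, best_dist = None, float("inf")
    for identity, feat in gallery.items():
        dist = euclidean_distance(query_feat, feat)
        if dist < best_dist:
            best_id, best_dist = identity, dist
    return best_id if best_dist <= threshold else None
```

A smaller distance means more similar features; the threshold would have to be tuned on verification data for any real deployment.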
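In the above embodiment of this application, an initial model may be built based on the branch training algorithm and trained with multiple training sets generated from different data sets to obtain the image recognition model; the image to be recognized that is input by the user is then recognized with the image recognition model to obtain the final recognition result. Compared with the related art, an image recognition model branch-trained on multiple data sets is more accurate than one trained on a single data set. This achieves the technical effect of improving recognition accuracy and thereby solves the technical problem of low recognition accuracy of image recognition methods in the related art.
Optionally, in the above embodiment of this application, the method further includes: acquiring multiple data sets; classifying each image in the multiple data sets to obtain a label for each image, where the label represents the classification result of that image, and at least two images contained in the multiple data sets have the same label; and extracting sample images from each classified data set to obtain the multiple training sets.
In an embodiment, in face recognition, face pictures from different application scenarios may be acquired in advance to obtain the multiple data sets from which the training sets are built. Public face data sets downloaded from the Internet are generally already labeled. For an unlabeled data set, face pictures may be detected and extracted manually, then classified and labeled: the face pictures belonging to the same person are placed together and marked, which gives the label of each picture. Assume the total number of people is N and each person has M face pictures. A certain number of face pictures may be randomly drawn from each labeled data set to obtain each training set.
Optionally, in the above embodiment of this application, before extracting sample images from each classified data set to obtain the multiple training sets, the method further includes: extracting the preset features of each image in each classified data set; performing an alignment operation on each image based on its preset features; and extracting sample images from each data set after the operation to obtain the multiple training sets.
Optionally, where each image is a face image, the preset features include at least one of the following: eyes, eyebrows, nose tip, and mouth corners.
In an embodiment, in face recognition, the face angle and face position differ from picture to picture. To extract stable features and obtain good recognition results, the face pictures need to be aligned so as to remove the influence of face angle on recognition. The key points include the positions of the eyes, nose tip, mouth corners, and so on, as shown in Figure 2. An aligned face is shown in Figure 3.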
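The application does not say how the alignment transform is computed. One common choice, shown here only as a sketch under that assumption, is to rotate the key points about the midpoint between the eyes so that the eye line becomes horizontal. The function names and the (x, y) tuple representation are hypothetical; a full implementation would apply the same rotation to the image pixels.

```python
import math

def eye_roll_angle(left_eye, right_eye):
    """Angle in radians of the line from left_eye to right_eye relative to the x-axis."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.atan2(dy, dx)

def align_keypoints(keypoints, left_eye, right_eye):
    """Rotate keypoints about the eye midpoint so that the eyes lie on a horizontal line."""
    angle = eye_roll_angle(left_eye, right_eye)
    cx = (left_eye[0] + right_eye[0]) / 2.0
    cy = (left_eye[1] + right_eye[1]) / 2.0
    # Rotating by -angle cancels the measured tilt of the eye line.
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    aligned = []
    for x, y in keypoints:
        tx, ty = x - cx, y - cy
        aligned.append((cx + tx * cos_a - ty * sin_a,
                        cy + tx * sin_a + ty * cos_a))
    return aligned
```

Because the transform is a pure rotation, distances between key points are preserved; scaling and cropping to a fixed face size would be separate steps.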
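Optionally, in the above embodiment of this application, extracting sample images from each data set after the operation to obtain the multiple training sets includes: randomly extracting sample images from each data set after the operation; and acquiring the storage paths and labels of the sample images to obtain the multiple training sets.
In an embodiment, face pictures that contain both face identity information and verification information may be randomly drawn from the labeled and aligned face pictures to obtain the sample images. Each extracted training sample has the following form: face picture img_1, identity information (class number) of img_1, ..., face picture img_N, identity information (class number) of img_N.
Here, face picture img_1 refers to the storage path of the first face picture, and the class number is the label assigned in advance to that person; class numbers generally start from 0. Different labels are the numeric codes for different people within the same data set. For example, if the first data set contains 100 people, the class numbers are 1-0, 1-1, 1-2, ..., 1-99; if the second data set or scene covers 50 people, the class numbers are 2-0, 2-1, 2-2, ..., 2-49. The two groups of class numbers are not equivalent to each other, because they come from different data sets.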
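The sample format described above can be sketched as follows. The function name, the seed parameter, and the tuple layout are assumptions made for illustration; only the idea of pairing a storage path with a per-data-set class number comes from the text.

```python
import random

def build_training_set(dataset_index, labeled_paths, sample_size, seed=0):
    """Randomly draw (storage_path, label) samples from one labeled data set.

    labeled_paths: list of (storage_path, class_number), class numbers starting at 0.
    Labels are returned as "<dataset_index>-<class_number>", so labels coming from
    different data sets are never treated as the same identity.
    """
    rng = random.Random(seed)
    chosen = rng.sample(labeled_paths, min(sample_size, len(labeled_paths)))
    return [(path, f"{dataset_index}-{cls}") for path, cls in chosen]
```

Calling the function once per data set, with a different `dataset_index` each time, gives one training set per data set, which is what the branch training described later expects.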
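Optionally, in the above embodiment of this application, acquiring multiple data sets includes: acquiring video images collected by a collection device and a preset data set; and performing detection on the video images and the preset data set to obtain the multiple data sets.
In an embodiment, in face recognition, the collection device may be cameras installed in different application scenarios. The cameras capture video pictures, which are stored in a computer system via network transmission and data cables. The application scenario may be the usage scenario of an engineering project, such as verification at a bank's remote teller machine (Video Teller Machine, VTM) or VIP recognition in a jewelry store. The preset data set may be a public face data set downloaded from the Internet.
The face data sets obtained in this way may cover the same people. For example, photos of customers captured by cameras in a bank or jewelry store may also appear on the Internet and be collected into public face data sets. Public face data sets A and B on the Internet may likewise contain face pictures of the same person.
For video pictures captured by a camera, face detection is performed on the captured pictures, and the face pictures are extracted and stored on the hard disk of the computer system.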
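Optionally, in the above embodiment of this application, the method further includes: building the initial model based on the branch training algorithm, where the initial model includes at least multiple loss functions, and the multiple loss functions correspond one-to-one to the multiple training sets; inputting the multiple training sets into the initial model in parallel to train the initial model; judging whether the trained model meets a preset condition; and if the trained model meets the preset condition, determining that the trained model is the image recognition model.
In an embodiment, a related-art image recognition model is trained with only one Softmax loss function as its objective. As shown in Figure 4, an image recognition model with a single input data set contains only one classification loss function, Loss = SoftmaxLoss_1.
The different data sets may instead be trained separately as branches and input in parallel into the same image recognition model. After the aligned face pictures from the i-th data set pass through forward propagation to produce features, the features are connected to the corresponding loss function SoftmaxLoss_i, which is optimized as an independent objective function. As shown in Figure 5, when face pictures from the i-th face data set are input into the initial model for branch training, the corresponding loss function is Loss = SoftmaxLoss_i.
In an embodiment, the image recognition models shown in Figures 4 and 5 are schematic diagrams of a simplified generic residual network.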
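The branch structure can be sketched in a few lines of Python. This is a toy stand-in, not the residual network of Figures 4 and 5: a single linear layer plays the role of the shared feature extractor, and each data set gets its own linear classification head with its own softmax loss. The class and function names are hypothetical.

```python
import math

def softmax_loss(logits, label):
    """Softmax cross-entropy loss for one sample; label is the class index."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]

def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

class BranchModel:
    """Shared feature layer plus one classification head per data set."""

    def __init__(self, shared_weights, head_weights):
        self.shared = shared_weights   # feature_dim x input_dim
        self.heads = head_weights      # heads[i]: num_classes_i x feature_dim

    def features(self, x):
        return linear(self.shared, x)

    def branch_loss(self, branch, x, label):
        """SoftmaxLoss_i: only the head of data set `branch` sees this sample."""
        logits = linear(self.heads[branch], self.features(x))
        return softmax_loss(logits, label)
```

Because each head has its own label space, class 0 of the first data set and class 0 of the second never compete in the same softmax, which is why overlapping identities across data sets do not need to be merged.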
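Optionally, the loss function is a square loss function.
In an embodiment, in face recognition, so that the recognition process can use Euclidean distance, the multiple loss functions in the initial model may be square loss functions.
In an embodiment, the preset condition may be a condition for judging that training has ended. When the trained model meets the preset condition, training is determined to be finished, and the model finally obtained is the trained image recognition model.
Optionally, in the above embodiment of this application, inputting the multiple training sets into the initial model in parallel to train the initial model includes: inputting the multiple training sets into the initial model in parallel to obtain the function values of the multiple loss functions; obtaining the gradient value of each parameter in the initial model according to the function values of the multiple loss functions and the chain-rule differentiation algorithm; and updating the gradient value of each parameter according to the stochastic gradient descent algorithm to obtain the trained model.
In an embodiment, after the multiple training sets are input into the initial model in parallel, the function value Loss of the loss function may be obtained through branch training. The gradient value of every parameter in the image recognition model shown in Figure 5 is obtained from Loss and the chain-rule differentiation algorithm, and the model parameters are updated according to the stochastic gradient descent algorithm to obtain the trained model. Once the trained model meets the end-of-training condition, it can be determined to be the final image recognition model.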
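One training step of this kind can be sketched on a toy model. The sketch assumes a linear shared layer, one linear softmax head per training set, one sample per branch per step, and that the gradients of all branches are accumulated on the shared layer within the same step; the application does not state these details, and the names below are hypothetical.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(z):
    top = max(z)
    e = [math.exp(t - top) for t in z]
    s = sum(e)
    return [t / s for t in e]

def train_step(shared, heads, batch, lr):
    """One SGD step. batch[i] is a single (x, label) sample from training set i.

    shared and heads are updated in place.
    Returns the list of branch losses computed before the update.
    """
    grad_shared = [[0.0] * len(row) for row in shared]
    grad_heads, losses = [], []
    for head, (x, label) in zip(heads, batch):
        feat = matvec(shared, x)                 # forward pass: shared features
        probs = softmax(matvec(head, feat))      # forward pass: branch head
        losses.append(-math.log(probs[label]))
        d_logits = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
        # Chain rule: gradient of the head, then gradient flowing into the shared layer.
        grad_heads.append([[d * f for f in feat] for d in d_logits])
        d_feat = [sum(head[k][j] * d_logits[k] for k in range(len(head)))
                  for j in range(len(feat))]
        for j, dj in enumerate(d_feat):
            for c, xc in enumerate(x):
                grad_shared[j][c] += dj * xc
    # SGD update: every branch contributes to the shared layer, each head only to itself.
    for head, g in zip(heads, grad_heads):
        for k, row in enumerate(head):
            for j in range(len(row)):
                row[j] -= lr * g[k][j]
    for j, row in enumerate(shared):
        for c in range(len(row)):
            row[c] -= lr * grad_shared[j][c]
    return losses
```

In a real network the same pattern holds with mini-batches and automatic differentiation: each head receives gradients only from its own loss, while the shared layers receive gradients from every branch.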
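Optionally, in the above embodiment of this application, judging whether the trained model meets the preset condition includes: acquiring a verification set; verifying the trained model with the verification set to obtain the accuracy of the trained model; judging whether the accuracy of the trained model is the same as a historical accuracy, where the historical accuracy is the accuracy obtained for the trained model in the previous verification; and if the accuracy of the trained model is the same as the historical accuracy, determining that the trained model meets the preset condition.
In an embodiment, during training of the image recognition model, the currently trained model may be tested on the verification set every fixed number of iterations. As training proceeds, the accuracy of the model on the verification set keeps improving; but as training continues and the model approaches convergence or begins to overfit, its accuracy on the verification set no longer improves steadily, which indicates that training can stop.
Optionally, the accuracy represents the ratio of the sum of the verification results of all verification samples in the verification set to the total number of verification samples.
In an embodiment, in face recognition, the verification set consists of randomly drawn face picture verification pairs. Following the rules of the international standard face verification test set LFW, the verification set contains 6000 face picture verification pairs. For a verification set containing 6000 face picture verification pairs, the test accuracy may be defined as: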
$$\text{Accuracy} = \frac{1}{6000}\sum_{i=1}^{6000} x_i$$
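Here, x_i represents the verification result of the i-th face picture verification pair. If the model's recognition result is the same as the actual label of the verification pair, the verification is determined to be correct, that is, x_i = 1; if the model's recognition result differs from the actual label of the verification pair, the verification is determined to be wrong, that is, x_i = 0.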
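The accuracy definition translates directly into code. In this sketch the model's decisions and the actual labels are assumed to be booleans meaning "same person"; the function name is hypothetical.

```python
def verification_accuracy(predictions, labels):
    """Fraction of verification pairs whose predicted decision matches the actual label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    # x_i is 1 when the decision for pair i is correct, otherwise 0.
    results = [1 if p == y else 0 for p, y in zip(predictions, labels)]
    return sum(results) / len(results)
```

For the 6000-pair verification set described above, `len(results)` is 6000 and the return value is the sum of the x_i divided by 6000.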
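In an embodiment, the historical accuracy may be the accuracy obtained the last time the trained model was verified. If, in the current verification, the accuracy of the trained model is the same as the historical accuracy, that is, the accuracy no longer improves steadily, training can be determined to be finished, and the currently trained model is taken as the final image recognition model.
Optionally, in the above embodiment of this application, if the accuracy of the trained model differs from the historical accuracy, the accuracy of the trained model is taken as the historical accuracy, and training of the initial model continues.
In an optional solution, if the accuracy of the trained model differs from the historical accuracy, that is, the trained model does not meet the preset condition, training is determined not to be finished and must continue, and the current accuracy is used as the historical accuracy in the next model verification. Whether the accuracy of the trained model is the same as the historical accuracy is then judged again, to determine whether the trained model meets the preset condition.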
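The stopping rule can be sketched as a loop that alternates training and verification. The two callbacks and the `max_rounds` guard are assumptions added for illustration; the application itself only describes the comparison between the current and the historical accuracy.

```python
def train_until_accuracy_stalls(train_some_iterations, evaluate, max_rounds=100):
    """Alternate training and verification until the accuracy equals the previous value.

    train_some_iterations(): runs a fixed number of training iterations.
    evaluate(): returns the current accuracy on the verification set.
    Returns (final_accuracy, rounds_run).
    """
    historical = None
    accuracy = None
    for round_index in range(1, max_rounds + 1):
        train_some_iterations()
        accuracy = evaluate()
        if historical is not None and accuracy == historical:
            return accuracy, round_index   # preset condition met: training ends
        historical = accuracy              # current accuracy becomes the historical accuracy
    return accuracy, max_rounds
```

An exact equality test follows the text literally; in practice a tolerance or a patience counter is often used instead, but that would be a departure from what is described here.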
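Optionally, in the above embodiment of this application, acquiring a verification set includes: acquiring the images in the multiple data sets other than the sample images; and randomly extracting image verification pairs from those other images to obtain the verification set.
Optionally, the image verification pairs include positive sample pairs and negative sample pairs; a positive sample pair contains two images with the same label, and a negative sample pair contains two images with different labels.
In an embodiment, in face recognition, suppose the face pictures of K people are used to build the training sets; the face pictures of the remaining N-K people may then be used to build the verification set. The verification set consists of randomly drawn face photo verification pairs, both positive and negative, in equal numbers; for a verification set containing 6000 face picture verification pairs, 3000 positive pairs and 3000 negative pairs are taken. A positive sample pair is the a-th picture of the n-th person and the b-th picture of the n-th person; a negative sample pair is the c-th picture of the i-th person and the d-th picture of the j-th person. When the image recognition model judges the two face pictures in a positive sample pair to be the same person, the verification result is determined to be correct; when it judges the two face pictures in a negative sample pair not to be the same person, the verification result is determined to be correct; otherwise, the verification result is wrong.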
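Drawing the verification pairs can be sketched as follows. The dictionary layout, the function name, and the seed are assumptions; the sketch samples with replacement, so the same pair may be drawn more than once, which a production version might want to prevent.

```python
import random

def sample_verification_pairs(images_by_person, pairs_per_class, seed=0):
    """Draw positive and negative image pairs from held-out identities.

    images_by_person maps a person label to a list of image paths.
    Returns a list of (image_a, image_b, is_same_person).
    """
    rng = random.Random(seed)
    people = sorted(images_by_person)
    multi = [p for p in people if len(images_by_person[p]) >= 2]
    if not multi or len(people) < 2:
        raise ValueError("need a person with two images and at least two people")
    pairs = []
    for _ in range(pairs_per_class):
        person = rng.choice(multi)
        a, b = rng.sample(images_by_person[person], 2)   # positive pair: same label
        pairs.append((a, b, True))
    for _ in range(pairs_per_class):
        p, q = rng.sample(people, 2)                     # negative pair: different labels
        pairs.append((rng.choice(images_by_person[p]),
                      rng.choice(images_by_person[q]), False))
    return pairs
```

With `pairs_per_class=3000` this yields the 6000-pair verification set described above, half positive and half negative.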
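Figure 6 is a flowchart of an optional image recognition method according to an embodiment of this application, described using face recognition as an example. As shown in Figure 6, the method includes: collecting face pictures in multiple scenes; performing face detection on the collected pictures and extracting the face pictures for storage on the computer's hard disk; manually classifying and labeling the detected and extracted face pictures, with the face pictures of the same person placed together and marked; performing key-point alignment on the face pictures to remove the influence of face angle on recognition; randomly drawing, from the labeled and aligned photos, face picture pairs that contain both face identity information and verification information for training, that is, extracting the face identity-verification training sets; building a face recognition deep neural network model with the branch training algorithm, the model containing multiple loss functions; training the face recognition deep neural network model on the multiple data sets to obtain a trained network model; judging whether the test accuracy of the trained network model on the verification set is still improving, that is, judging whether the end-of-training condition has been reached; if the condition is not met, continuing model training; if it is met, obtaining the face recognition network model and its parameters; and deploying the trained face recognition network model in the application scenario, where the recognition process may be carried out by comparing the face feature feat-ID (using Euclidean distance).
The solution provided by the above embodiment can be used in a bank VIP recognition project. Face pictures are collected in the real application scenario, and some public face data sets are also downloaded from the Internet; the face pictures in these data sets are detected and aligned, and the corresponding face identity-verification training sets are built; the face recognition model is then trained with the method described above, giving a face recognition algorithm with a high recognition rate and good recognition performance in the bank VIP scenario. The method makes better use of the face data in multiple data sets and therefore yields a face recognition model with better recognition performance. A face deep neural network model branch-trained on multiple data sets is more accurate than a conventional deep learning face recognition algorithm trained on a single data set (including successive fine-tuning on multiple data sets).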
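Embodiment 2
According to an embodiment of this application, an embodiment of an image recognition device is provided.
Figure 7 is a schematic diagram of an image recognition device according to an embodiment of this application. As shown in Figure 7, the device includes:
A first acquisition module 72, configured to acquire an image to be recognized.
In an embodiment, the image to be recognized may be any image that needs to be recognized; the embodiments of this application use a face image as an example.
A second acquisition module 74, configured to acquire a pre-established image recognition model, where the image recognition model is obtained by training an initial model with multiple training sets, the initial model is a recognition model built based on a branch training algorithm, each training set is extracted from a single data set, and different training sets are extracted from different data sets.
In an embodiment, to improve image recognition accuracy, multiple training sets may be built in advance from multiple different data sets, and the initial model is trained with these training sets to obtain the final image recognition model.
In face recognition, different data sets may contain face pictures of the same person, and the user cannot determine which people the data sets have in common, so the data sets cannot simply be merged into a single data set. Instead, a deep neural network model may be built with a branch training method to obtain the initial model, and the different data sets are trained separately as branches. This yields a trained image recognition model, which is then deployed in the application scenario.
A recognition module 76, configured to recognize the image to be recognized with the image recognition model to obtain a recognition result.
In an embodiment, in face recognition, the recognition process may be carried out by comparing the face feature feat-ID (using Euclidean distance).
In the above embodiment of this application, an initial model may be built based on the branch training algorithm and trained with multiple training sets generated from different data sets to obtain the image recognition model; the image to be recognized that is input by the user is then recognized with the image recognition model to obtain the final recognition result. Compared with the related art, an image recognition model branch-trained on multiple data sets is more accurate than one trained on a single data set. This achieves the technical effect of improving recognition accuracy and thereby solves the technical problem of low recognition accuracy of image recognition methods in the related art.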
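Optionally, in the above embodiment of this application, the device further includes: a third acquisition module, configured to acquire multiple data sets; a classification module, configured to classify each image in the multiple data sets to obtain a label for each image, where the label represents the classification result of that image, and at least two images contained in the multiple data sets have the same label; and a first extraction module, configured to extract sample images from each classified data set to obtain the multiple training sets.
Optionally, in the above embodiment of this application, the device further includes: a second extraction module, configured to extract the preset features of each image in each classified data set; an alignment module, configured to perform an alignment operation on each image based on its preset features; and a third extraction module, configured to extract sample images from each data set after the operation to obtain the multiple training sets.
Optionally, where each image is a face image, the preset features include at least one of the following: eyes, eyebrows, nose tip, and mouth corners.
Optionally, in the above embodiment of this application, the third extraction module includes: an extraction unit, configured to randomly extract sample images from each data set after the operation; and a first acquisition unit, configured to acquire the storage paths and labels of the sample images to obtain the multiple training sets.
Optionally, in the above embodiment of this application, the third acquisition module includes: a second acquisition unit, configured to acquire video images collected by a collection device and a preset data set; and a detection unit, configured to perform detection on the video images and the preset data set to obtain the multiple data sets.
Optionally, in the above embodiment of this application, the device further includes: a building module, configured to build the initial model based on the branch training algorithm, where the initial model includes at least multiple loss functions, and the multiple loss functions correspond one-to-one to the multiple training sets; a training module, configured to input the multiple training sets into the initial model in parallel to train the initial model; a judgment module, configured to judge whether the trained model meets a preset condition; and a determination module, configured to determine, if the trained model meets the preset condition, that the trained model is the image recognition model.
Optionally, the loss function is a square loss function.
Optionally, in the above embodiment of this application, the training module includes: an input unit, configured to input the multiple training sets into the initial model in parallel to obtain the function values of the multiple loss functions; a processing unit, configured to obtain the gradient value of each parameter in the initial model according to the function values of the multiple loss functions and the chain-rule differentiation algorithm; and an update unit, configured to update the gradient value of each parameter according to the stochastic gradient descent algorithm to obtain the trained model.
Optionally, in the above embodiment of this application, the judgment module includes: a third acquisition unit, configured to acquire a verification set; a verification unit, configured to verify the trained model with the verification set to obtain the accuracy of the trained model; a judging unit, configured to judge whether the accuracy of the trained model is the same as a historical accuracy, where the historical accuracy is the accuracy obtained for the trained model in the previous verification; and a determining unit, configured to determine, if the accuracy of the trained model is the same as the historical accuracy, that the trained model meets the preset condition.
Optionally, the accuracy represents the ratio of the sum of the verification results of all verification samples in the verification set to the total number of verification samples.
Optionally, in the above embodiment of this application, the training module is further configured to, if the accuracy of the trained model differs from the historical accuracy, take the accuracy of the trained model as the historical accuracy and continue to train the initial model.
Optionally, in the above embodiment of this application, the third acquisition unit is configured to acquire the images in the multiple data sets other than the sample images, and to randomly extract image verification pairs from those other images to obtain the verification set.
Optionally, the image verification pairs include positive sample pairs and negative sample pairs; a positive sample pair contains two images with the same label, and a negative sample pair contains two images with different labels.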
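Embodiment 3
According to an embodiment of this application, an embodiment of a storage medium is provided. The storage medium includes a stored program, where, when the program runs, the device on which the storage medium resides is controlled to execute the image recognition method of Embodiment 1.
Embodiment 4
According to an embodiment of this application, an embodiment of a processor is provided. The processor is used to run a program, where the image recognition method of Embodiment 1 is executed when the program runs.
The sequence numbers of the above embodiments of this application are for description only and do not indicate that any embodiment is better or worse than another.
In the above embodiments of this application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units may be a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Part of the technical solution of this application, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), removable hard disk, magnetic disk, or optical disk.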

Claims (17)

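  1. An image recognition method, comprising:
    acquiring an image to be recognized;
    acquiring a pre-established image recognition model, wherein the image recognition model is obtained by training an initial model with multiple training sets, the initial model is a recognition model built based on a branch training algorithm, the same training set is extracted from the same data set, and different training sets are extracted from different data sets;
    recognizing the image to be recognized with the image recognition model to obtain a recognition result.
  2. The method according to claim 1, further comprising:
    acquiring multiple data sets;
    classifying each image in the multiple data sets to obtain a label for each image, wherein the label represents the classification result of each image, and at least two images contained in the multiple data sets have the same label;
    extracting sample images from each classified data set to obtain the multiple training sets.
  3. The method according to claim 2, further comprising, before extracting the sample images from each classified data set to obtain the multiple training sets:
    extracting preset features of each image in each classified data set;
    performing an alignment operation on each image based on the preset features of each image;
    extracting the sample images from each data set after the operation to obtain the multiple training sets.
  4. The method according to claim 3, wherein, where each image is a face image, the preset features include at least one of the following: eyes, eyebrows, nose tip, and mouth corners.
  5. The method according to claim 3, wherein extracting the sample images from each data set after the operation to obtain the multiple training sets comprises:
    randomly extracting the sample images from each data set after the operation;
    acquiring the storage paths and labels of the sample images to obtain the multiple training sets.
  6. The method according to claim 2, wherein acquiring multiple data sets comprises:
    acquiring video images collected by a collection device and a preset data set;
    performing detection on the video images and the preset data set to obtain the multiple data sets.
  7. The method according to claim 2, further comprising:
    building the initial model based on the branch training algorithm, wherein the initial model includes at least multiple loss functions, and the multiple loss functions correspond one-to-one to the multiple training sets;
    inputting the multiple training sets into the initial model in parallel to train the initial model;
    judging whether the trained model meets a preset condition;
    if the trained model meets the preset condition, determining that the trained model is the image recognition model.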
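  8. The method according to claim 7, wherein inputting the multiple training sets into the initial model in parallel to train the initial model comprises:
    inputting the multiple training sets into the initial model in parallel to obtain the function values of the multiple loss functions;
    obtaining the gradient value of each parameter in the initial model according to the function values of the multiple loss functions and a chain-rule differentiation algorithm;
    updating the gradient value of each parameter according to a stochastic gradient descent algorithm to obtain the trained model.
  9. The method according to claim 7, wherein judging whether the trained model meets the preset condition comprises:
    acquiring a verification set;
    verifying the trained model with the verification set to obtain the accuracy of the trained model;
    judging whether the accuracy of the trained model is the same as a historical accuracy, wherein the historical accuracy is the accuracy obtained for the trained model in the previous verification;
    if the accuracy of the trained model is the same as the historical accuracy, determining that the trained model meets the preset condition.
  10. The method according to claim 9, wherein, if the accuracy of the trained model differs from the historical accuracy, the accuracy of the trained model is determined to be the historical accuracy, and training of the initial model continues.
  11. The method according to claim 10, wherein the accuracy represents the ratio of the sum of the verification results of all verification samples in the verification set to the total number of verification samples.
  12. The method according to claim 9, wherein acquiring a verification set comprises:
    acquiring the images in the multiple data sets other than the sample images;
    randomly extracting image verification pairs from the other images to obtain the verification set.
  13. The method according to claim 12, wherein the image verification pairs include positive sample pairs and negative sample pairs, a positive sample pair contains two images with the same label, and a negative sample pair contains two images with different labels.
  14. The method according to claim 7, wherein the loss function is a square loss function.
  15. An image recognition device, comprising:
    a first acquisition module, configured to acquire an image to be recognized;
    a second acquisition module, configured to acquire a pre-established image recognition model, wherein the image recognition model is obtained by training an initial model with multiple training sets, the initial model is a recognition model built based on a branch training algorithm, the same training set is extracted from the same data set, and different training sets are extracted from different data sets;
    a recognition module, configured to recognize the image to be recognized with the image recognition model to obtain a recognition result.
  16. A storage medium, the storage medium including a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the image recognition method according to any one of claims 1 to 14.
  17. A processor, the processor being used to run a program, wherein the image recognition method according to any one of claims 1 to 14 is executed when the program runs.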
PCT/CN2019/127817 2019-01-31 2019-12-24 Image recognition method, device, storage medium, and processor WO2020155939A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910101257.9A CN109766872B (zh) 2019-01-31 2019-01-31 Image recognition method and device
CN201910101257.9 2019-01-31

Publications (1)

Publication Number Publication Date
WO2020155939A1 true WO2020155939A1 (zh) 2020-08-06

Family

ID=66455816

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/127817 WO2020155939A1 (zh) 2019-01-31 2019-12-24 Image recognition method, device, storage medium, and processor

Country Status (2)

Country Link
CN (1) CN109766872B (zh)
WO (1) WO2020155939A1 (zh)

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7813538B2 (en) * 2007-04-17 2010-10-12 University Of Washington Shadowing pipe mosaicing algorithms with application to esophageal endoscopy
US8442330B2 (en) * 2009-03-31 2013-05-14 Nbcuniversal Media, Llc System and method for automatic landmark labeling with minimal supervision
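CN105404877A (zh) * 2015-12-08 2016-03-16 商汤集团有限公司 Face attribute prediction method and device based on deep learning and multi-task learning
WO2017177371A1 (en) * 2016-04-12 2017-10-19 Xiaogang Wang Method and system for object re-identification
CN107025443A (zh) * 2017-04-06 2017-08-08 江南大学 Stockyard smoke monitoring and online model updating method based on a deep convolutional neural network
CN107247947B (zh) * 2017-07-07 2021-02-09 智慧眼科技股份有限公司 Face attribute recognition method and device
CN107392164A (zh) * 2017-07-28 2017-11-24 深圳市唯特视科技有限公司 Expression analysis method based on facial action unit intensity estimation
CN107633242A (zh) * 2017-10-23 2018-01-26 广州视源电子科技股份有限公司 Network model training method, device, equipment, and storage medium
CN107844784A (zh) * 2017-12-08 2018-03-27 广东美的智能机器人有限公司 Face recognition method, device, computer equipment, and readable storage medium
CN108509860A (zh) * 2018-03-09 2018-09-07 西安电子科技大学 Hoh Xil Tibetan antelope detection method based on a convolutional neural network
CN108921092B (zh) * 2018-07-02 2021-12-17 浙江工业大学 Melanoma classification method based on secondary ensembling of convolutional neural network models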

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6456991B1 (en) * 1999-09-01 2002-09-24 Hrl Laboratories, Llc Classification method and apparatus based on boosting and pruning of multiple classifiers
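CN104715227A (zh) * 2013-12-13 2015-06-17 北京三星通信技术研究有限公司 Method and device for locating face key points
CN105975959A (zh) * 2016-06-14 2016-09-28 广州视源电子科技股份有限公司 Neural-network-based face feature extraction modeling and face recognition method and device
CN106503669A (zh) * 2016-11-02 2017-03-15 重庆中科云丛科技有限公司 Training and recognition method and system based on a multi-task deep learning network
CN106778684A (zh) * 2017-01-12 2017-05-31 易视腾科技股份有限公司 Deep neural network training method and face recognition method
CN109766872A (zh) * 2019-01-31 2019-05-17 广州视源电子科技股份有限公司 Image recognition method and device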

Also Published As

Publication number Publication date
CN109766872B (zh) 2021-07-09
CN109766872A (zh) 2019-05-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19914046

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19914046

Country of ref document: EP

Kind code of ref document: A1