CN111507403B - Image classification method, apparatus, computer device and storage medium - Google Patents
- Publication number
- CN111507403B (Application CN202010303814.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- classification
- features
- classified
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/24 — Pattern recognition; Analysing; Classification techniques
- G06F18/214 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Computing arrangements based on biological models; Neural networks; Architecture; Combinations of networks
Abstract
The application relates to the technical field of artificial intelligence and provides an image classification method, apparatus, computer device, and storage medium. An image to be classified is obtained, and at least two image features of the image are input correspondingly into at least two image classifiers, which correspond respectively to at least two classification levels; the image features input to the classifiers of adjacent classification levels are subject to a similarity constraint relationship that reduces the similarity between those features. A hierarchical classification result for the image is then obtained from the classification results that the image classifiers output at their corresponding classification levels. By imposing the similarity constraint between image features, the similarity between the features fed to classifiers of adjacent levels is reduced, so that classifiers at different levels attend to different features of the same image to be classified, improving the accuracy of hierarchical image classification.
Description
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image classification method, apparatus, computer device, and storage medium.
Background
With the development of artificial intelligence, technologies for classifying images based on deep learning, such as deep neural networks, have emerged; for example, an image classifier may be built on a deep neural network to classify input images. In the traditional image classification task the categories are treated as equal in status, i.e. no hierarchy is imposed among them, and an image classifier handles simpler tasks more easily, such as distinguishing car images from images of other categories such as cats and dogs. With hierarchical image classification technology, the classifier first judges whether an image belongs to the animal class or a non-animal class, and then further distinguishes, for example, the cat and dog classes within the animal class, thereby completing the hierarchical classification task.
In hierarchical image classification methods provided by traditional technology, the image is often input directly into two different image classifiers, for example one trained to classify the coarse categories and the other trained to classify the subcategories under each coarse category. However, hierarchical classification performed in this manner has low accuracy.
Disclosure of Invention
Based on this, it is necessary to provide an image classification method, apparatus, computer device and storage medium in view of the above technical problems.
A method of image classification, the method comprising:
acquiring an image to be classified;
Inputting at least two image features of the image to be classified into at least two image classifiers correspondingly; the at least two image classifiers correspond to the at least two classification levels respectively; the image features input to the image classifier corresponding to the adjacent classification level have a similarity constraint relationship and are used for reducing the similarity between the image features;
And acquiring a hierarchical classification result of the image to be classified according to the classification result of the image to be classified, which is output by the image classifier, on the corresponding classification level.
An image classification apparatus, the apparatus comprising:
the image acquisition module is used for acquiring images to be classified;
the feature input module is used for correspondingly inputting at least two image features of the image to be classified into at least two image classifiers; the at least two image classifiers correspond to the at least two classification levels respectively; the image features input to the image classifier corresponding to the adjacent classification level have a similarity constraint relationship and are used for reducing the similarity between the image features;
the result acquisition module is used for acquiring the hierarchical classification result of the image to be classified according to the classification result of the image to be classified on the corresponding classification hierarchy, which is output by the image classifier.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
Acquiring an image to be classified; inputting at least two image features of the image to be classified into at least two image classifiers correspondingly; the at least two image classifiers correspond to the at least two classification levels respectively; the image features input to the image classifier corresponding to the adjacent classification level have a similarity constraint relationship and are used for reducing the similarity between the image features; and acquiring a hierarchical classification result of the image to be classified according to the classification result of the image to be classified, which is output by the image classifier, on the corresponding classification level.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
Acquiring an image to be classified; inputting at least two image features of the image to be classified into at least two image classifiers correspondingly; the at least two image classifiers correspond to the at least two classification levels respectively; the image features input to the image classifier corresponding to the adjacent classification level have a similarity constraint relationship and are used for reducing the similarity between the image features; and acquiring a hierarchical classification result of the image to be classified according to the classification result of the image to be classified, which is output by the image classifier, on the corresponding classification level.
The image classification method, the device, the computer equipment and the storage medium acquire the image to be classified, correspondingly input at least two image features of the image to be classified into at least two image classifiers, wherein the at least two image classifiers respectively correspond to at least two classification levels, and the image features of the image classifiers corresponding to adjacent classification levels have similarity constraint relations and are used for reducing the similarity between the image features; and then obtaining a hierarchical classification result of the image to be classified according to the classification result of the image to be classified, which is output by the image classifier, on the corresponding classification hierarchy. According to the scheme, the similarity constraint relation is applied between the image features, so that the similarity between the image features of the image classifiers corresponding to adjacent classification levels can be reduced as much as possible, the image classifiers corresponding to different classification levels can pay attention to different image features on the same image to be classified, the images are classified on respective classification levels according to the corresponding image features, the accuracy of hierarchical classification of the images is improved, and the classification tasks of the image to be classified on a plurality of classification levels can be completed simultaneously.
Drawings
FIG. 1 is a diagram of an application environment for an image classification method in one embodiment;
FIG. 2 is a schematic diagram of an image classification task in one embodiment;
FIG. 3 is a flow diagram of an image classification method in one embodiment;
FIG. 4 is a flow diagram of the steps for constructing an image classifier in one embodiment;
FIG. 5 is a schematic diagram of image classification in one embodiment;
FIG. 6 is a flowchart illustrating steps for acquiring sample image features in one embodiment;
FIG. 7 is a schematic diagram of an interface showing image information in one embodiment;
FIG. 8 is a schematic diagram of image classification in one example of application;
FIG. 9 is a block diagram of an image classification apparatus in one embodiment;
Fig. 10 is an internal structural view of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The image classification method provided by the application can be applied to an application environment shown in fig. 1, and fig. 1 is an application environment diagram of the image classification method in one embodiment. Wherein the terminal 110 may communicate with the server 120 through a network. The terminal 110 may be, but not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices, and the server 120 may be implemented as a stand-alone server or a server cluster composed of a plurality of servers.
The application provides an image classification method and relates to the technical field of artificial intelligence. Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
Artificial intelligence technology includes Computer Vision (CV) technology, which uses cameras, computers, and other terminal devices in place of human eyes to perform machine vision tasks such as recognition, tracking, and measurement of a target, and further performs graphics processing so that the resulting image is more suitable for human observation or for transmission to an instrument for detection. Computer vision techniques include image recognition and image classification, for example recognizing that an image is an image of an automobile, a cat, or a dog.
By combining machine learning techniques with computer vision technology, a terminal device can intelligently classify images to be classified according to the image classification knowledge it has learned. Furthermore, the terminal device can be enabled to classify the images to be classified hierarchically.
The hierarchical classification task is described with reference to fig. 2, a schematic diagram of an image classification task in one embodiment, which shows the distinction between an ordinary classification task and a hierarchical classification task. In an ordinary classification task, all image categories have equal status: the image classifier does not relate the categories to one another, and directly identifies an image as a cat image, a dog image, a bicycle image, an automobile image, and so on. In reality, however, the relationships between categories differ. Among the four categories cat, dog, automobile, and bicycle, cats and dogs are closely related, since both belong to the animal class, while the animal class is comparatively distant from the vehicle class to which automobiles and bicycles belong. A hierarchical classification task can therefore first judge whether an image belongs to the animal class or to a non-animal class such as the vehicle class, and then further identify a cat, dog, bicycle, or automobile within the animal and vehicle classes. The animal class and the vehicle class belong to one classification level, which may be called the coarse classification level; cats and dogs within the animal class, and bicycles and automobiles within the vehicle class, belong to another classification level, the subclass classification level. The image classification method provided by the application obtains the classification result of the image to be classified at each classification level to form its hierarchical classification result — for example, that the image belongs to the animal class and is a cat image.
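The coarse/fine taxonomy described above can be sketched as a small lookup table. The names and the helper function are illustrative only, not part of the patent:

```python
# Hypothetical two-level label hierarchy matching the cat/dog/bicycle/car
# example above (illustrative names, not taken from the patent claims).
HIERARCHY = {
    "animal": ["cat", "dog"],
    "vehicle": ["bicycle", "car"],
}

def coarse_label(fine: str) -> str:
    """Return the coarse class a fine-grained label belongs to."""
    for coarse, fines in HIERARCHY.items():
        if fine in fines:
            return coarse
    raise KeyError(fine)
```

A hierarchical result for a cat image would then pair `"animal"` at the coarse level with `"cat"` at the subclass level.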
The image classification method provided by the application can be applied to various content auditing and content understanding tasks including images, for example, for video type business, a video frame extraction strategy can be adopted to realize the content auditing and understanding of video business.
Specifically, the image classification method provided by the present application may be executed by the terminal 110 or the server 120 alone, or may be executed by the terminal 110 and the server 120 in cooperation.
Firstly, taking the example that the terminal 110 is independently executed as an example, the terminal 110 may obtain an image to be classified, and input at least two image features of the image to be classified into at least two image classifiers correspondingly; the at least two image classifiers may be preconfigured on the terminal 110, and correspond to at least two classification levels respectively, and have a similarity constraint relationship between image features input to the image classifier corresponding to an adjacent classification level, so as to reduce similarity between the image features; finally, the terminal 110 may obtain the hierarchical classification result of the image to be classified according to the classification result of the image to be classified on the corresponding classification hierarchy output by the image classifier.
The image classification method provided by the application can also be cooperatively executed by the terminal 110 and the server 120, specifically, the terminal 110 can acquire an image to be classified, the image to be classified is sent to the server 120, and the server 120 correspondingly inputs at least two image features of the image to be classified into at least two image classifiers; the at least two image classifiers may be preconfigured on the server 120, and correspond to at least two classification levels respectively, and have a similarity constraint relationship between image features input to the image classifier corresponding to an adjacent classification level, so as to reduce similarity between the image features; the server 120 may then send the classification result of the image to be classified output by each image classifier on the corresponding classification level to the terminal 110, and the terminal 110 may obtain the hierarchical classification result of the image to be classified according to the classification result.
In one embodiment, as shown in fig. 3, fig. 3 is a flow chart of an image classification method in one embodiment, and an image classification method is provided, and the method is applied to the terminal 110 in fig. 1 for illustration, and includes the following steps:
step S301, obtaining an image to be classified;
In this step, the terminal 110 may acquire an image to be classified. The image to be classified may be an image captured by the terminal 110 through an image capturing device such as a camera, or may be an image pre-stored in an electronic gallery of the terminal 110, where the image to be classified may include an object such as an animal or a plant. Specifically, the terminal 110 may capture, in real time, an image of the cat through its configured camera, where the image of the cat may be used as the image to be classified.
Step S302, at least two image features of the image to be classified are correspondingly input into at least two image classifiers;
In this step, the terminal 110 may input at least two image features of the image to be classified correspondingly into at least two image classifiers. The at least two image classifiers may be preconfigured on the terminal 110 and correspond respectively to at least two classification levels; that is, different image classifiers correspond to different classification levels. As described in connection with fig. 2, in the hierarchical classification task the terminal 110 may be configured with, for example, two image classifiers: one classifying at the level of "animal or vehicle", and the other at the level of "cat or dog within animal, bicycle or car within vehicle".
Further, the image features input to image classifiers corresponding to adjacent classification levels have a similarity constraint relationship, which is used to reduce the similarity between those features. Take a three-level classification task as an example: the first classification level distinguishes, say, plant or animal; the second level, within animal, distinguishes mammal or reptile; the third level, within reptile, distinguishes lizard or snake. In this case two similarity constraint relationships are added: the first is imposed between the image features input to the classifiers of the first and second classification levels, and the second between the features input to the classifiers of the second and third levels. Since image features are generally represented as vectors, the similarity constraint may be imposed on the vectors: reducing the similarity between the vectors reduces the similarity between the image features input to the classifiers of adjacent classification levels.
Such a similarity constraint relationship may be, for example, a mutual information constraint or an orthogonality constraint. Under the mutual information constraint, the mutual information between the image features input to classifiers of adjacent classification levels is computed and minimized, which reduces the similarity between the two features. Similarly, under the orthogonality constraint, the image features input to classifiers of adjacent classification levels are made orthogonal to each other, reducing their similarity.
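As a minimal sketch of the orthogonality constraint — the patent leaves the exact loss form open, so the squared cosine similarity used here is an assumption — the quantity driven toward zero during training could be:

```python
import numpy as np

def orthogonality_penalty(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Squared cosine similarity between two feature vectors.

    Driving this toward zero pushes the two features toward
    orthogonality, i.e. reduces their similarity. This is only an
    illustrative choice of penalty; the patent does not fix one.
    """
    cos = np.dot(feat_a, feat_b) / (
        np.linalg.norm(feat_a) * np.linalg.norm(feat_b)
    )
    return float(cos ** 2)
```

Identical vectors give a penalty of 1, orthogonal vectors a penalty of 0, so adding this term to the training loss discourages the two classifiers' input features from overlapping.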
In this way, the terminal 110 decouples the two image features input to the classifiers of adjacent classification levels, removing the parts the two feature types share, so that the two features correspond to different characteristics of the image. Given decoupled input features, each image classifier can attend to a different kind of characteristic of the image and, using the image features best suited to its own classification level, classify the image to be classified at that level better and more accurately, thereby completing the classification tasks of multiple classification levels.
Step S303, obtaining a hierarchical classification result of the image to be classified according to the classification result of the image to be classified on the corresponding classification hierarchy output by the image classifier.
In this step, the terminal 110 may obtain the classification result output by each image classifier, i.e. the classification result of the image to be classified at each classification level. For example, with three image classifiers, the terminal 110 obtains the classification results output by the three classifiers at their three respective levels. The terminal 110 may take the three results together as the hierarchical classification result of the image, or select the result of one of the classification levels as the hierarchical classification result it requires. In this way the terminal 110 completes the classification tasks of the image to be classified at multiple classification levels at the same time.
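A hedged sketch of this step — the output format is illustrative, since the patent does not specify one — simply takes the highest-scoring class at each classification level:

```python
def hierarchical_result(per_level_scores: dict) -> dict:
    """Combine per-level classifier outputs into a hierarchical result.

    `per_level_scores` maps a level name to that classifier's
    class-score dictionary, e.g. softmax outputs. The names are
    hypothetical; only the argmax-per-level idea is from the text.
    """
    return {
        level: max(scores, key=scores.get)
        for level, scores in per_level_scores.items()
    }

outputs = {
    "coarse": {"animal": 0.9, "vehicle": 0.1},
    "fine": {"cat": 0.7, "dog": 0.2, "bicycle": 0.06, "car": 0.04},
}
result = hierarchical_result(outputs)
```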
In the above image classification method, the terminal 110 acquires an image to be classified and inputs at least two image features of the image to be classified into at least two image classifiers corresponding to at least two classification levels respectively, and the image features of the image classifiers corresponding to adjacent classification levels have a similarity constraint relationship for reducing the similarity between the image features; the terminal 110 may then obtain a hierarchical classification result of the image to be classified according to the classification result of the image to be classified on the corresponding classification hierarchy output by the image classifier. According to the scheme, the similarity constraint relation is applied between the image features, so that the similarity between the image features of the image classifiers corresponding to adjacent classification levels can be reduced as much as possible, the image classifiers corresponding to different classification levels can pay attention to different image features on the same image to be classified, the images are classified on respective classification levels according to the corresponding image features, the accuracy of hierarchical classification of the images is improved, and the classification tasks of the image to be classified on a plurality of classification levels can be completed simultaneously.
In one embodiment, the inputting, in step S302, at least two image features of the image to be classified into at least two image classifiers, may include:
At least two image features are acquired through a pre-constructed feature extractor and are correspondingly input into at least two image classifiers.
In this embodiment, the terminal 110 may obtain at least two image features from the image to be classified by using a pre-constructed feature extractor, and input the at least two image features to the aforementioned at least two image classifiers. Wherein the feature extractor is constructed with the at least two image classifiers based on the similarity constraint relationship. The feature extractor may be implemented based on a neural network model. The terminal 110 may input the image to be classified to a feature extractor based on a neural network model, divide the image features output from the last convolution layer of the feature extractor into the aforementioned at least two image features, and input the same to at least two image classifiers.
With the scheme of this embodiment, the terminal 110 performs feature extraction on the image to be classified using a feature extractor trained in advance, jointly with the at least two image classifiers, under the similarity constraint relationship. It thus obtains multiple image features usable by the classifiers without having to recompute the similarity between image features each time an image is classified, improving image classification efficiency.
In one embodiment, further, the feature extractor may further include a feature extraction network and an encoder; the step of acquiring at least two image features through the pre-constructed feature extractor may specifically include:
Inputting the image to be classified into a feature extraction network to obtain initial image features output by the feature extraction network; inputting the initial image characteristics to an encoder to obtain encoded initial image characteristics output by the encoder; based on the encoded initial image features, at least two image features are acquired.
In this embodiment, the feature extractor may further include a feature extraction network and an encoder. The feature extraction network and the encoder may be implemented based on a neural network model, such as ResNet residual network model, among others.
The feature extraction network is used for the preliminary acquisition of the image features of the image to be classified; the features it produces are often high-dimensional and contain redundant information. The terminal 110 therefore inputs the image to be classified into the feature extraction network to obtain the initial image feature, and then inputs that initial image feature into the encoder. The encoder maps the image feature to an encoded feature, reducing the feature dimension and removing redundant information from the initial image feature. The terminal 110 then obtains at least two image features from the encoded initial image feature output by the encoder. The feature extraction network, the encoder, and the classifiers are all trained under the similarity constraint relationship. With the scheme of this embodiment, the terminal 110 can, when classifying an image, directly use the trained feature extraction network to acquire the initial image feature, reduce its dimension with the encoder, and split the encoded initial image feature into at least two image features — features that, when input to classifiers of adjacent classification levels, satisfy the constraint relationship — improving the efficiency and accuracy of image classification.
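The encoder's dimensionality-reduction role can be illustrated with a single linear map. The real encoder is a trained neural network (the patent mentions ResNet-style models for the extractor); the sizes and the random weights below are arbitrary stand-ins:

```python
import numpy as np

def encode(initial_feature: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One linear layer standing in for the encoder: it maps the
    high-dimensional initial image feature to a lower-dimensional
    code, illustrating only the dimension-reduction role described
    in the text, not the trained network itself."""
    return weight @ initial_feature

rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 32))   # hypothetical 32-d feature -> 8-d code
initial = rng.standard_normal(32)
code = encode(initial, weight)
```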
In one embodiment, as shown in fig. 4, fig. 4 is a flow chart illustrating the steps of constructing an image classifier in one embodiment, before at least two image features are acquired by a pre-constructed feature extractor, the feature extractor and the image classifier may be constructed by:
step S401, acquiring a sample image and acquiring classification labels of the sample image on at least two classification levels as real classification labels of at least two image classifiers;
In this step, the terminal 110 acquires sample images, typically many of them. The terminal 110 also needs to acquire the classification labels corresponding to each sample image, i.e. its classification label at each classification level. As described in connection with fig. 2, the terminal 110 may obtain a cat image as a sample image, together with its classification labels at the two classification levels, i.e. "animal" and "cat". The terminal 110 then uses the sample image's labels at the at least two classification levels as the true classification labels of the at least two image classifiers, for training the image classifiers and the feature extractor.
Step S402, inputting the sample image to a feature extractor, and acquiring at least two sample image features with the same dimension according to the image features of the sample image output by the feature extractor.
Referring to fig. 5, fig. 5 is a schematic diagram of image classification in one embodiment. The terminal 110 inputs a sample image to the feature extractor, the feature extractor outputs the image features of the sample image, and the terminal 110 further divides the sample image features into at least two sample image features having the same dimensions. Specifically, the terminal 110 may divide the sample image features into an image feature A and an image feature B having the same dimensions; that is, assuming that the dimension of the sample image feature is 2d, the terminal 110 splits it into an image feature A and an image feature B of dimension d each. For example, assuming that the image feature of the sample image output by the feature extractor is a 2048-dimensional vector, the splitting process may take the first half of the 2048 dimensions, i.e. dimensions 0 to 1023, as image feature A, and the second half, i.e. dimensions 1024 to 2047, as image feature B.
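The splitting step above can be sketched in a few lines of Python (a minimal illustration; the helper name `split_feature` is not from the patent):

```python
def split_feature(feature):
    """Split a feature vector into two equal-dimension halves."""
    half = len(feature) // 2
    return feature[:half], feature[half:]

sample_feature = list(range(2048))       # stand-in for a 2048-dim feature vector
feature_a, feature_b = split_feature(sample_feature)
print(len(feature_a), len(feature_b))    # 1024 1024
print(feature_a[0], feature_a[-1])       # 0 1023   (dimensions 0-1023)
print(feature_b[0], feature_b[-1])       # 1024 2047 (dimensions 1024-2047)
```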
In one embodiment, the inputting the sample image into the feature extractor in step S402 may specifically include: preprocessing a sample image to obtain a sample image with an image size being a preset image size; the sample image of the preset image size is input to a feature extractor.
In this embodiment, the terminal 110 may pre-process the sample image before inputting it into the feature extractor, adjusting the image size of the sample image to a preset image size by scaling, so that the sample image of the preset image size is input to the feature extractor. Specifically, since the image classifier, the feature extractor and the like usually require images of a fixed size for model training, this embodiment can scale a sample image of arbitrary size to, for example, 256×256, and then randomly crop out an image of size 224×224 as the sample image of the preset image size for training.
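A minimal sketch of the random-crop part of this preprocessing, assuming the scaled image is represented as a nested list of pixel rows (a real implementation would use an image library for both scaling and cropping):

```python
import random

def random_crop(image, crop_size=224):
    """Randomly crop a crop_size x crop_size region from a 2D pixel grid."""
    h, w = len(image), len(image[0])
    top = random.randint(0, h - crop_size)
    left = random.randint(0, w - crop_size)
    return [row[left:left + crop_size] for row in image[top:top + crop_size]]

scaled = [[0] * 256 for _ in range(256)]  # stand-in for a 256x256 scaled image
cropped = random_crop(scaled)
print(len(cropped), len(cropped[0]))      # 224 224
```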
Step S403, inputting at least two sample image features into at least two image classifiers respectively, and obtaining prediction classification labels of sample images output by the at least two image classifiers on corresponding classification levels;
In this step, referring to fig. 5, the terminal 110 may input the image feature A to the image classifier A and the image feature B to the image classifier B. The image classifier A and the image classifier B are used for classifying the image to be classified on different classification levels. In the process of model construction, the image classifier A can predict a classification result according to the input image feature A to obtain a prediction classification label A, and similarly, the image classifier B can obtain a prediction classification label B according to the input image feature B. The prediction classification labels may correspond to probability values of belonging to a certain category at the respective classification level. As described with reference to fig. 2, the prediction classification label may be the probability value that the sample image belongs to an animal or a vehicle at the "animal, vehicle" classification level.
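Probability values like these are commonly produced by a softmax over the classifier's scores; a minimal, illustrative sketch (the logit values below are made up, not outputs of the patent's model):

```python
import math

def softmax(logits):
    """Convert raw classifier scores into a probability distribution."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 0.5]              # hypothetical scores for "animal" vs "vehicle"
probs = softmax(logits)
print(round(sum(probs), 6))      # 1.0 -- a valid probability distribution
print(probs[0] > probs[1])       # True -- "animal" is the more probable class
```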
Step S404, constructing a similarity constraint relation between sample image features input to an image classifier corresponding to an adjacent classification hierarchy;
in this step, the terminal 110 constructs a similarity constraint relationship between the sample image features input to the image classifier corresponding to the adjacent classification hierarchy.
Step S405, training the feature extractor and the at least two image classifiers based on the true classification label, the prediction classification label, and the similarity constraint relationship, and constructing the feature extractor and the at least two image classifiers.
According to the technical scheme of the embodiment, the terminal 110 can perform joint training on the feature extractor and at least two image classifiers based on the real classification label, the prediction classification label and the similarity constraint relation of the sample image, and construct the feature extractor and the at least two image classifiers, so that the trained feature extractor can acquire at least two image features with the similarity constraint relation from the image to be classified, and the at least two image features can be input into the at least two image classifiers to be classified, so that the rapid and accurate classification of the image to be classified is realized.
In one embodiment, as shown in fig. 6, fig. 6 is a flow chart illustrating steps of acquiring features of a sample image in one embodiment, and the feature extractor may include a feature extraction network and an encoder; the step of inputting the sample image to the feature extractor in step S402, and obtaining at least two sample image features with the same dimensions according to the image features of the sample image output by the feature extractor may include:
step S601, inputting a sample image into a feature extraction network to obtain initial sample image features output by the feature extraction network;
step S602, inputting the initial sample image characteristics to an encoder to obtain encoded initial sample image characteristics output by the encoder;
Step S603, splitting the initial sample image feature into at least two sample image features with the same dimension.
In this embodiment, the terminal 110 may obtain the at least two sample image features based on the feature extraction network and the encoder included in the feature extractor. Referring to fig. 5, fig. 5 is a schematic diagram of image classification in an embodiment. The terminal 110 may input a sample image to the feature extraction network, which is used to preliminarily obtain image features from the sample image, and the terminal 110 obtains the initial sample image features output by the feature extraction network. As in the above embodiment, the image features obtained by the feature extraction network often contain redundant information and have a relatively high feature dimension, so the terminal 110 further inputs the initial sample image features to the encoder. The encoder maps the image features to encoding features, thereby reducing the feature dimension of the initial sample image features obtained by the feature extraction network and removing their redundant information. Finally, the terminal 110 splits the encoded initial sample image features output by the encoder into at least two sample image features with the same dimension. By adopting the scheme of this embodiment, the feature extraction network, the encoder and the at least two image classifiers can be jointly trained based on the similarity constraint relationship, so that the trained feature extraction network, encoder and image classifiers, as a whole image classification tool, can classify the image to be classified quickly and accurately.
In one embodiment, the training the feature extractor and the at least two image classifiers based on the true classification label, the predicted classification label, and the similarity constraint relationship in the step S405 may include:
Constructing first loss functions corresponding to the two classification levels according to the real classification labels and the prediction classification labels to obtain at least two first loss functions; constructing a second loss function according to the similarity constraint relationship; and training the feature extractor and the at least two image classifiers based on the at least two first loss functions and the second loss function, such that the at least two first loss functions are minimized and the second loss function is maximized.
The present embodiment provides a specific way of training the feature extractor and the at least two image classifiers. Specifically, the terminal 110 may construct first loss functions based on the real classification labels and the prediction classification labels, where each first loss function corresponds to a different classification level; for example, in the case of three classification levels, there are three first loss functions. In addition, the terminal 110 also constructs a second loss function according to the similarity constraint relationship, that is, according to the similarity constraint relationship between the image features input to the image classifiers corresponding to adjacent classification levels. Accordingly, if there are two classification levels, one second loss function is constructed, and if there are three classification levels, two second loss functions are constructed. The terminal 110 then trains the feature extractor and the at least two image classifiers with the at least two first loss functions and the second loss function, such that the at least two first loss functions are minimized and the second loss function is maximized (i.e., the similarity between the decoupled image features is reduced).
Specifically, an image classification task with two classification levels, corresponding to large-category classification and sub-category classification, is used for explanation. If the large-category label is $y^{super}$ and the sub-category label is $y^{sub}$, the first loss functions corresponding to the two classification levels are respectively:

$$L_{super} = -\sum_{i=1}^{C_{super}} y_i^{super} \log \hat{y}_i^{super}, \qquad L_{sub} = -\sum_{i=1}^{C_{sub}} y_i^{sub} \log \hat{y}_i^{sub}$$

where $L_{super}$ represents the first loss function of the large category, $L_{sub}$ represents the first loss function of the sub-category, $C_{super}$ represents the total number of large-category classes, $C_{sub}$ represents the total number of sub-category classes, $y_i^{super}$ represents the true classification label of the large category, $y_i^{sub}$ represents the true classification label of the sub-category, $\hat{y}_i^{super}$ represents the prediction classification label of the large category, i.e. the probability value that the image to be classified belongs to category $i$ under the large category, and $\hat{y}_i^{sub}$ represents the prediction classification label of the sub-category, i.e. the probability value that the image to be classified belongs to category $i$ under the sub-category.
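A minimal numeric sketch of these cross-entropy first loss functions, with one-hot true labels and made-up predicted probabilities:

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy loss for a one-hot true label and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))

# Large-category level: the sample is an animal (class 0) out of 2 classes.
l_super = cross_entropy([1, 0], [0.9, 0.1])
# Sub-category level: the sample is a cat (class 1) out of 3 classes.
l_sub = cross_entropy([0, 1, 0], [0.1, 0.8, 0.1])
print(l_super < l_sub)  # True -- the less confident prediction incurs the larger loss
```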
In addition, the image features input to the image classifiers of the large category and the sub-category are denoted $E_\alpha(x)$ and $E_\beta(x)$, respectively. The similarity constraint relationship applied to the two image features may be a mutual information constraint relationship or an orthogonal constraint relationship. Taking the mutual information constraint as an example, the corresponding second loss function is:

$$L_{mul} = -\Big(\sum\nolimits_{j}\big[\,r(E_\alpha(x)) \odot E_\beta(x)\,\big]_j\Big)^2$$
Here $L_{mul}$ represents the second loss function, $r$ in the mutual information constraint represents a gradient reversal layer, which multiplies the gradient by -1 (i.e., "reverses" the gradient) when the network back-propagates gradients, and $\odot$ denotes element-wise multiplication. Note that the value of $L_{mul}$ can otherwise reach minus infinity; to avoid the image classifier and feature extractor exploiting this, L2 normalization can be used to limit the value range of the decoupled features $E_\alpha(x)$ and $E_\beta(x)$. In this way, the second loss function $L_{mul}$ takes its minimum value of -1 only when the two image features are identical, i.e. $E_\alpha(x) = E_\beta(x)$, and takes its maximum value of 0 when the two image features are completely orthogonal. Since a gradient reversal layer is used, minimizing the mutual information loss after gradient reversal is equivalent to maximizing the second loss function, i.e., requiring the two image features to reduce their degree of similarity as much as possible.
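Assuming L2-normalized features and the negative squared inner product described above, the forward value of the second loss can be sketched as follows (the gradient reversal layer only affects back-propagation, so it does not appear in this forward computation):

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def mutual_info_loss(ea, eb):
    """Negative squared inner product of L2-normalized features: -1 if identical, 0 if orthogonal."""
    ea, eb = l2_normalize(ea), l2_normalize(eb)
    dot = sum(a * b for a, b in zip(ea, eb))
    return -(dot * dot)

print(round(mutual_info_loss([1.0, 2.0], [1.0, 2.0]), 6))  # -1.0 (identical features)
print(abs(mutual_info_loss([1.0, 0.0], [0.0, 1.0])))       # 0.0 (orthogonal features)
```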
Finally, the two first loss functions and the second loss function are used together to jointly train the entire network based on $L = L_{super} + L_{sub} + L_{mul}$, including training the feature extractor and the at least two image classifiers; when the feature extractor includes the feature extraction network and the encoder, the feature extraction network, the encoder and the at least two image classifiers are trained together.
In one embodiment, the inputting, in step S302, at least two image features of the image to be classified into at least two image classifiers, may include:
the method comprises the steps of sending an image to be classified to a server, enabling the server to correspondingly input at least two image features of the image to be classified to at least two image classifiers, and obtaining a classification result of the image to be classified, which is output by the image classifier, on a corresponding classification level; and receiving the classification result obtained by the server.
Referring to fig. 1, the image classification processing in this embodiment is mainly performed by the server 120. Specifically, the terminal 110 may acquire an image to be classified and then transmit the image to be classified to the server 120, and the server 120 may be preconfigured with at least two image classifiers. After receiving the image to be classified, the server 120 correspondingly inputs at least two image features of the image to be classified into at least two image classifiers to obtain a classification result of the image to be classified on a corresponding classification level, which is output by the image classifier, and then sends the classification result to the terminal 110, and the terminal 110 receives the classification result sent by the server 120.
By adopting the technical solution of the present embodiment, the terminal 110 may transfer the task of image classification processing to the server 120 for processing, so as to reduce the data processing pressure of the terminal 110.
In one embodiment, as shown in fig. 7, fig. 7 is an interface schematic diagram showing image information in one embodiment, and after obtaining a hierarchical classification result of an image to be classified according to a classification result of the image to be classified on a corresponding classification level output by the image classifier in step S303, the method may further include the following steps:
acquiring image classification information carrying a hierarchical classification result; the image classification information is displayed on the image to be classified.
In this embodiment, the terminal 110 may directly display the classification result of the image to be classified on the image to be classified. Referring to fig. 7, an image 700 to be classified may be displayed on the terminal 110, and after the terminal 110 acquires the hierarchical classification result of the image 700 to be classified, the image classification information carrying the hierarchical classification result may be displayed in the information display area 710. The hierarchical classification result of the image to be classified 700 may include a major class classification result A1 and a minor class classification result B2 of the image to be classified. Specifically, assuming that the image 700 to be classified is a cat image, the image classification information presented by the terminal 110 may include a large class classification result: an animal; subclass classification results: a cat. By adopting the technical scheme of the embodiment, the hierarchical classification result can be displayed on the image to be classified in a superimposed manner, and the display efficiency of the hierarchical classification result is improved.
In order to more clearly illustrate the technical solution provided by the present application, the principle of image classification is described in detail with reference to fig. 8, and fig. 8 is a schematic diagram of the principle of image classification in an application example.
In general, the input image (x) may be of any size, while training of the model (including the feature extraction network, the encoder, and the large-category and sub-category classifiers) generally requires images of a fixed size; therefore, an image of arbitrary size can be adjusted to 256×256, and an image of size 224×224 can then be randomly cropped from it as the image to be processed. The image features f(x) are then extracted using the feature extraction network. Next, the image features are mapped to encoding features E(x) using the encoder, and E(x) can be regarded as the image features before decoupling. The encoding features are then divided into two parts, corresponding to the two decoupled features. The first partial feature E α (x) is used for training the large-category classifier, the second partial feature E β (x) is used for training the sub-category classifier, and a mutual information constraint is applied between the two decoupled features to reduce the similarity between them. It should be noted that the two partial features input to the large-category classifier and the sub-category classifier are obtained automatically: only the image to be processed needs to be provided to the model, without any special advance processing of its image features. Inputting images in this way allows training of the model (including the feature extraction network, the encoder, the large-category classifier and the sub-category classifier) to be completed, and hierarchical classification of images can be realized based on the trained model.
Specifically, assume that the input image is $x$, its large-category label is $y^{super}$, and its sub-category label is $y^{sub}$. A feature extraction network is used to extract image features; the feature extraction network is not specifically limited, and various neural network models may be used, for example. Generally, the output of the last convolutional layer of the neural network can be used as the image extraction feature $f(x)$. Next, the image extraction feature $f(x)$ is mapped to an encoding feature $E(x)$ of dimension 2d using an encoder, whose structure may be a single fully connected layer. Then, the 2d-dimensional encoding feature $E(x)$ is split into two parts of the same dimension: $E(x) \rightarrow [E_\alpha(x); E_\beta(x)]$, where the features $E_\alpha(x)$ and $E_\beta(x)$ are d-dimensional features used for large-category classification and sub-category classification, respectively. The corresponding loss functions are respectively:

$$L_{super} = -\sum_{i=1}^{C_{super}} y_i^{super} \log \hat{y}_i^{super}, \qquad L_{sub} = -\sum_{i=1}^{C_{sub}} y_i^{sub} \log \hat{y}_i^{sub}$$

where $L_{super}$ represents the loss function of the large category, $L_{sub}$ represents the loss function of the sub-category, $C_{super}$ represents the total number of large-category classes, $C_{sub}$ represents the total number of sub-category classes, $y_i^{super}$ represents the true classification label of the large category, $y_i^{sub}$ represents the true classification label of the sub-category, $\hat{y}_i^{super}$ represents the prediction classification label of the large category, i.e. the probability value that the image belongs to category $i$ under the large category, and $\hat{y}_i^{sub}$ represents the prediction classification label of the sub-category, i.e. the probability value that the image belongs to category $i$ under the sub-category.
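The encoder step can be sketched as a single fully connected layer followed by the split E(x) → [E α (x); E β (x)]; the sizes and random weights below are toy values, not the patent's configuration:

```python
import random

random.seed(0)
feat_dim, two_d = 8, 4  # toy sizes: f(x) in R^8, E(x) in R^4 (so d = 2)
# Random weight matrix of the single fully connected layer (illustrative only).
w = [[random.uniform(-0.1, 0.1) for _ in range(feat_dim)] for _ in range(two_d)]

def encode(f_x):
    """Single fully connected layer: E(x) = W f(x)."""
    return [sum(wi * xi for wi, xi in zip(row, f_x)) for row in w]

f_x = [1.0] * feat_dim                               # stand-in for an extracted feature f(x)
e_x = encode(f_x)                                    # 2d-dimensional encoding feature E(x)
e_alpha, e_beta = e_x[:two_d // 2], e_x[two_d // 2:] # split: E(x) -> [E_alpha(x); E_beta(x)]
print(len(e_alpha), len(e_beta))                     # 2 2
```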
In addition, to ensure that image features that are as different as possible can be learned by the features $E_\alpha(x)$ and $E_\beta(x)$, a mutual information constraint relationship is imposed between the two image features:

$$L_{mul} = -\Big(\sum\nolimits_{j}\big[\,r(E_\alpha(x)) \odot E_\beta(x)\,\big]_j\Big)^2$$

where $L_{mul}$ represents the mutual information loss function, $r$ in the mutual information constraint represents a gradient reversal layer, which multiplies the gradient by -1 (i.e., "reverses" the gradient) when the network back-propagates gradients, and $\odot$ denotes element-wise multiplication. Note that the value of $L_{mul}$ can otherwise reach minus infinity; to avoid the network exploiting this during training, L2 normalization can be used to limit the value range of the decoupled features $E_\alpha(x)$ and $E_\beta(x)$.
In this case, the mutual information loss function takes its minimum value of -1 only when the two image features are identical, i.e. $E_\alpha(x) = E_\beta(x)$, and takes its maximum value of 0 when the two image features are completely orthogonal. Since a gradient reversal layer is used, minimizing the mutual information loss after gradient reversal is equivalent to maximizing the mutual information loss function, i.e., requiring the two image features to reduce their degree of similarity as much as possible. Finally, the three loss functions can be used together to co-train the entire network (including the feature extraction network, the encoder, and the large-category and sub-category classifiers): $L = L_{super} + L_{sub} + L_{mul}$.
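A self-contained numeric sketch of the joint objective on toy values (the labels, probabilities and features below are illustrative placeholders, not outputs of a real model):

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy for a one-hot label and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))

def mutual_info_loss(ea, eb):
    """Negative squared inner product of L2-normalized features (in [-1, 0])."""
    na = math.sqrt(sum(a * a for a in ea))
    nb = math.sqrt(sum(b * b for b in eb))
    dot = sum((a / na) * (b / nb) for a, b in zip(ea, eb))
    return -(dot * dot)

l_super = cross_entropy([1, 0], [0.9, 0.1])        # large-category loss
l_sub = cross_entropy([0, 1, 0], [0.1, 0.8, 0.1])  # sub-category loss
l_mul = mutual_info_loss([1.0, 0.2], [0.2, 1.0])   # decoupling loss on the two features
total = l_super + l_sub + l_mul                    # L = L_super + L_sub + L_mul
print(total < l_super + l_sub)  # True -- the decoupling term is non-positive here
```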
According to the technical scheme provided by the application example, the image features of the image can be decoupled into two parts of image features suitable for large-class classification and subcategory classification, and meanwhile, mutual information constraint is used for reducing the similarity degree between the two parts of image features, so that the two parts of image features pay attention to different characteristics in the image as much as possible, and hierarchical classification tasks are completed better.
It should be understood that, although the steps in the flowcharts of fig. 3 to 6 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited in their order of execution and may be executed in other orders. Moreover, at least some of the steps of fig. 3 to 6 may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are likewise not necessarily executed sequentially, but may be executed in turn or alternately with at least a portion of other steps or of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 9, fig. 9 is a block diagram of an image classification apparatus in one embodiment, where an image classification apparatus is provided, and the apparatus may use a software module or a hardware module, or a combination of the two, which is a part of a computer device, where the apparatus 900 specifically includes:
an image acquisition module 901, configured to acquire an image to be classified;
The feature input module 902 is configured to input at least two image features of an image to be classified into at least two image classifiers correspondingly; the at least two image classifiers correspond to the at least two classification levels respectively; the image features input to the image classifier corresponding to the adjacent classification level have a similarity constraint relationship and are used for reducing the similarity between the image features;
The result obtaining module 903 is configured to obtain a hierarchical classification result of the image to be classified according to the classification result of the image to be classified on the corresponding classification hierarchy output by the image classifier.
In one embodiment, the feature input module 902 is further configured to obtain at least two image features through a pre-constructed feature extractor, and input the at least two image features to at least two image classifiers correspondingly; the feature extractor and the at least two image classifiers are constructed based on a similarity constraint relationship.
In one embodiment, the feature extractor includes a feature extraction network and an encoder; the feature input module 902 is further configured to input an image to be classified into a feature extraction network, so as to obtain an initial image feature output by the feature extraction network; inputting the initial image characteristics to an encoder to obtain encoded initial image characteristics output by the encoder; based on the encoded initial image features, at least two image features are acquired.
In one embodiment, the apparatus 900 may further include:
The classifier construction module is used for acquiring a sample image and acquiring classification labels of the sample image on at least two classification levels as real classification labels of at least two image classifiers; inputting the sample image into a feature extractor, and acquiring at least two sample image features with the same dimension according to the image features of the sample image output by the feature extractor; respectively inputting at least two sample image features into at least two image classifiers, and obtaining prediction classification labels of sample images output by the at least two image classifiers on corresponding classification levels; constructing a similarity constraint relation between sample image features input to the image classifier corresponding to the adjacent classification level; the feature extractor and the at least two image classifiers are trained based on the true classification labels, the predictive classification labels, and the similarity constraint relationship to construct the feature extractor and the at least two image classifiers.
In one embodiment, the feature extractor includes a feature extraction network and an encoder; the classifier construction module is further configured to: inputting the sample image into a feature extraction network to obtain initial sample image features output by the feature extraction network; inputting the initial sample image characteristics to an encoder to obtain encoded initial sample image characteristics output by the encoder; splitting the initial sample image features into at least two sample image features of the same dimension.
In one embodiment, the classifier construction module is further to: constructing first loss functions corresponding to two classification levels according to the real classification labels and the prediction classification labels to obtain at least two first loss functions; constructing a second loss function according to the similarity constraint relation; the feature extractor and the at least two image classifiers are trained based on the at least two first and second loss functions such that the at least two first and second loss functions are maximized.
In one embodiment, the similarity constraint relationship comprises a mutual information constraint relationship or an orthogonal constraint relationship.
In one embodiment, the classifier construction module is further to: preprocessing a sample image to obtain a sample image with an image size being a preset image size; a sample image of a preset image size is input to a feature extractor.
In one embodiment, the apparatus 900 may further include:
the information display module is used for acquiring image classification information carrying a hierarchical classification result; the image classification information is displayed on the image to be classified.
In one embodiment, the feature input module 902 is further configured to send the image to be classified to a server, so that the server correspondingly inputs at least two image features of the image to be classified to at least two image classifiers, and obtains a classification result of the image to be classified on a corresponding classification level, which is output by the image classifier; and receiving the classification result obtained by the server.
For specific limitations of the image classification apparatus, reference may be made to the above limitations of the image classification method, and no further description is given here. The respective modules in the above-described image classification apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and an internal structure diagram thereof may be as shown in fig. 10, and fig. 10 is an internal structure diagram of the computer device in one embodiment. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement an image classification method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in FIG. 10 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration, and not limitation, RAM can be in various forms such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this description.
The above examples express only a few embodiments of the application, which are described specifically and in detail, but are not therefore to be construed as limiting the scope of the application. It should be noted that those skilled in the art may make several variations and modifications without departing from the concept of the application, and these all fall within the protection scope of the application. Accordingly, the protection scope of the present application shall be determined by the appended claims.
Claims (12)
1. A method of classifying images, the method comprising:
acquiring a sample image, a classification label of a large category classification of the sample image, and a classification label of a sub-category classification of the sample image, the two classification labels being respectively used as a real classification label for an image classifier of the large category classification and a real classification label for an image classifier of the sub-category classification;
inputting the sample image to a feature extraction network of a feature extractor to obtain initial sample image features output by the feature extraction network; inputting the initial sample image features to an encoder of the feature extractor to obtain encoded initial sample image features output by the encoder; and splitting the encoded initial sample image features into a first partial feature and a second partial feature with the same dimensions;
inputting the first partial feature and the second partial feature into the image classifier of the large category classification and the image classifier of the sub-category classification, respectively, and acquiring a predicted classification label, in the large category classification, of the sample image output by the image classifier of the large category classification and a predicted classification label, in the sub-category classification, of the sample image output by the image classifier of the sub-category classification;
Constructing a similarity constraint relationship between the first partial feature and the second partial feature;
constructing first loss functions respectively corresponding to the large category classification and the sub-category classification according to the real classification labels and the predicted classification labels, to obtain two first loss functions; and constructing a second loss function according to the similarity constraint relationship, wherein the similarity constraint relationship is a mutual information constraint relationship, and the second loss function is L₂ = R(F₁ ⊙ F₂), where L₂ represents the second loss function, R(·) represents the gradient reversal layer, ⊙ represents an element-by-element multiplication operation, F₁ represents the first partial feature, and F₂ represents the second partial feature;
training the feature extractor and the two image classifiers based on the two first loss functions and the second loss function such that the two first loss functions and the second loss function are maximized, thereby constructing the feature extractor, the image classifier of the large category classification, and the image classifier of the sub-category classification;
acquiring an image to be classified;
inputting the image to be classified into the feature extraction network of the pre-constructed feature extractor to obtain initial image features output by the feature extraction network; inputting the initial image features into the encoder of the feature extractor, and performing encoding mapping on the initial image features through the encoder to obtain encoded features with redundant information removed, wherein the feature dimension of the encoded features is lower than the feature dimension of the initial image features; and dividing the encoded features into two image features with the same dimension;
correspondingly inputting the two image features into two image classifiers, the two image classifiers respectively corresponding to the two classification levels of the large category classification and the sub-category classification; and
acquiring a hierarchical classification result of the image to be classified according to the classification results, on the corresponding classification levels, of the image to be classified output by the image classifiers.
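The inference flow recited in claim 1 — extract initial features, encode them into a lower-dimensional representation, split the encoding into two equal halves, and feed each half to the large category classifier and the sub-category classifier — can be sketched as follows. This is a minimal NumPy illustration only: `extract_features`, the encoder, and the classifier weights are random stand-ins for the patent's trained networks, and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image):
    # Stand-in for the feature extraction network (e.g. a CNN backbone):
    # flatten the image and project it to a 64-dimensional initial feature.
    W = rng.standard_normal((image.size, 64))
    return image.reshape(-1) @ W

def encode(features, out_dim=16):
    # Encoder: maps the initial features to a lower-dimensional encoding
    # (out_dim < initial feature dimension), removing redundant information.
    W = rng.standard_normal((features.size, out_dim))
    return np.tanh(features @ W)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(image, n_coarse=3, n_fine=10):
    initial = extract_features(image)      # initial image features
    encoded = encode(initial)              # encoded, lower-dimensional
    f1, f2 = np.split(encoded, 2)          # two halves of equal dimension
    W_coarse = rng.standard_normal((f1.size, n_coarse))
    W_fine = rng.standard_normal((f2.size, n_fine))
    coarse = int(np.argmax(softmax(f1 @ W_coarse)))  # large category
    fine = int(np.argmax(softmax(f2 @ W_fine)))      # sub-category
    return coarse, fine

image = rng.random((8, 8, 3))
print(classify(image))
```

The hierarchical classification result is then the pair (large category, sub-category), e.g. (animal, cat) in the description's running example.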
2. The method of claim 1, wherein the feature extractor is implemented based on a neural network model.
3. The method of claim 1, wherein the inputting the sample image to a feature extraction network of a feature extractor comprises:
preprocessing the sample image to obtain a sample image whose image size is a preset image size;
and inputting the sample image with the preset image size into a feature extraction network of the feature extractor.
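The preprocessing step of claim 3 can be sketched as a simple resize to the preset image size. The patent does not specify the resampling method, so nearest-neighbor sampling is used here purely to keep the sketch dependency-free; the preset size of 32×32 is likewise an arbitrary example.

```python
import numpy as np

def preprocess(image, preset_size=(32, 32)):
    """Resize an H x W x C image to the preset image size using
    nearest-neighbor sampling (an assumption; the claim only requires
    that the output have the preset image size)."""
    h, w = image.shape[:2]
    th, tw = preset_size
    rows = np.arange(th) * h // th   # source row for each output row
    cols = np.arange(tw) * w // tw   # source column for each output column
    return image[rows][:, cols]

sample = np.zeros((48, 64, 3))
print(preprocess(sample).shape)  # (32, 32, 3)
```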
4. The method according to claim 1, wherein acquiring the hierarchical classification result of the image to be classified according to the classification result, on the corresponding classification level, of the image to be classified output by the image classifier comprises:
acquiring image classification information carrying the hierarchical classification result;
and displaying the image classification information on the image to be classified.
5. The method of claim 1, wherein the inputting the two image features into the two image classifiers comprises:
sending the image to be classified to a server, so that the server correspondingly inputs the two image features of the image to be classified to the two image classifiers to obtain a classification result, on the corresponding classification level, of the image to be classified output by the image classifiers; and
receiving the classification result obtained by the server.
6. An image classification apparatus, the apparatus comprising:
the classifier construction module is used for: acquiring a sample image, a classification label of a large category classification of the sample image, and a classification label of a sub-category classification of the sample image, the two classification labels being respectively used as a real classification label for an image classifier of the large category classification and a real classification label for an image classifier of the sub-category classification; inputting the sample image to a feature extraction network of a feature extractor to obtain initial sample image features output by the feature extraction network; inputting the initial sample image features to an encoder of the feature extractor to obtain encoded initial sample image features output by the encoder; splitting the encoded initial sample image features into a first partial feature and a second partial feature with the same dimensions; inputting the first partial feature and the second partial feature into the image classifier of the large category classification and the image classifier of the sub-category classification, respectively, and acquiring a predicted classification label, in the large category classification, of the sample image output by the image classifier of the large category classification and a predicted classification label, in the sub-category classification, of the sample image output by the image classifier of the sub-category classification; constructing a similarity constraint relationship between the first partial feature and the second partial feature; constructing first loss functions respectively corresponding to the large category classification and the sub-category classification according to the real classification labels and the predicted classification labels, to obtain two first loss functions; and constructing a second loss function according to the similarity constraint relationship, wherein the similarity constraint relationship is a mutual information constraint relationship, and the second loss function is L₂ = R(F₁ ⊙ F₂), where L₂ represents the second loss function, R(·) represents the gradient reversal layer, ⊙ represents an element-by-element multiplication operation, F₁ represents the first partial feature, and F₂ represents the second partial feature;
training the feature extractor and the two image classifiers based on the two first loss functions and the second loss function such that the two first loss functions and the second loss function are maximized, thereby constructing the feature extractor, the image classifier of the large category classification, and the image classifier of the sub-category classification;
the image acquisition module is used for acquiring images to be classified;
the feature input module is used for: inputting the image to be classified into the feature extraction network of the feature extractor to obtain initial image features output by the feature extraction network; inputting the initial image features into the encoder of the feature extractor, and performing encoding mapping on the initial image features through the encoder to obtain encoded features with redundant information removed, wherein the feature dimension of the encoded features is lower than the feature dimension of the initial image features; dividing the encoded features into two image features with the same dimension; and correspondingly inputting the two image features into two image classifiers, the two image classifiers respectively corresponding to the two classification levels of the large category classification and the sub-category classification;
the result acquisition module is used for acquiring the hierarchical classification result of the image to be classified according to the classification result of the image to be classified on the corresponding classification hierarchy, which is output by the image classifier.
7. The apparatus of claim 6, wherein the feature extractor is implemented based on a neural network model.
8. The apparatus of claim 6, wherein the classifier construction module is further to:
preprocessing the sample image to obtain a sample image with an image size being a preset image size;
and inputting the sample image with the preset image size into a feature extraction network of the feature extractor.
9. The apparatus of claim 6, wherein the apparatus further comprises:
the information display module is used for acquiring image classification information carrying the hierarchical classification result;
and displaying the image classification information on the image to be classified.
10. The apparatus of claim 6, wherein the feature input module is further configured to:
send the image to be classified to a server, so that the server correspondingly inputs the two image features of the image to be classified to the two image classifiers to obtain a classification result, on the corresponding classification level, of the image to be classified output by the image classifiers; and
receive the classification result obtained by the server.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
12. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 5.
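The training objective recited in claims 1 and 6 combines two supervised first loss functions with a second loss that constrains the similarity (mutual information) between the two partial features. The sketch below is only an interpretation under stated assumptions: cross-entropy is assumed for the two first loss functions, and the second loss is taken as the mean element-by-element product of the two partial features, which during training would pass through a gradient reversal layer (identity in the forward pass, sign-flipped gradient in the backward pass). None of these choices are confirmed by the claim text beyond the symbols it names.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, true_label):
    # First loss function: cross-entropy between the predicted distribution
    # and the real classification label (assumed form, not stated in claims).
    p = softmax(logits)
    return -float(np.log(p[true_label] + 1e-12))

def second_loss(f1, f2):
    # Second loss: mean element-by-element product of the two partial
    # features. In training this scalar would pass through a gradient
    # reversal layer: identity forward, gradient multiplied by -1 backward,
    # so optimizing the combined objective drives this term up.
    return float(np.mean(f1 * f2))

rng = np.random.default_rng(0)
f1, f2 = rng.standard_normal(8), rng.standard_normal(8)
coarse_logits, fine_logits = rng.standard_normal(3), rng.standard_normal(10)

total = (cross_entropy(coarse_logits, 1)   # first loss, large category
         + cross_entropy(fine_logits, 4)   # first loss, sub-category
         + second_loss(f1, f2))            # similarity constraint
print(total)
```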
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010303814.8A CN111507403B (en) | 2020-04-17 | 2020-04-17 | Image classification method, apparatus, computer device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010303814.8A CN111507403B (en) | 2020-04-17 | 2020-04-17 | Image classification method, apparatus, computer device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111507403A CN111507403A (en) | 2020-08-07 |
CN111507403B (en) | 2024-11-05
Family
ID=71864176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010303814.8A Active CN111507403B (en) | 2020-04-17 | 2020-04-17 | Image classification method, apparatus, computer device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111507403B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112364933B (en) * | 2020-11-23 | 2024-07-12 | 北京达佳互联信息技术有限公司 | Image classification method, device, electronic equipment and storage medium |
CN113255766B (en) * | 2021-05-25 | 2023-12-22 | 平安科技(深圳)有限公司 | Image classification method, device, equipment and storage medium |
JP7082239B1 (en) * | 2021-06-09 | 2022-06-07 | 京セラ株式会社 | Recognition device, terminal device, recognizer construction device, recognizer correction device, construction method, and correction method |
CN113836338B (en) * | 2021-07-21 | 2024-05-24 | 北京邮电大学 | Fine granularity image classification method, device, storage medium and terminal |
CN113673576A (en) * | 2021-07-26 | 2021-11-19 | 浙江大华技术股份有限公司 | Image detection method, terminal and computer readable storage medium thereof |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241880A (en) * | 2018-08-22 | 2019-01-18 | 北京旷视科技有限公司 | Image processing method, image processing apparatus, computer readable storage medium |
CN109919177A (en) * | 2019-01-23 | 2019-06-21 | 西北工业大学 | Feature selection approach based on stratification depth network |
Family Cites Families (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001160057A (en) * | 1999-12-03 | 2001-06-12 | Nippon Telegr & Teleph Corp <Ntt> | Method for hierarchically classifying image and device for classifying and retrieving picture and recording medium with program for executing the method recorded thereon |
US7194134B2 (en) * | 2001-01-02 | 2007-03-20 | Microsoft Corporation | Hierarchical, probabilistic, localized, semantic image classifier |
US8948500B2 (en) * | 2012-05-31 | 2015-02-03 | Seiko Epson Corporation | Method of automatically training a classifier hierarchy by dynamic grouping the training samples |
KR102024867B1 (en) * | 2014-09-16 | 2019-09-24 | 삼성전자주식회사 | Feature extracting method of input image based on example pyramid and apparatus of face recognition |
CN104200238B (en) * | 2014-09-22 | 2017-07-28 | 北京酷云互动科技有限公司 | TV station symbol recognition method and TV station symbol recognition device |
US10460201B2 (en) * | 2015-12-31 | 2019-10-29 | Microsoft Technology Licensing, Llc | Structure and training for image classification |
US9928448B1 (en) * | 2016-09-23 | 2018-03-27 | International Business Machines Corporation | Image classification utilizing semantic relationships in a classification hierarchy |
CN107067022B (en) * | 2017-01-04 | 2020-09-04 | 美的集团股份有限公司 | Method, device and equipment for establishing image classification model |
CN108171254A (en) * | 2017-11-22 | 2018-06-15 | 北京达佳互联信息技术有限公司 | Image tag determines method, apparatus and terminal |
CN108171257B (en) * | 2017-12-01 | 2019-11-26 | 百度在线网络技术(北京)有限公司 | Fine granularity image recognition model training and recognition methods, device and storage medium |
CN108681695A (en) * | 2018-04-26 | 2018-10-19 | 北京市商汤科技开发有限公司 | Video actions recognition methods and device, electronic equipment and storage medium |
CN108664924B (en) * | 2018-05-10 | 2022-07-08 | 东南大学 | Multi-label object identification method based on convolutional neural network |
CN108875934A (en) * | 2018-05-28 | 2018-11-23 | 北京旷视科技有限公司 | A kind of training method of neural network, device, system and storage medium |
CN109002845B (en) * | 2018-06-29 | 2021-04-20 | 西安交通大学 | Fine-grained image classification method based on deep convolutional neural network |
CN109189959B (en) * | 2018-09-06 | 2020-11-10 | 腾讯科技(深圳)有限公司 | Method and device for constructing image database |
CN109359566B (en) * | 2018-09-29 | 2022-03-15 | 河南科技大学 | Gesture recognition method for hierarchical classification by using finger characteristics |
CN110163127A (en) * | 2019-05-07 | 2019-08-23 | 国网江西省电力有限公司检修分公司 | A kind of video object Activity recognition method from thick to thin |
CN110287836B (en) * | 2019-06-14 | 2021-10-15 | 北京迈格威科技有限公司 | Image classification method and device, computer equipment and storage medium |
CN110347870A (en) * | 2019-06-19 | 2019-10-18 | 西安理工大学 | The video frequency abstract generation method of view-based access control model conspicuousness detection and hierarchical clustering method |
CN110390350B (en) * | 2019-06-24 | 2021-06-15 | 西北大学 | Hierarchical classification method based on bilinear structure |
CN110309888A (en) * | 2019-07-11 | 2019-10-08 | 南京邮电大学 | A kind of image classification method and system based on layering multi-task learning |
CN110647907B (en) * | 2019-08-05 | 2023-04-07 | 广东工业大学 | Multi-label image classification algorithm using multi-layer classification and dictionary learning |
CN110659378B (en) * | 2019-09-07 | 2023-02-10 | 吉林大学 | Fine-grained image retrieval method based on contrast similarity loss function |
CN110738247B (en) * | 2019-09-30 | 2020-08-25 | 中国科学院大学 | Fine-grained image classification method based on selective sparse sampling |
CN110796183A (en) * | 2019-10-17 | 2020-02-14 | 大连理工大学 | Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning |
CN110807465B (en) * | 2019-11-05 | 2020-06-30 | 北京邮电大学 | Fine-grained image identification method based on channel loss function |
CN110929624B (en) * | 2019-11-18 | 2021-09-14 | 西北工业大学 | Construction method of multi-task classification network based on orthogonal loss function |
CN110929730A (en) * | 2019-11-18 | 2020-03-27 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, computer equipment and storage medium |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241880A (en) * | 2018-08-22 | 2019-01-18 | 北京旷视科技有限公司 | Image processing method, image processing apparatus, computer readable storage medium |
CN109919177A (en) * | 2019-01-23 | 2019-06-21 | 西北工业大学 | Feature selection approach based on stratification depth network |
Also Published As
Publication number | Publication date |
---|---|
CN111507403A (en) | 2020-08-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111507403B (en) | Image classification method, apparatus, computer device and storage medium | |
AU2019200270B2 (en) | Concept mask: large-scale segmentation from semantic concepts | |
US20180336683A1 (en) | Multi-Label Semantic Boundary Detection System | |
CN112183577A (en) | Training method of semi-supervised learning model, image processing method and equipment | |
CN108734210B (en) | Object detection method based on cross-modal multi-scale feature fusion | |
US11983903B2 (en) | Processing images using self-attention based neural networks | |
US10776662B2 (en) | Weakly-supervised spatial context networks to recognize features within an image | |
CN110827236B (en) | Brain tissue layering method, device and computer equipment based on neural network | |
CN109285105A (en) | Method of detecting watermarks, device, computer equipment and storage medium | |
CN113538441A (en) | Image segmentation model processing method, image processing method and device | |
CN113255915A (en) | Knowledge distillation method, device, equipment and medium based on structured instance graph | |
CN114638960A (en) | Model training method, image description generation method and device, equipment and medium | |
JP2022090633A (en) | Method, computer program product and computer system for improving object detection within high-resolution image | |
CN113159013A (en) | Paragraph identification method and device based on machine learning, computer equipment and medium | |
CN115223662A (en) | Data processing method, device, equipment and storage medium | |
WO2024159819A1 (en) | Training method, layout analysis method, quality assessment method, and apparatuses, device, and medium | |
CN113569081A (en) | Image recognition method, device, equipment and storage medium | |
CN116468970A (en) | Model training method, image processing method, device, equipment and medium | |
CN115471714A (en) | Data processing method, data processing device, computing equipment and computer readable storage medium | |
CN117437425B (en) | Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium | |
CN110688511A (en) | Fine-grained image retrieval method and device, computer equipment and storage medium | |
CN116612466B (en) | Content identification method, device, equipment and medium based on artificial intelligence | |
CN118379586B (en) | Training method, device, equipment, medium and product of key point prediction model | |
CN113569887B (en) | Picture recognition model training and picture recognition method, device and storage medium | |
CN115761397A (en) | Model training method, image classification method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
REG | Reference to a national code |
Ref country code: HK; Ref legal event code: DE; Ref document number: 40029152; Country of ref document: HK
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |