
CN117540291A - Classification model processing method, content classification device and computer equipment - Google Patents

Classification model processing method, content classification device and computer equipment

Info

Publication number
CN117540291A
CN117540291A (application CN202311375744.7A)
Authority
CN
China
Prior art keywords
classification
classification model
training
network
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311375744.7A
Other languages
Chinese (zh)
Inventor
林志文
韩金伟
鄢科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202311375744.7A
Publication of CN117540291A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/042: Knowledge-based neural networks; Logical representations of neural networks
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a classification model processing method, apparatus, computer device, storage medium, and computer program product, drawing on pre-trained models, large models, and machine-learning techniques in artificial intelligence. The method includes: inputting sample content into a first classification model and generating first classification features through a multi-level sub-network in the first classification model; inputting the sample content into the first classification model with the adaptation network masked, to obtain the intermediate features extracted by each level of the sub-network; inputting the sample content into a second classification model, generating second classification features through multi-level sub-structures in the second classification model, and obtaining the intermediate features extracted by each level of sub-structure; training the second classification model according to the difference between the first and second classification features and the differences between the intermediate features respectively extracted by the sub-structures and the corresponding sub-networks; and training the adaptation network based on the classification result obtained by classifying according to the first classification features. By adopting the method, classification accuracy can be improved.

Description

Classification model processing method, content classification device and computer equipment
Technical Field
The present invention relates to the field of computer technology, and in particular, to a classification model processing method, a content classification apparatus, a computer device, a storage medium, and a computer program product.
Background
With the development of computer technology, techniques have emerged for classifying content such as images or text, thereby determining the category to which the content belongs.
In the conventional technology, for a classification task in a specific scenario, a classification model is generally trained on content from that scenario based on a machine-learning method, and the trained classification model is used to classify content.
However, because the classification model is trained only on content from the specific scenario, the knowledge learned by the trained model is limited, which results in lower classification accuracy.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a classification model processing method, a content classification method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product that are capable of improving classification accuracy.
In one aspect, the present application provides a classification model processing method, including: acquiring sample content, inputting the sample content into a first classification model, and performing feature extraction through a multi-level sub-network in the first classification model to generate a first classification feature, where each level of the sub-network includes a pre-training structure and at least one level of the multi-level sub-network further includes an adaptation network to be trained; inputting the sample content into the first classification model with the adaptation network masked, and performing feature extraction through the multi-level sub-network with the adaptation network masked, to obtain the intermediate feature extracted by each level of the sub-network; inputting the sample content into a second classification model, performing feature extraction through a multi-level sub-structure corresponding to the multi-level sub-network in the second classification model to generate a second classification feature, and obtaining the intermediate feature extracted by each level of the sub-structure; training the second classification model according to the difference between the first classification feature and the second classification feature and the difference between the intermediate features respectively extracted by at least one level of sub-structure and the corresponding sub-network; and acquiring a classification result obtained by classifying according to the first classification feature, and training the adaptation network in the first classification model based on the classification result and the sample label of the sample content.
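To make the masking step concrete, the following is a minimal numerical sketch in which one level of the first classification model is a pre-trained block followed by an optional adapter that can be bypassed (masked). The tanh layers, residual form, and tensor shapes are illustrative assumptions, not the architecture specified by the patent.

```python
import numpy as np

def sub_network(x, w_pre, w_adapt, mask_adapter=False):
    """One level of the first classification model: a pre-trained block
    followed by an optional adaptation network. When the adapter is
    masked (or absent), only the pre-trained block's output is returned,
    which is the 'intermediate feature' used for distillation."""
    h = np.tanh(x @ w_pre)               # pre-training structure
    if mask_adapter or w_adapt is None:
        return h                         # adapter bypassed
    return h + np.tanh(h @ w_adapt)      # adapter adds a residual term

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))              # a batch of sample features
w_pre = rng.normal(size=(4, 4))          # frozen pre-trained weights
w_adapt = rng.normal(size=(4, 4))        # trainable adapter weights

masked = sub_network(x, w_pre, w_adapt, mask_adapter=True)
full = sub_network(x, w_pre, w_adapt)
```

With random adapter weights the two forward passes differ, which is exactly why masking is needed to expose the pre-trained intermediate features.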
In another aspect, the present application further provides a classification model processing device, including: a first classification feature generation module, configured to acquire sample content, input the sample content into a first classification model, and perform feature extraction through a multi-level sub-network in the first classification model to generate a first classification feature, where each level of the sub-network includes a pre-training structure and at least one level of the multi-level sub-network further includes an adaptation network to be trained; an intermediate feature extraction module, configured to input the sample content into the first classification model with the adaptation network masked, and perform feature extraction through the multi-level sub-network with the adaptation network masked, to obtain the intermediate feature extracted by each level of the sub-network; a second classification feature generation module, configured to input the sample content into a second classification model, perform feature extraction through a multi-level sub-structure corresponding to the multi-level sub-network in the second classification model to generate a second classification feature, and obtain the intermediate feature extracted by each level of the sub-structure; a second classification model training module, configured to train the second classification model according to the difference between the first classification feature and the second classification feature and the difference between the intermediate features respectively extracted by at least one level of sub-structure and the corresponding sub-network; and a first classification model training module, configured to acquire a classification result obtained by classifying according to the first classification feature and train the adaptation network in the first classification model based on the classification result and the sample label of the sample content.
In some embodiments, the classification model processing device further includes a progress determination module configured to determine the current training progress. The second classification model training module is further configured to, when the current training progress reaches a first preset progress, train the second classification model according to the difference between the first classification feature and the second classification feature and the difference between the intermediate features respectively extracted by at least one level of sub-structure and the corresponding sub-network.
In some embodiments, the second classification model training module is further configured to, when the current training progress has not reached the first preset progress, train the second classification model based on the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network, and return to the step of acquiring sample content to iteratively train the first classification model and the second classification model.
In some embodiments, the classification result obtained by classifying according to the first classification feature is a first classification result. The second classification model training module is further configured to, when the current training progress has not reached the first preset progress: acquire a classification result obtained by classifying according to the second classification feature to obtain a second classification result; generate at least one intermediate loss value according to the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network; generate a classification loss value of the second classification model according to the second classification result and the sample label of the sample content; and train the second classification model based on the at least one intermediate loss value and the classification loss value of the second classification model.
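A classification loss value of the kind described here is commonly computed as the cross-entropy between the softmax of the classification feature and the sample label. The sketch below assumes that common choice; the patent itself does not fix the loss function.

```python
import numpy as np

def classification_loss(logits, label_idx, eps=1e-12):
    """Cross-entropy classification loss: the negative log-probability of
    the sample label under the softmax of the classification feature."""
    z = logits - logits.max()            # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()      # softmax probabilities
    return float(-np.log(p[label_idx] + eps))

# Uniform logits over 3 classes give a loss of ln(3) for any label.
uniform_loss = classification_loss(np.zeros(3), 1)
```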
In some embodiments, the second classification model training module is further configured to determine, for each level of sub-structure, the difference between the feature values at each position in the intermediate features respectively extracted by the sub-structure and the corresponding sub-network, and obtain the intermediate loss value corresponding to the sub-structure according to the differences between the feature values at the positions.
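Aggregating per-position feature-value differences in this way corresponds to a mean-squared-error style intermediate loss. The following minimal sketch assumes MSE over the per-position differences, which is a conventional realization rather than a requirement of the patent.

```python
import numpy as np

def intermediate_loss(sub_structure_feat, sub_network_feat):
    """Intermediate loss for one level: mean of the squared per-position
    differences between the sub-structure's feature map (student) and the
    corresponding sub-network's intermediate feature (teacher)."""
    diff = sub_structure_feat - sub_network_feat
    return float(np.mean(diff ** 2))

# Identical features yield zero loss; unit differences yield a loss of 1.
zero_case = intermediate_loss(np.ones((2, 3)), np.ones((2, 3)))
unit_case = intermediate_loss(np.zeros((2, 2)), np.ones((2, 2)))
```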
In some embodiments, the second classification model training module is further configured to, when the current training progress reaches the first preset progress: generate at least one intermediate loss value according to the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network; acquire a classification result obtained by classifying according to the second classification feature to obtain a second classification result, and generate a classification loss value of the second classification model according to the second classification result and the sample label of the sample content; generate a classification feature loss value according to the difference between the first classification feature and the second classification feature; and train the second classification model based on the at least one intermediate loss value, the classification loss value of the second classification model, and the classification feature loss value.
In some embodiments, the second classification model training module is further configured to normalize each feature value in the first classification feature to obtain a first probability distribution, normalize each feature value in the second classification feature to obtain a second probability distribution, and generate the classification feature loss value based on the difference between the first probability distribution and the second probability distribution.
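A common way to realize this normalization-and-difference scheme is softmax normalization followed by a KL divergence between the two distributions, as in standard knowledge distillation. The sketch below assumes that choice; the patent does not name a specific divergence.

```python
import numpy as np

def softmax(z):
    """Normalize feature values into a probability distribution."""
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classification_feature_loss(first_feat, second_feat, eps=1e-12):
    """KL divergence from the second (student) distribution to the
    first (teacher) distribution, averaged over the batch."""
    p = softmax(first_feat)                 # first probability distribution
    q = softmax(second_feat)                # second probability distribution
    return float(np.mean(np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)))

same = classification_feature_loss(np.array([[1.0, 2.0, 3.0]]),
                                   np.array([[1.0, 2.0, 3.0]]))
apart = classification_feature_loss(np.array([[0.0, 5.0]]),
                                    np.array([[5.0, 0.0]]))
```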
In some embodiments, the classification model processing device further includes a progress determination module configured to determine the current training progress. The first classification model training module is further configured to, when the current training progress has not reached a second preset progress, train the adaptation network in the first classification model based on the classification result and the sample label of the sample content, and return to the step of acquiring sample content to iteratively train the first classification model and the second classification model.
In some embodiments, the first classification model training module is further configured to, when the current training progress reaches the second preset progress: generate at least one intermediate loss value according to the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network; generate a classification loss value of the first classification model according to the classification result and the sample label of the sample content; train the adaptation network in the first classification model based on the at least one intermediate loss value and the classification loss value of the first classification model; and return to the step of acquiring sample content to iteratively train the first classification model and the second classification model until a training end condition is met, at which point the iteration stops.
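The progress-dependent composition of losses in the embodiments above can be sketched as a simple schedule: the intermediate and classification losses are always applied, and the classification feature (distillation) loss is added once a preset progress is reached. The numeric progress counter and threshold below are illustrative assumptions.

```python
def second_model_loss(progress, first_preset, inter_losses, cls_loss, feat_loss):
    """Total loss for the second classification model under the schedule
    described above: intermediate losses plus the classification loss,
    with the classification feature loss added from the first preset
    progress onward."""
    loss = sum(inter_losses) + cls_loss
    if progress >= first_preset:
        loss += feat_loss                 # distillation term kicks in
    return loss

early = second_model_loss(10, 100, [1.0, 2.0], 0.5, 0.25)   # before preset
late = second_model_loss(100, 100, [1.0, 2.0], 0.5, 0.25)   # at/after preset
```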
In some embodiments, the model complexity of the second classification model is lower than that of the first classification model; the pre-training structure in the first classification model is used to extract features across multiple tasks, and the sample content and the sample label are used to train the second classification model to perform a preset classification task.
In another aspect, the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps in the classification model processing method when executing the computer program.
In another aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the classification model processing method described above.
In another aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the classification model processing method described above.
According to the classification model processing method, device, computer equipment, storage medium, and computer program product described above, the intermediate features extracted by the sub-networks are generated through the pre-training structures, so these intermediate features reflect the knowledge learned by the pre-training structures during pre-training. Training the second classification model according to the difference between the intermediate features respectively extracted by at least one level of sub-structure and the corresponding sub-network therefore transfers the knowledge learned by the pre-training structures to the sub-structures, so that the second classification model learns this knowledge. Meanwhile, acquiring the classification result obtained by classifying according to the first classification feature and training the adaptation network in the first classification model based on the classification result and the sample label of the sample content gives the first classification model the ability to classify, so that training the second classification model according to the difference between the first classification feature and the second classification feature lets the second classification model learn the knowledge of the classification scenario. The second classification model thus learns the knowledge of the pre-training structures while learning the knowledge of the classification scenario; the former broadens the latter, which improves the classification accuracy of the second classification model.
In another aspect, the present application provides a content classification method, including: acquiring content to be classified; inputting the content into a trained second classification model, extracting features through a multi-level substructure in the second classification model to obtain classification features of the content, and classifying the content based on the classification features of the content to obtain a classification result of the content; the second classification model is obtained through training by the classification model processing method.
In another aspect, the present application further provides a content classification device, including: a content acquisition module configured to acquire content to be classified; and a classification result obtaining module configured to input the content into the trained second classification model, perform feature extraction through the multi-level sub-structure in the second classification model to obtain the classification feature of the content, and classify the content based on its classification feature to obtain the classification result of the content. The second classification model is trained by the classification model processing method described above.
In some embodiments, the sample content in the classification model processing method provided by the application is a sample image. The content acquisition module is further configured to acquire an image to be audited. The classification result obtaining module is further configured to input the image into the trained second classification model, extract image features through the multi-level sub-structure in the second classification model, obtain the classification feature of the image based on the extracted image features, and classify the image based on its classification feature to obtain the classification result of the image. The content classification device further includes an audit result determination module configured to determine the audit result of the image based on the classification result of the image.
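The final step of the image-auditing embodiment maps a classification result to an audit outcome. A minimal sketch follows; the label set and outcome strings are hypothetical, since the patent does not fix the categories.

```python
def audit_result(class_label, disallowed=frozenset({"violation"})):
    """Map an image's classification result to an audit outcome: images
    whose class falls in the disallowed set are rejected (hypothetical
    label set for illustration only)."""
    return "rejected" if class_label in disallowed else "approved"

rejected = audit_result("violation")
approved = audit_result("normal")
```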
In another aspect, the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps in the content classification method described above when executing the computer program.
In another aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the content classification method described above.
In another aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the content classification method described above.
According to the content classification method, device, computer equipment, storage medium, and computer program product described above, the intermediate features extracted by the sub-networks are generated through the pre-training structures, so these intermediate features reflect the knowledge learned by the pre-training structures during pre-training. Training the second classification model according to the difference between the intermediate features respectively extracted by at least one level of sub-structure and the corresponding sub-network therefore transfers the knowledge learned by the pre-training structures to the sub-structures, so that the second classification model learns this knowledge. Meanwhile, acquiring the classification result obtained by classifying according to the first classification feature and training the adaptation network in the first classification model based on the classification result and the sample label of the sample content gives the first classification model the ability to classify, so that training the second classification model according to the difference between the first classification feature and the second classification feature lets the second classification model learn the knowledge of the classification scenario. The second classification model thus learns the knowledge of the pre-training structures while learning the knowledge of the classification scenario; the former broadens the latter, improving the classification accuracy of the second classification model, and classifying content with the trained second classification model therefore improves the accuracy of content classification.
Drawings
To illustrate the technical solutions of the embodiments of the present application or of the related art more clearly, the drawings required by the embodiments or by the related description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without inventive effort.
FIG. 1 is a diagram of an application environment for a content classification method in one embodiment;
FIG. 2 is a flow diagram of a classification model processing method in one embodiment;
FIG. 3A is a schematic diagram of a first classification model according to an embodiment;
FIG. 3B is a schematic diagram of the architecture of a subnetwork and an adaptation network in one embodiment;
FIG. 3C is a schematic diagram of a first classification model according to an embodiment;
FIG. 3D is a schematic diagram of model training in one embodiment;
FIG. 4 is a flow chart of a classification model processing method according to another embodiment;
FIG. 5 is a schematic diagram of a timing relationship of tasks according to another embodiment;
FIG. 6 is a flow diagram of a content classification method in one embodiment;
FIG. 7 is a block diagram showing a structure of a classification model processing apparatus in one embodiment;
FIG. 8 is a block diagram of a content classification device in one embodiment;
FIG. 9 is an internal block diagram of a computer device in one embodiment;
FIG. 10 is an internal structural view of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The classification model processing method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server.
Specifically, the server 104 may obtain sample content, input the sample content into a first classification model, and perform feature extraction through a multi-level sub-network in the first classification model to generate a first classification feature; wherein each level of subnetworks comprises a pre-training structure, and at least one level of subnetworks in the multi-level subnetworks further comprises an adaptation network to be trained. The server 104 may mask the adaptation network, and in the case of masking the adaptation network, input the sample content into the first classification model, and perform feature extraction through multiple levels of sub-networks that mask the adaptation network, to obtain intermediate features extracted by each level of sub-network. The server 104 may input the sample content into a second classification model, perform feature extraction through a multi-level sub-structure corresponding to the multi-level sub-network in the second classification model to generate a second classification feature, and obtain an intermediate feature extracted by each level of sub-structure. The server 104 may train the second classification model according to the difference between the first classification feature and the second classification feature and the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network, obtain a classification result obtained by classifying according to the first classification feature, and train the adaptation network in the first classification model based on the classification result and the sample label of the sample content. Server 104 may iteratively train the first classification model and the second classification model, and after training is complete, the second classification model may be used in the classification task. 
For example, the server may input the content to be classified into the second classification model to obtain a classification result of the content. The content to be classified may be acquired by the server 104 from other devices, or may be transmitted by the terminal 102 to the server 104.
The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, Internet-of-Things device, or portable wearable device; the Internet-of-Things device may be a smart speaker, smart television, smart air conditioner, smart in-vehicle device, or the like, and the portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster formed by a plurality of servers, and may also be a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and basic cloud computing services such as big data and artificial intelligence platforms. The terminal 102 and the server 104 may be connected directly or indirectly through wired or wireless communication, which is not limited herein.
The classification model processing method provided by the application may be based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields at both the hardware and software levels. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, pre-training model technology, operation/interaction systems, mechatronics, and the like. A pre-training model (PTM), also called a foundation model or large model, is a deep neural network (DNN) with a large number of parameters that is trained on massive unlabeled data; the function-approximation capability of the large-parameter DNN enables the PTM to extract common features from the data, and the PTM is adapted to downstream tasks through techniques such as fine-tuning, parameter-efficient fine-tuning (PEFT), and prompt-tuning.
Therefore, the pre-training model can achieve good results in few-shot or zero-shot scenarios. PTMs can be classified according to the data modality they process into language models (ELMo, BERT, GPT), visual models (Swin Transformer, ViT, V-MoE), speech models (VALL-E), multi-modal models (ViBERT, CLIP, Flamingo, Gato), and the like, where a multi-modal model is a model that builds a joint representation of the features of two or more data modalities. The pre-training model is an important tool for producing artificial intelligence generated content (AIGC), and can also serve as a general interface connecting multiple specific task models. For example, in the classification model processing method provided by the application, the first classification model may be a model obtained by adding an adaptation network on the basis of a general model, where the general model belongs to the pre-training models.
Machine learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration. The pre-training model is the latest development of deep learning and integrates the above techniques. The second classification model and the first classification model may both be neural network models trained using machine learning techniques.
The classification model processing method provided by the application can be based on a pre-training model; for example, the first classification model can be regarded as being obtained by adding an adaptation network on the basis of a general model, where the general model belongs to the pre-training models described above and includes a pre-training structure.
The classification model processing method can be used in model compression and quantization scenarios. Model compression and quantization refers to reducing the size of a model and accelerating model inference through compression and quantization techniques, thereby reducing the storage and computation cost of the model. Model compression typically includes pruning, low-rank decomposition, knowledge distillation, and the like, while model quantization refers to converting floating-point parameters in a model into fixed-point or integer parameters, thereby reducing model size and speeding up inference. For example, the classification model processing method provided herein may be used to improve the process of knowledge distillation. Knowledge distillation is a model compression technique aimed at transferring the knowledge of a teacher model to a student model that is smaller and less complex than the teacher model. The purpose of knowledge distillation is to reduce the computational and storage cost of the model while maintaining its performance. In knowledge distillation, the output of the teacher model is typically used as a target for the student model, and the student model is trained to minimize the difference between its output and the output of the teacher model. Knowledge distillation may be applied to a variety of tasks including, but not limited to, image classification, object detection, or natural language processing. However, in conventional knowledge distillation, distillation generally starts from a teacher network trained for a specific task, and the knowledge learned by that teacher network is transferred to the student network, so that the student model only learns task-specific knowledge and cannot learn general knowledge.
General knowledge refers to knowledge that can be applied across multiple tasks or domains, and the failure to transfer such knowledge can result in model overfitting or a lack of generalization capability.
In the classification model processing method provided by the application, the task is a classification task, and the method can be regarded as a new knowledge distillation method that takes the first classification model as the teacher network and the second classification model as the student network. In this new method, distillation does not need to start from a teacher network trained for a specific task; instead, it starts from a teacher network containing a pre-training structure, so that the student network can learn general knowledge. Specifically, a classification result obtained by classifying according to the first classification feature is acquired, and the adaptation network in the first classification model is trained based on the classification result and the sample label of the sample content, so that the first classification model (the teacher network) learns the knowledge of the classification task and thus possesses both general knowledge and classification-task knowledge. The second classification model (the student network) is trained according to the difference between the first classification feature and the second classification feature, so that it learns the knowledge of the first classification model (the teacher network) on the classification task. The sub-networks of the first classification model contain pre-training structures whose parameters are obtained through pre-training, so the general knowledge is learned during pre-training; that is, the pre-training structures carry the general knowledge, and therefore the intermediate features obtained by feature extraction through the multi-level sub-networks with the adaptation networks masked also contain the general knowledge.
Therefore, the student model can learn not only the knowledge of the teacher model on the specific task, but also the general knowledge learned by the teacher model, thereby realizing distillation of both general and task-specific knowledge and improving the performance of the second classification model (the student network). Task-specific knowledge refers to knowledge used for a particular task.
In conventional knowledge distillation, a teacher network for the specific task is usually trained first, and knowledge distillation is performed afterwards; that is, the knowledge distillation process is usually divided into two stages. The first stage is the teacher-network training stage, in which the teacher network is trained to achieve high performance on the specific task; because the teacher network usually has a complex structure and many parameters, this stage usually occupies substantial computing resources and takes a long time. The second stage is the student-network training stage, in which the student network is trained through knowledge distillation techniques. Because the teacher network must be trained before the student network, the overall training time is long and the training process occupies more computer resources, which limits the application of knowledge distillation in scenarios where computer resources are limited or training time is strictly constrained.
In the classification model processing method provided by the application, during the knowledge distillation process, that is, while the second classification model is being trained based on the first classification model, the first classification model is trained simultaneously (specifically, the adaptation network in the first classification model is trained), so that the first classification model transfers the knowledge it learns for the classification task to the second classification model as it learns. This shortens the training time and saves the computer resources occupied during training, so that knowledge distillation can still be applied in scenarios where computer resources are limited or training time is strictly constrained, thereby expanding the application scenarios of knowledge distillation.
In addition, because the pre-training structures in the first classification model are already pre-trained, only the parameters of the adaptation network are updated when training the first classification model, which shortens the training time of the first classification model, improves training efficiency, and further saves the computer resources occupied during training.
The scheme provided by the embodiments of the application relates to artificial intelligence technologies such as machine learning, and is specifically described by the following embodiments:
in some embodiments, as shown in fig. 2, a classification model processing method is provided. The method may be executed by a terminal or a server, or by the terminal and the server together; it is described below, by way of example, as applied to the server 104 in fig. 1, and includes the following steps 202 to 210. Wherein:
step 202, acquiring sample content, inputting the sample content into a first classification model, and extracting features through a multi-level sub-network in the first classification model to generate first classification features; each level of subnetworks comprises a pre-training structure, and at least one level of subnetworks in the multi-level subnetworks further comprises an adaptation network to be trained.
Wherein the content may be in any form including, but not limited to, at least one of text, image, audio, or video. Sample content is content for training.
The first classification model is a model for classification. The first classification model includes a plurality of levels of subnetworks, where a plurality refers to at least two, such as 4, 5, or 6, without limitation. In the sub-networks of two adjacent levels, the data output by the sub-network of the previous level is transmitted into the sub-network of the next level, and as shown in fig. 3A, for example, the first classification model includes 4 sub-networks of the level, wherein the 4 sub-networks of the first classification model are respectively sub-networks 1 to 4, the sub-network 1 is adjacent to the sub-network 2, and the data output by the sub-network 1 is transmitted into the sub-network 2. Each sub-network includes a respective pre-training structure therein.
At least one of the sub-networks comprises an adaptation network; that is, all or some of the sub-networks comprise an adaptation network, for example each sub-network comprises a respective adaptation network. As shown in fig. 3A, the first classification model includes multiple levels of sub-networks, sub-network 1 through sub-network 4, each including a respective pre-training structure and adaptation network. A sub-network may comprise at least one adaptation network, for example two adaptation networks. The pre-training structure may include a multi-head attention network, at least one feed-forward neural network, and at least one normalization layer. As shown in (a) of fig. 3B, which is a block diagram of a sub-network in some embodiments, the sub-network includes two adaptation networks, adaptation network 1 and adaptation network 2, and the pre-training structure includes 2 normalization layers, 2 feed-forward neural networks, and a multi-head attention network. The adaptation network may be a neural network; it may include feed-forward neural networks and may further include a nonlinear layer. As shown in (b) of fig. 3B, which is a block diagram of the adaptation network in some embodiments, the nonlinear layer may be implemented using a nonlinear function, feed-forward neural network 3 may be a network for down-sampling (dimension reduction), and feed-forward neural network 4 may be a network for up-sampling (dimension restoration). The feed-forward neural networks in the pre-training structure and in the adaptation network may be convolutional neural networks.
The structure of the adaptation network included in each sub-network may be the same or different, and in the case that the structure of the adaptation network is the same, the initial values of the parameters of the adaptation network may be the same or different. On the premise that the pre-training structure is obtained, the adaptation network in the sub-network needs to be trained. The first classification feature is a feature for generating a classification result corresponding to the sample content.
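As an illustration of the adapter structure just described, the following is a minimal NumPy sketch of a bottleneck adaptation network: a down-projecting feed-forward layer (feed-forward network 3), a nonlinear layer, and an up-projecting feed-forward layer (feed-forward network 4). The choice of ReLU, the residual connection, and all dimensions are illustrative assumptions, not details fixed by this application:

```python
import numpy as np

def relu(x):
    # Nonlinear layer of the adaptation network (ReLU is an assumed choice)
    return np.maximum(x, 0.0)

class Adapter:
    def __init__(self, dim, bottleneck, rng):
        # Feed-forward network 3: down-sampling (dimension reduction)
        self.w_down = rng.standard_normal((dim, bottleneck)) * 0.02
        # Feed-forward network 4: up-sampling (dimension restoration)
        self.w_up = rng.standard_normal((bottleneck, dim)) * 0.02

    def __call__(self, x):
        # Residual connection keeps the sub-network close to its
        # pre-trained behaviour while the adapter weights are small
        return x + relu(x @ self.w_down) @ self.w_up

rng = np.random.default_rng(0)
adapter = Adapter(dim=8, bottleneck=2, rng=rng)
features = rng.standard_normal((4, 8))   # 4 tokens, 8-dimensional features
out = adapter(features)
print(out.shape)  # (4, 8): the adapter preserves the feature dimension
```

Because the adapter only adds a small residual update, initializing its weights near zero leaves the pre-trained behaviour almost unchanged at the start of training.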
Specifically, the first classification characteristic may be a characteristic output by a sub-network of a last hierarchy of the multi-hierarchy sub-network. Alternatively, the first classification feature may be a feature generated by further processing the feature output by the sub-network of the last hierarchy, the further processing including at least one of linear or nonlinear operations, for example the further processing may be a convolution process.
In some embodiments, a first fully connected layer (Fully Connected Layer) may also be included in the first classification model. The first full-connection layer in the first classification model is located behind the multi-level sub-network, that is, the characteristics output by the sub-network of the last level are transmitted to the first full-connection layer. The first classification characteristic may be a characteristic of the first full connection layer output. The server can input sample content into the first classification model, sequentially extract features through multi-level sub-networks in the first classification model, input the features extracted by the sub-networks of the last level into the first full-connection layer after the features extracted by the sub-networks of the last level are obtained, and obtain the features output by the first full-connection layer to obtain the first classification features. In the process of training the adaptation network, the first full connection layer in the first classification model may be trained synchronously.
Step 204, under the condition of masking the adaptation network, inputting the sample content into the first classification model, and performing feature extraction through the multi-level sub-networks with the adaptation networks masked, to obtain the intermediate features extracted by each level of sub-network.
The adaptation network in a sub-network has a connection relationship with at least one pre-training structure. Masking the adaptation network means skipping the adaptation network, so that features pass directly between the structures the adaptation network would otherwise connect. Fig. 3A shows the connection relationship of the sub-networks in the first classification model without masking the adaptation network. Fig. 3C shows the connections in the case of a masked adaptation network; in fig. 3C, the pre-training structures in sub-networks of adjacent levels are connected, for example pre-training structure 1 is connected to pre-training structure 2, and the feature output by pre-training structure 1 is input to pre-training structure 2. The intermediate feature extracted by a sub-network is the feature output by that sub-network, and each sub-network outputs its own intermediate feature.
The modes of the first classification model may include a first mode in which the adaptation network in the first classification model is not masked and a second mode in which the adaptation network in the first classification model is masked. The first classification characteristic in step 202 is obtained when the first classification model is in the first mode.
Specifically, the server may set the mode of the first classification model to the first mode, input the sample content to the first classification model, and generate the first classification feature by performing feature extraction through the multi-level sub-networks in the first classification model. The server may then set the mode of the first classification model to the second mode, input the sample content to the first classification model, perform feature extraction through the multi-level sub-networks with the adaptation networks masked, and acquire the features extracted by each level of sub-network, obtaining the intermediate features output by each sub-network. As shown in fig. 3D, a first classification model with 4 levels of sub-networks, sub-network 1 through sub-network 4, is shown; the intermediate feature output by sub-network 1 is f_t1, the intermediate feature output by sub-network 2 is f_t2, the intermediate feature output by sub-network 3 is f_t3, and the intermediate feature output by sub-network 4 is f_t4. In fig. 3D, the dashed box in each sub-network represents an adaptation network, and each dashed box is shown enlarged to display the adaptation network it represents; for example, the adaptation network represented by the dashed box in sub-network 1 is adaptation network 1, and the dashed line between adaptation network 1 and the dashed box indicates the enlargement.
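The two modes can be sketched by treating the adaptation network as a skippable branch. In this illustrative Python snippet the stand-in functions are hypothetical placeholders, not the actual networks; masking simply routes the pre-training structure's output straight to the next level:

```python
def subnetwork_forward(x, pretraining_structure, adaptation_network, mask_adapter):
    # One level of the first classification model
    h = pretraining_structure(x)
    if mask_adapter:
        # Second mode: the adaptation network is masked, so the feature
        # output by the pre-training structure is passed on unchanged
        return h
    # First mode: the adaptation network processes the feature
    return adaptation_network(h)

# Hypothetical stand-ins for a frozen pre-training structure and an adapter
pretraining_structure = lambda x: 2.0 * x
adaptation_network = lambda h: h + 0.5

unmasked = subnetwork_forward(3.0, pretraining_structure, adaptation_network, mask_adapter=False)
masked = subnetwork_forward(3.0, pretraining_structure, adaptation_network, mask_adapter=True)
print(unmasked, masked)  # 6.5 6.0
```

Chaining such calls level by level reproduces the behaviour of figs. 3A (first mode) and 3C (second mode).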
Step 206, inputting the sample content into a second classification model, performing feature extraction through multi-level sub-structures corresponding to the multi-level sub-networks in the second classification model to generate a second classification feature, and obtaining the intermediate feature extracted by each level of sub-structure.
The second classification model is a model for classification, and the second classification model and the adaptation network in the first classification model are trained synchronously. The second classification model includes multiple levels of sub-structures, where multiple means at least two. The number of sub-structures included in the second classification model corresponds to the number of sub-networks included in the first classification model; for example, if the first classification model includes 4 levels of sub-networks, the second classification model includes 4 levels of sub-structures. The intermediate feature extracted by a sub-structure is the feature output by that sub-structure, and each sub-structure outputs its own intermediate feature. As shown in fig. 3D, a second classification model with 4 levels of sub-structures, sub-structure 1 through sub-structure 4, is also shown; the intermediate feature output by sub-structure 1 is f_s1, the intermediate feature output by sub-structure 2 is f_s2, the intermediate feature output by sub-structure 3 is f_s3, and the intermediate feature output by sub-structure 4 is f_s4.
In particular, the second classification characteristic may be a characteristic output by a sub-structure of a last level in the multi-level sub-structure. Alternatively, the second classification feature may be a feature generated by further processing the feature output by the last level of substructures, the further processing including at least one of linear or nonlinear operations, for example, the further processing may be convolution. It should be noted that, the method for obtaining the first classification feature and the second classification feature needs to be consistent, for example, if the first classification feature is a feature output by a sub-network of a last level in the multi-level sub-network, the second classification feature is a feature output by a sub-structure of the last level in the multi-level sub-structure. If the first classification feature is a feature generated by further processing the feature output by the sub-network of the last hierarchy, the second classification feature is a feature generated by further processing the feature output by the sub-structure of the last hierarchy.
In some embodiments, a second full connection layer may be further included in the second classification model, where the second full connection layer is located after the multi-level substructure, that is, features of the final level substructure output are transferred to the second full connection layer. The second classification characteristic may be a characteristic of the second fully connected layer output. The server can input the sample content into the second classification model, perform feature extraction through a multi-level substructure in the second classification model, and after obtaining the features extracted by the substructure of the last level, input the features extracted by the substructure of the last level into the second full-connection layer, and obtain the features output by the second full-connection layer to obtain the second classification features. In training the second classification model, the multi-level substructures and fully connected layers in the second classification model are both required to be trained.
In some embodiments, the server may first perform content processing on the sample content to obtain a processing result corresponding to the sample content, input the processing result into the first classification model, and input the processing result into the second classification model. The content processing includes, but is not limited to, dividing the sample content into a plurality of parts, and the processing result includes each part obtained by dividing. For example, if the sample content is an image, the content processing may be to segment the image into a plurality of small images, and input the segmented small images into the first classification model and the second classification model.
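For the image case mentioned above, the content processing can be sketched as splitting an image into non-overlapping parts; the square patch size and the NumPy array representation are illustrative assumptions:

```python
import numpy as np

def split_into_patches(image, patch_size):
    # Divide an H x W image into non-overlapping patch_size x patch_size parts;
    # assumes H and W are divisible by patch_size
    h, w = image.shape
    return [image[i:i + patch_size, j:j + patch_size]
            for i in range(0, h, patch_size)
            for j in range(0, w, patch_size)]

image = np.arange(16, dtype=np.float32).reshape(4, 4)
patches = split_into_patches(image, 2)
print(len(patches), patches[0].shape)  # 4 (2, 2)
```

Each of the resulting parts would then be fed to both the first and the second classification model as the processing result.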
Step 208, training the second classification model according to the difference between the first classification feature and the second classification feature, and the differences between the intermediate features respectively extracted by at least one level of sub-structure and its corresponding sub-network.
The sub-structure corresponding to a sub-network refers to the sub-structure at the same level as the sub-network; for example, the sub-structure corresponding to the level-1 sub-network is the level-1 sub-structure. The at least one level of sub-network may be one or more designated sub-networks, where multiple means at least two, or may be all of the sub-networks.
Specifically, the server may generate a classification feature loss value according to a difference between the first classification feature and the second classification feature, generate at least one intermediate loss value according to a difference between the at least one primary substructure and the intermediate feature extracted by the corresponding sub-network, perform statistics on the at least one intermediate loss value and the classification feature loss value to obtain a model loss value of the second classification model, and update parameters of the second classification model by adopting a gradient descent algorithm towards a direction of reducing the model loss value of the second classification model so as to train the second classification model.
In some embodiments, during each training, the server may generate a classification characteristic loss value based on a difference between the first classification characteristic and the second classification characteristic. The difference between the first classification feature and the second classification feature is in positive correlation with the classification feature loss value. Specifically, the first classification feature and the second classification feature are the same in scale, for example, may be vectors of the same scale or matrices of the same size, for example, the first classification feature and the second classification feature are each matrices of m rows and n columns, and each element in the matrices may be referred to as a feature value. The server may calculate a difference between the feature values at the same position in the first classification feature and the second classification feature to obtain a difference corresponding to each position, for example, calculate a difference between the feature value of the ith row and the jth column in the first classification feature and the feature value of the ith row and the jth column in the second classification feature to obtain a difference corresponding to the ith row and the jth column. After obtaining the difference value corresponding to each position, the server can perform statistical operation on the difference value corresponding to each position to obtain the classification characteristic loss value. The statistical operation may be a summation calculation or a mean calculation.
In some embodiments, the server may normalize each feature value in the first classification feature to obtain a normalized feature corresponding to the first classification feature, and normalize each feature value in the second classification feature to obtain a normalized feature corresponding to the second classification feature. The normalization may use any normalization method; for example, the first classification feature may be normalized using the temperature-scaled softmax formula p(z_i, T) = exp(z_i / T) / Σ_j exp(z_j / T), where z denotes the first classification feature, z_i and z_j denote feature values in the first classification feature, and T denotes the temperature used in the normalization calculation, which smooths the output. The server may normalize the second classification feature using the same method. The server may calculate the difference between the normalized feature corresponding to the first classification feature and the normalized feature corresponding to the second classification feature to obtain the classification feature loss value. The normalized feature corresponding to the first classification feature is abbreviated as the first normalized feature, and the normalized feature corresponding to the second classification feature as the second normalized feature. The server may calculate the difference between the values at the same position in the first normalized feature and the second normalized feature to obtain a value difference for each position, and perform a statistical operation on the value differences to obtain the classification feature loss value. The statistical operation may be a summation calculation or a mean calculation.
In some embodiments, since each value in the first normalized feature and the second normalized feature lies between 0 and 1 after normalization, both normalized features can be regarded as probability distributions, so the server can also calculate the KL (Kullback-Leibler) divergence between the first normalized feature and the second normalized feature, for example using the formula L_kd = L(p(z_t, T), p(z_s, T)), where L_kd denotes the KL divergence, p(z_t, T) denotes the first normalized feature, p(z_s, T) denotes the second normalized feature, and L(p(z_t, T), p(z_s, T)) denotes calculating the KL divergence between p(z_t, T) and p(z_s, T). The magnitude of the KL divergence reflects the degree of difference between the first normalized feature and the second normalized feature: the smaller the KL divergence, the smaller the difference. The server may use the KL divergence between the first normalized feature and the second normalized feature as the classification feature loss value.
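The normalization and divergence just described can be sketched in plain Python; the temperature value and example feature vectors below are illustrative assumptions:

```python
import math

def softmax_with_temperature(z, T):
    # p(z_i, T) = exp(z_i / T) / sum_j exp(z_j / T); larger T smooths the output
    exps = [math.exp(v / T) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): non-negative, and zero when the distributions are identical
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

z_t = [3.0, 1.0, 0.5]   # first classification feature (teacher)
z_s = [2.5, 1.2, 0.4]   # second classification feature (student)
p = softmax_with_temperature(z_t, T=4.0)
q = softmax_with_temperature(z_s, T=4.0)
loss = kl_divergence(p, q)
print(loss >= 0.0)  # True: the classification feature loss is non-negative
```

Training the student to reduce this value pushes its normalized feature toward the teacher's, which is the distillation objective on the classification features.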
In some embodiments, for the same level of substructures and sub-networks, the server may calculate the difference between the intermediate features extracted by the substructures and the intermediate features extracted by the sub-networks, resulting in intermediate feature differences. The intermediate features extracted from the sub-network are abbreviated as first intermediate features, and the intermediate features extracted from the sub-structure are abbreviated as second intermediate features. The second intermediate feature and the first intermediate feature are matrices or vectors with the same scale, the server can calculate the difference value between the feature values at the same position in the second intermediate feature and the first intermediate feature, and the corresponding difference value of the position can be obtained according to the difference value between the feature values. The server may perform a statistical operation on the difference values corresponding to the respective positions to obtain an intermediate loss value. The statistical operation may be a sum operation or a mean operation. The server may train a second classification model based on the at least one intermediate loss value and the classification characteristic loss value.
In some embodiments, the server may use the difference between the feature values directly as the difference corresponding to the position, or the server may compare the difference between the feature values with a preset value and determine the difference corresponding to the position according to the comparison result. The preset value may be set as required, and may be 1 or a value close to 1, for example 1.2 or 0.9. In the case where the difference between the feature values is less than the preset value, a squaring operation is performed on the difference, and the difference corresponding to the position is determined based on the result of the squaring operation; for example, the squared result may be used directly as the difference corresponding to the position, or it may be adjusted by a preset first factor, for example by multiplying the squared result by the first factor and using the product as the difference corresponding to the position. The first factor is a value between 0 and 1, for example 0.5. In the case where the difference between the feature values is greater than or equal to the preset value, the server may use the difference between the feature values as the difference corresponding to the position, or adjust it by a preset second factor, for example by subtracting the second factor from the difference and using the result as the difference corresponding to the position. The second factor is a value between 0 and 1 and may be the same as or different from the first factor; for example, the second factor may be 0.5.
For example, the server may calculate the difference value corresponding to each position using the formula L_feature(f_ti, f_si) = a·(f_ti − f_si)² when |f_ti − f_si| < c, and L_feature(f_ti, f_si) = |f_ti − f_si| − b otherwise, wherein L_feature(f_ti, f_si) represents the difference corresponding to the i-th position, f_ti represents the feature value at the i-th position in the first intermediate feature f_t, f_si represents the feature value at the i-th position in the second intermediate feature f_s, a represents the first factor, b represents the second factor, and c represents the preset value.
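As a concrete illustration, the per-position difference and the statistical operation described above can be sketched as follows, using the example values a = 0.5, b = 0.5, c = 1 from the text (function names are illustrative, not from the source):

```python
import numpy as np

def position_difference(f_t, f_s, a=0.5, b=0.5, c=1.0):
    # Piecewise difference per position: a * d^2 where |d| < c,
    # otherwise |d| - b, with d the difference between feature values.
    d = f_t - f_s
    return np.where(np.abs(d) < c, a * d ** 2, np.abs(d) - b)

def intermediate_loss(f_t, f_s, statistic="sum"):
    # Statistical operation (sum or mean) over the per-position differences.
    diffs = position_difference(f_t, f_s)
    return float(diffs.sum() if statistic == "sum" else diffs.mean())
```

With these default values the piecewise form coincides with the familiar smooth-L1 (Huber-style) loss, which is continuous at |d| = c.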
Step 210, obtaining a classification result obtained by classifying according to the first classification feature, and training an adaptation network in the first classification model based on the classification result and a sample label of sample content.
The sample label of the sample content refers to the real classification result of the sample content. The first classification model may further include a first classification layer, which is configured to classify the first classification feature output by the first full-connection layer to obtain a classification result; the classification result obtained by classifying according to the first classification feature is referred to as a first classification result. The first classification layer may be implemented using any function capable of implementing classification, for example, Softmax. The sample content and the sample label are used for training a model that performs a preset classification task. The preset classification task may be, for example, a specific classification task in a specific scenario. Specific scenarios include, but are not limited to, content review scenarios or text archiving scenarios, and specific classification tasks include, but are not limited to, classifying images in a content review scenario to detect non-compliant images, classifying documents in a text archiving scenario into one of a specified number of categories, and so on.
Specifically, the server may determine a classification loss value of the first classification model according to the first classification result and the sample label of the sample content; for example, the classification loss value of the first classification model may be calculated using a cross entropy loss function. For example, the formula L_cls1 = −Σ_{c=1}^{M} y_ic · log(p_ic) may be used to calculate the classification loss value of the first classification model, wherein M is the number of categories, and y_ic is the true probability that the sample content belongs to category c; the sample label comprises the true probability that the sample content belongs to each category, wherein the category whose true probability equals 1 is the category to which the sample content truly belongs. The first classification result comprises the predicted probability that the sample content belongs to each category, and p_ic represents the predicted probability that the sample content belongs to category c. L_cls1 is the classification loss value of the first classification model calculated for one sample content. If the model parameters are updated using a plurality of sample contents, the classification loss values of the first classification model for the respective sample contents may be summed, namely L_cls = Σ_{i=1}^{N} L_cls1^(i), wherein N represents the number of sample contents and 1 ≤ i ≤ N.
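The cross-entropy computation above can be sketched with a minimal NumPy version (function names are illustrative):

```python
import numpy as np

def cross_entropy_loss(y_true, p_pred, eps=1e-12):
    # L_cls1 = -sum_c y_ic * log(p_ic) over the M categories of one sample.
    # Clipping avoids log(0) for zero predicted probabilities.
    p = np.clip(p_pred, eps, 1.0)
    return float(-np.sum(y_true * np.log(p)))

def total_classification_loss(labels, preds):
    # Sum of the per-sample loss values over the N sample contents.
    return sum(cross_entropy_loss(y, p) for y, p in zip(labels, preds))
```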
In some embodiments, the server may update parameters of the first classification model based on the classification loss value of the first classification model, e.g., may employ a gradient descent algorithm to update parameters of the adaptation network in the first classification model in a direction such that the classification loss value of the first classification model decreases, to train the adaptation network in the first classification model.
In the classification model processing method, the intermediate features extracted by the sub-networks are generated through the pre-training structures, so these intermediate features reflect the knowledge learned by the pre-training structures during pre-training. Training the second classification model according to the differences between the intermediate features respectively extracted by at least one level of sub-structure and the corresponding sub-network therefore transfers the knowledge learned by the pre-training structures of the sub-networks to the sub-structures, so that the second classification model learns the knowledge learned by the pre-training structures. Meanwhile, a classification result obtained by classifying according to the first classification feature is obtained, and the adaptation network in the first classification model is trained based on the classification result and the sample label of the sample content; training the adaptation network enables the first classification model to learn the classification capability, so that training the second classification model according to the difference between the first classification feature and the second classification feature enables the second classification model to learn the knowledge in the classification scene. Thus, the second classification model learns the knowledge learned by the pre-training structure while learning the knowledge in the classification scene, and the knowledge learned by the pre-training structure broadens the knowledge learned by the second classification model, thereby improving the classification accuracy of the second classification model.
In some embodiments, the method further comprises: determining the current training progress; training a second classification model based on differences between the first classification feature and the second classification feature, and differences between at least one level of substructures and intermediate features respectively extracted by corresponding sub-networks, comprising: under the condition that the current training progress reaches a first preset progress, training a second classification model according to the difference between the first classification characteristic and the second classification characteristic and the difference between the at least one level of sub-structure and the intermediate characteristic respectively extracted by the corresponding sub-network.
Specifically, the current training progress refers to the training progress at the current moment. The server may count the number of times that the current time has been trained to obtain the current training number, and the server may determine the current training progress according to the current training number, for example, may use the current training number as the current training progress.
In some embodiments, the server may preset a training times threshold, which may be an empirical value; when the number of training times reaches the training times threshold, the first classification model and the second classification model can achieve a good training effect. The server may calculate the ratio of the current training times to the training times threshold and use this ratio as the current training progress. For example, if the training times threshold is 90 and the current training times is 20, the current training progress is 2/9.
In some embodiments, in the case where the current training progress is the current training times, the first preset progress is a preset number of training times, and the first preset progress is smaller than the training times threshold. In the case where the current training progress is the ratio of the current training times to the training times threshold, the first preset progress is a preset ratio smaller than 1, for example 1/3.
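A sketch of the progress computation and the gating decision, using the example values given in the text (a threshold of 90 training times and a first preset progress of 1/3; names are illustrative):

```python
def current_training_progress(current_times, times_threshold=90):
    # Progress as the ratio of the current training times to the threshold.
    return current_times / times_threshold

def feature_loss_enabled(progress, first_preset=1 / 3):
    # The classification feature loss is used only once the current
    # training progress has reached the first preset progress.
    return progress >= first_preset
```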
In some embodiments, the server may generate the classification feature loss value based on a difference between the first classification feature and the second classification feature, and generate the at least one intermediate loss value based on a difference between the at least one level of sub-structure and the intermediate feature respectively extracted by the corresponding sub-network. The calculation process of the classification characteristic loss value and the intermediate loss value is described in the foregoing, and is not repeated here.
In some embodiments, in the case that the current training progress reaches the first preset progress, the server may adjust the parameters of the second classification model according to the classification feature loss value, or according to both the classification feature loss value and the at least one intermediate loss value; for example, a gradient descent algorithm may be used to adjust the parameters of the second classification model.
In this embodiment, in the case where the current training progress reaches the first preset progress, the first classification model has already learned a relatively large amount of knowledge in the classification scene, and that knowledge has high reliability. Training the second classification model through the classification feature loss value therefore transfers the knowledge in the classification scene learned by the first classification model to the second classification model, which accelerates the training of the second classification model and improves its training accuracy.
In some embodiments, the method further comprises: under the condition that the current training progress does not reach the first preset progress, training a second classification model based on the difference between at least one level of substructures and intermediate features respectively extracted by corresponding sub-networks; returning to the step of obtaining sample content to iteratively train the first classification model and the second classification model.
In particular, the server may generate at least one intermediate loss value based on the differences between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network, and, in the case where the current training progress does not reach the first preset progress, adjust the parameters of the second classification model based on the at least one intermediate loss value.
In this embodiment, in the case where the current training progress does not reach the first preset progress, the knowledge of the classification scene learned by the first classification model is still insufficient and of low reliability. Therefore, the classification feature loss value is not used to train the second classification model until the current training progress reaches the first preset progress, which improves the training accuracy.
In some embodiments, the classification result obtained by classifying according to the first classification feature is a first classification result; under the condition that the current training progress does not reach the first preset progress, training a second classification model based on the difference between the at least one level of substructures and the intermediate features respectively extracted by the corresponding sub-networks, wherein the training comprises the following steps: under the condition that the current training progress does not reach the first preset progress, a classification result obtained by classifying according to the second classification characteristics is obtained, and a second classification result is obtained; generating at least one intermediate loss value according to the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network; generating a classification loss value of the second classification model according to the second classification result and the sample label of the sample content; the second classification model is trained based on the at least one intermediate loss value and the classification loss value of the second classification model.
The classification result obtained by classifying according to the second classification characteristic is simply referred to as a second classification result. The second classification model can also comprise a second classification layer, and the second classification layer is used for classifying the second classification features output by the second full-connection layer to obtain a second classification result. The second classification layer may be implemented using any classification-capable function, such as Softmax.
Specifically, the server may generate the classification loss value of the second classification model according to the second classification result and the sample label, and the method for generating the classification loss value of the second classification model may refer to the method for generating the classification loss value of the first classification model, which is not described herein.
In some embodiments, in the event that the current training schedule does not reach the first preset schedule, the server may adjust parameters of the second classification model based on the at least one intermediate loss value and the classification loss value of the second classification model to train the second classification model.
In this embodiment, in the case where the current training progress does not reach the first preset progress, the second classification model is trained according to the intermediate loss values so that it learns the knowledge learned by the pre-training structure, and is trained according to its own classification loss value so that it learns the knowledge in the classification scene. The second classification model can thus learn knowledge quickly even before the current training progress reaches the first preset progress, which improves the training effect.
In some embodiments, generating at least one intermediate loss value from differences between the at least one level of substructures and intermediate features respectively extracted by the corresponding sub-networks comprises: and aiming at each level of the substructure, obtaining an intermediate loss value corresponding to the substructure based on the difference between the intermediate features respectively extracted by the substructure and the corresponding sub-network.
Specifically, for each level of sub-structure, the server may generate an intermediate loss value corresponding to the sub-structure. As shown in FIG. 3D, L_feat1, L_feat2, L_feat3 and L_feat4 represent the intermediate loss values corresponding to sub-structures 1 through 4, respectively. The method for generating the intermediate loss value refers to the method described above and is not repeated here.
In this embodiment, for each level of sub-structure, the corresponding intermediate loss value is obtained, so that knowledge learned by the pre-training structure, for example, general knowledge, can be fully transferred to the second classification model according to the intermediate loss value corresponding to each sub-structure, thereby improving the training effect of the second classification model.
In some embodiments, generating at least one intermediate loss value from differences between the at least one level of substructures and intermediate features respectively extracted by the corresponding sub-networks comprises: for each level of substructures, determining differences between the substructures and feature values at each position in the intermediate features respectively extracted by the corresponding sub-networks; and obtaining an intermediate loss value corresponding to the substructure according to the difference value between the characteristic values at each position.
Specifically, for each position, the server may use the difference between the feature values at the position as the difference corresponding to the position, or the server may compare the difference between the feature values with a preset value and determine the difference corresponding to the position according to the comparison result. The preset value may be set as required, and may be 1 or a value close to 1, for example, 1.2 or 0.9. In the case where the difference between the feature values is smaller than the preset value, a squaring operation is performed on the difference between the feature values, and the difference corresponding to the position is determined based on the result of the squaring operation; for example, the result of the squaring operation may be used as the difference corresponding to the position, or the result of the squaring operation may be adjusted by a preset first factor, for example, multiplied by the first factor, with the multiplied result used as the difference corresponding to the position. The first factor is a value between 0 and 1, for example 0.5. In the case where the difference between the feature values is greater than or equal to the preset value, the server may use the difference between the feature values as the difference corresponding to the position, or adjust the difference between the feature values by a preset second factor, for example, subtract the second factor from the difference between the feature values and use the result as the difference corresponding to the position. The second factor is a value between 0 and 1 and may be the same as or different from the first factor; for example, the second factor is 0.5. The server may perform a statistical operation on the differences corresponding to the respective positions to obtain the intermediate loss value.
The statistical operation may be a sum operation or a mean operation.
In this embodiment, the intermediate loss value is obtained by the difference value between the feature values at each position, so that the accuracy and rationality of the intermediate loss value are improved.
In some embodiments, under the condition that the current training progress reaches the first preset progress, training the second classification model according to the difference between the first classification feature and the second classification feature and the difference between the at least one level of sub-structure and the intermediate feature respectively extracted by the corresponding sub-network, including: generating at least one intermediate loss value according to the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network under the condition that the current training progress reaches a first preset progress; obtaining a classification result obtained by classifying according to the second classification characteristics, obtaining a second classification result, and generating a classification loss value of a second classification model according to the second classification result and a sample label of sample content; generating a classification feature loss value according to the difference between the first classification feature and the second classification feature; the second classification model is trained based on the at least one intermediate loss value, the classification loss value and the classification feature loss value of the second classification model.
Specifically, when the current training progress reaches the first preset progress, the server trains the second classification model according to the classification loss value of the second classification model in addition to training the second classification model according to the intermediate loss value and the classification characteristic loss value, and enables the second classification model to learn better knowledge in the classification scene, namely knowledge suitable for the classification scene, through the classification loss value of the second classification model.
In this embodiment, in the case where the current training progress reaches the first preset progress, the second classification model is trained according to the classification loss value of the second classification model in addition to being trained according to the intermediate loss values and the classification feature loss value. Through its classification loss value, the second classification model better learns the knowledge in the classification scene, that is, the knowledge suitable for the classification scene, thereby improving the training effect of the second classification model.
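A hedged sketch of how the loss terms for the second classification model might be combined across the two stages; the source does not specify term weights, so equal weighting is assumed here (names are illustrative):

```python
def second_model_total_loss(intermediate_losses, cls_loss_2, feature_loss,
                            progress, first_preset=1 / 3):
    # Intermediate losses and the second model's classification loss are
    # always used; the classification feature loss is added only once the
    # current training progress reaches the first preset progress.
    total = sum(intermediate_losses) + cls_loss_2
    if progress >= first_preset:
        total += feature_loss
    return total
```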
In some embodiments, generating the classification feature loss value from the difference between the first classification feature and the second classification feature comprises: normalizing each characteristic value in the first classification characteristic to obtain first probability distribution; normalizing each characteristic value in the second classification characteristic to obtain a second probability distribution; a classification feature loss value is generated based on a difference between the first probability distribution and the second probability distribution.
The first probability distribution is the first normalized feature, and the second probability distribution is the second normalized feature.
Specifically, the server may calculate a KL divergence between the first probability distribution and the second probability distribution, the KL divergence characterizing a degree of difference between the first probability distribution and the second probability distribution, the greater the KL divergence, the greater the degree of difference. The server may determine a classification feature loss value according to the calculated KL divergence, where the classification feature loss value and the KL divergence form a positive correlation.
In this embodiment, since the first probability distribution is generated by normalizing the first classification feature and the second probability distribution is generated by normalizing the second classification feature, the difference between the first probability distribution and the second probability distribution can reflect the difference between the first classification feature and the second classification feature. Because a difference between probability distributions reflects the difference between the underlying features more faithfully, generating the classification feature loss value from the difference between the probability distributions improves the accuracy of the classification feature loss value.
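The normalization and KL-divergence steps can be sketched as follows; Softmax is assumed as the normalization function, consistent with the classification layers described earlier, and the names are illustrative:

```python
import numpy as np

def softmax(x):
    # Normalize the feature values of a classification feature into a
    # probability distribution; subtracting the max improves stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q): the larger it is, the greater the degree of difference.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def classification_feature_loss(first_feature, second_feature):
    # Loss positively correlated with the KL divergence between the two
    # normalized classification features.
    return kl_divergence(softmax(first_feature), softmax(second_feature))
```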
In some embodiments, the method further comprises: determining the current training progress; training an adaptation network in a first classification model based on classification results and sample labels of sample content, comprising: under the condition that the current training progress does not reach the second preset progress, training an adaptation network in the first classification model based on the classification result and a sample label of sample content; returning to the step of obtaining sample content to iteratively train the first classification model and the second classification model.
The classification result refers to a first classification result. The second preset progress is greater than the first preset progress, for example, the first preset progress is 1/3, and the second preset progress is 2/3.
Specifically, in the case that the current training progress does not reach the second preset progress, the server may generate a classification loss value of the first classification model according to the first classification result and the sample label, for example, may generate a classification loss value of the first classification model according to a cross entropy loss function, and update the adaptation network in the first classification model according to the classification loss value of the first classification model. It should be noted that, in the training process, only the parameters of the adaptation network are updated, and the parameters of the pre-training structure in the first classification model are not updated.
In this embodiment, in the case where the current training progress does not reach the second preset progress, the adaptation network in the first classification model is trained according to the first classification result and the sample label, so that the adaptation network learns the knowledge in the classification scene and the first classification model possesses that knowledge.
In some embodiments, the method further comprises: generating at least one intermediate loss value according to the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network under the condition that the current training progress reaches a second preset progress; generating a classification loss value of the first classification model according to the classification result and the sample label of the sample content; training an adaptation network in the first classification model based on the at least one intermediate loss value and the classification loss value of the first classification model; and returning to the step of acquiring the sample content to iteratively train the first classification model and the second classification model until the training ending condition is met, and stopping iteration.
The classification result refers to the first classification result. The training ending condition refers to a preset condition for stopping training, for example, that the number of training times reaches the training times threshold. The training ending condition may also be that the classification accuracy of each of the first classification model and the second classification model reaches a preset accuracy, which may be set as required, for example, 90% or 95%. In the case where the number of training times has not reached the training times threshold but the classification accuracy of each of the first classification model and the second classification model has reached the preset accuracy, it is determined that the training ending condition is met and training is stopped; alternatively, in the case where the number of training times reaches the training times threshold, it is determined that the training ending condition is met and training is stopped. Specifically, in the case where the current training progress reaches the second preset progress, the adaptation network in the first classification model is trained by the at least one intermediate loss value in addition to being trained by the classification loss value of the first classification model.
In this embodiment, in addition to training the adaptation network in the first classification model by the classification loss value of the first classification model, the adaptation network in the first classification model is also trained by at least one intermediate loss value under the condition that the current training progress reaches the second preset progress. Therefore, the first classification model and the second classification model can be mutually supervised, so that the first classification model and the second classification model learn knowledge in a classification scene better, the learned knowledge is suitable for the classification scene, and the classification accuracy of the first classification model and the second classification model is improved.
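The overall schedule described in the embodiments above — which loss terms train which part of which model at a given progress — can be summarized in a sketch; the threshold values 1/3 and 2/3 are the examples given in the text, and the names are illustrative:

```python
def select_loss_terms(progress, first_preset=1 / 3, second_preset=2 / 3):
    # Second (student) model: intermediate losses and its own
    # classification loss always; the classification feature loss is
    # added once the first preset progress is reached.
    second_model = ["intermediate", "classification"]
    if progress >= first_preset:
        second_model.append("classification_feature")
    # Adaptation network of the first model: its classification loss
    # always; the intermediate losses are added once the second preset
    # progress is reached (mutual supervision).
    adaptation_network = ["classification"]
    if progress >= second_preset:
        adaptation_network.append("intermediate")
    return {"second_model": second_model,
            "adaptation_network": adaptation_network}
```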
In some embodiments, the model complexity of the second classification model is less than the model complexity of the first classification model, the pre-training structure in the first classification model is used for respectively extracting features in a plurality of tasks, and the sample content and the sample label are used for training the second classification model for executing the preset classification task.
In particular, model complexity may be measured by the number of parameters the model includes; the greater the number of parameters, the greater the model complexity. The preset classification task may be, for example, a specific classification task in a specific scenario. Specific scenarios include, but are not limited to, content review scenarios or text archiving scenarios, and specific classification tasks include, but are not limited to, classifying images in a content review scenario to detect non-compliant images, classifying documents in a text archiving scenario into one of a specified number of categories, and so on. In the case where the adaptation network is masked in the first classification model, the multi-level sub-network in the first classification model can be used to extract features for classification in a variety of tasks; that is, the multi-level sub-network in the first classification model can serve as a shared network shared by a plurality of tasks.
In this embodiment, since the pre-training structure in the first classification model is used for extracting features in multiple tasks respectively, the pre-training structure in the first classification model has knowledge applicable to multiple tasks, so that the first classification model trains out the second classification model for executing the preset classification task, the knowledge learned by the pre-training structure in the first classification model can be transferred to the second classification model, the knowledge learned by the second classification model is more comprehensive, and the classification accuracy of the second classification model in the preset classification task can be improved.
In some embodiments, as shown in FIG. 4, a classification model processing method is provided. The method may be executed by a terminal or a server, or by the terminal and the server together; it is described here, by way of example, as applied to the server in FIG. 1, and includes the following steps 402 to 424. Wherein:
step 402, sample content is obtained.
And step 404, under the condition that the adaptive network is not shielded, inputting the sample content into a first classification model, extracting the characteristics through a multi-level sub-network in the first classification model to generate first classification characteristics, and classifying the first classification characteristics to obtain a first classification result.
Wherein each level of subnetworks comprises a pre-training structure, and at least one level of subnetworks in the multi-level subnetworks further comprises an adaptation network to be trained.
Step 406, under the condition of shielding the adaptive network, inputting the sample content into a first classification model, and extracting the characteristics through multi-level sub-networks in the first classification model to obtain the intermediate characteristics respectively extracted by each sub-network.
Step 408, inputting the sample content into a second classification model, performing feature extraction through a multi-level sub-structure corresponding to the multi-level sub-network in the second classification model to generate a second classification feature, obtaining intermediate features extracted by each level of sub-structure, and classifying the second classification feature to obtain a second classification result.
Step 410, generating at least one intermediate loss value based on the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network, generating a classification loss value of the second classification model based on the second classification result and the sample label of the sample content, generating a classification feature loss value according to the difference between the first classification feature and the second classification feature, and generating a classification loss value of the first classification model according to the first classification result and the sample label of the sample content.
It should be noted that the classification feature loss value may be generated before the current training progress reaches the first preset progress, or may be generated after the current training progress reaches the first preset progress.
Step 412, determining whether the current training progress reaches the first preset progress, if not, entering step 414, if yes, entering step 416.
Step 414, training the second classification model according to the at least one intermediate loss value and the classification loss value of the second classification model, training the adaptation network in the first classification model according to the classification loss value of the first classification model, and returning to step 402.
Wherein, during the iterative process, the sample content obtained in step 402 may be different for each time.
Step 416, it is determined whether the current training progress reaches the second preset progress, if not, step 418 is entered, and if yes, step 420 is entered.
Step 418, training the second classification model according to the at least one intermediate loss value, the classification loss value of the second classification model, and the classification feature loss value, and training the adaptation network in the first classification model according to the classification loss value of the first classification model, and returning to step 402.
Step 420, training the second classification model according to the at least one intermediate loss value, the classification loss value of the second classification model, and the classification feature loss value, and training the adaptation network in the first classification model based on the at least one intermediate loss value and the classification loss value of the first classification model.
When training is performed according to different types of loss values, the weights of the different types of loss values can be determined, the loss values are weighted to obtain total loss values, and then a model is trained through the total loss values, wherein the weights of the different types of loss values can be preset and can be dynamically learned in the training process.
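The weighted combination described above can be sketched as a plain weighted sum of per-task loss values. This is a minimal illustration; the weight values below are made up, and in practice they may be preset or learned dynamically during training as the paragraph notes.

```python
# Sketch: combine several loss values into one total loss via per-type weights.
# The weights here are illustrative assumptions, not values from the patent.

def total_loss(losses, weights):
    """Weighted sum of per-task loss values."""
    return sum(w * l for w, l in zip(weights, losses))

# e.g. intermediate loss, classification loss, classification-feature loss
losses = [0.8, 1.2, 0.5]
weights = [1.0, 1.0, 0.5]
print(total_loss(losses, weights))  # 0.8 + 1.2 + 0.25 = 2.25
```

The model would then be updated by backpropagating this single total value rather than each loss separately.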
Training model parameters according to each class of loss value can be regarded as a task. For example: training the first classification model according to the classification loss value of the first classification model is task 1; training the second classification model according to the at least one intermediate loss value is task 2; training the second classification model according to the classification loss value of the second classification model is task 3; training the second classification model according to the classification feature loss value is task 4; and training the first classification model according to the at least one intermediate loss value is task 5. These 5 tasks have a sequential relationship in time, as shown in fig. 5. As can be seen from fig. 5, training is divided into 3 stages: tasks 1, 2 and 3 run synchronously, task 4 starts later than tasks 1 to 3, and task 5 starts later than task 4.
Step 422, it is determined whether the training end condition is satisfied, if not, the procedure returns to step 402, and if yes, the procedure proceeds to step 424.
Step 424, stopping training.
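The three-stage schedule of steps 412 to 420 can be sketched as a function that, given the current training step and the two preset progress thresholds, reports which loss terms are active for each model. All names, and representing "progress" as a step count, are assumptions made for illustration.

```python
# Sketch of the staged loss schedule in steps 402-424. Stage 1: both models
# train on their basic losses; stage 2 adds the classification-feature loss
# for the second model; stage 3 adds the intermediate loss for the first
# model, giving mutual supervision. Names are illustrative assumptions.

def active_losses(step, first_threshold, second_threshold):
    """Return which loss terms train each model at a given training step."""
    second_model = ["intermediate", "classification"]
    first_model = ["classification"]
    if step >= first_threshold:
        second_model.append("classification_feature")  # stage 2: distill features
    if step >= second_threshold:
        first_model.append("intermediate")             # stage 3: mutual supervision
    return {"first": first_model, "second": second_model}

print(active_losses(10, first_threshold=100, second_threshold=200)["second"])
# ['intermediate', 'classification']
```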
According to the classification model processing method, when the current training progress has not reached the first preset progress, the second classification model is trained according to the intermediate loss value and the classification loss value of the second classification model, so that the second classification model learns the knowledge of the pre-training structure as well as knowledge of the classification scene, namely the classification task; meanwhile, the adaptation network in the first classification model is trained according to the classification loss value of the first classification model, so that the first classification model learns knowledge of the classification scene. Before the first preset progress is reached, the first classification model has not yet learned enough knowledge of the classification scene; once the first preset progress is reached, the first classification model has learned considerable knowledge of the classification scene, and because it has a pre-training structure, its knowledge is stronger than that of the second classification model. Therefore, after the first preset progress is reached, the second classification model is trained not only by the intermediate loss value and its own classification loss value but also by the classification feature loss value, which transfers the knowledge of the classification scene learned by the first classification model to the second classification model, accelerating the training of the second classification model and improving its training accuracy. After the current training progress reaches the second preset progress, the adaptation network in the first classification model is trained not only based on the classification loss value of the first classification model but also through the intermediate loss value. Since the intermediate loss value is then used to train the first classification model and the second classification model at the same time, the two models supervise each other and learn knowledge of the classification scene better, the learned knowledge fits the classification scene, and the classification accuracy of both models is improved.
In some embodiments, as shown in fig. 6, a content classification method is provided. The method may be performed by a terminal, by a server, or by the terminal and the server together; taking its application to the server in fig. 1 as an example, it includes the following steps 602 to 604. Wherein:
Step 602, the content to be classified is obtained.
Step 604, inputting the content into the trained second classification model, extracting features through a multi-level substructure in the second classification model to obtain classification features of the content, and classifying based on the classification features of the content to obtain classification results of the content.
The second classification model is obtained through training by the classification model processing method. The content to be classified may be arbitrary content.
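The inference path of steps 602 to 604 can be sketched as follows, with a stand-in callable playing the role of the trained second classification model; the class names and the fixed probabilities are purely illustrative assumptions.

```python
# Sketch of steps 602-604: the trained second classification model maps
# content to per-category probabilities; the predicted category is the one
# with the largest probability. `model` here is a toy stand-in callable.

def classify(content, model, class_names):
    probs = model(content)  # classification features -> class probabilities
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Toy model returning fixed probabilities for three preset categories.
label, p = classify("some text", lambda c: [0.1, 0.7, 0.2], ["a", "b", "c"])
print(label, p)  # b 0.7
```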
Specifically, the content to be classified may be content from a user; the server inputs the content into the trained second classification model to classify it and obtain a classification result. In a content auditing scene, the server may further determine, based on the classification result, whether the content from the user is legal. In this content classification method, because the intermediate features extracted by each sub-network are generated through the pre-training structure, they reflect the knowledge the pre-training structure learned during pre-training; training the second classification model according to the differences between the intermediate features respectively extracted by at least one level of sub-structure and the corresponding sub-network therefore transfers the knowledge of the pre-training structure to the sub-structures, so that the second classification model learns it. In addition, the classification result obtained by classifying according to the first classification feature is used, together with the sample label of the sample content, to train the adaptation network in the first classification model, which gives the first classification model its classification capability; training the second classification model according to the difference between the first classification feature and the second classification feature then lets the second classification model learn knowledge of the classification scene.
Therefore, the second classification model learns the knowledge of the pre-training structure while learning knowledge of the classification scene, and the pre-training knowledge broadens what the second classification model learns, which improves the classification accuracy of the second classification model; classifying content with the trained second classification model therefore improves the accuracy of content classification.
In some embodiments, the sample content in the classification model processing method provided by the application is a sample image; the obtaining of the content to be classified comprises the following steps: acquiring an image to be audited; inputting the content into a trained second classification model, extracting the characteristics through a multi-level substructure in the second classification model to obtain classification characteristics of the content, classifying the content based on the classification characteristics of the content, and obtaining a classification result of the content comprises the following steps: inputting the image into a trained second classification model, extracting image features through a multi-level substructure in the second classification model, extracting classification features of the image based on the extracted image features, and classifying based on the classification features of the image to obtain a classification result of the image; the method further comprises the steps of: and determining the auditing result of the image based on the classification result of the image.
The sample content used in the training process may be an image, and a sample image is an image used for training. The sample label comprises probabilities respectively corresponding to a plurality of preset image categories, where the probability corresponding to a preset image category is the probability that the sample image truly belongs to that category. The classification result of the image comprises prediction probabilities respectively corresponding to the plurality of preset image categories, where a prediction probability is a probability predicted by the second classification model. In the content auditing scene, the preset image categories may include pornography, gambling, legal and the like.
Specifically, the server may determine the maximum prediction probability from the classification result of the image, determine the preset image category corresponding to the maximum prediction probability as the category of the image, determine that the auditing result is auditing passing if the category of the image is legal, and determine that the auditing result is auditing failing if the category of the image is illegal, such as pornography.
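The audit decision just described can be sketched as taking the arg-max category of the prediction probabilities and passing only legal content. The category names and the set of legal categories below are illustrative assumptions.

```python
# Sketch of the audit rule: pick the preset category with the largest
# predicted probability; pass the audit only if that category is legal.
# Category names and the `legal` set are illustrative assumptions.

def audit(probs, categories, legal=frozenset({"legal"})):
    idx = max(range(len(probs)), key=probs.__getitem__)  # maximum prediction probability
    category = categories[idx]
    return ("pass" if category in legal else "fail"), category

result, category = audit([0.05, 0.15, 0.80], ["pornography", "gambling", "legal"])
print(result, category)  # pass legal
```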
In this embodiment, since the second classification model learns knowledge in both the classification scene and the pre-training structure, sufficient knowledge is learned, so that the image is classified by the second classification model, and the image classification accuracy is improved.
The content classification method provided by the application can be applied to any application scene involving content classification, including but not limited to a content auditing scene or a text classification scene. In the content auditing scene, images, texts and videos need to be audited, for example, the images, texts and videos uploaded by users are audited, and illegal information is found in time, so that it is prevented from spreading further on the network.
For example, for the image auditing scene within a content auditing scene, the server may obtain a sample image. With the adaptation network not shielded, the server inputs the sample image into the first classification model, extracts image features through the multi-level sub-networks in the first classification model to generate a first classification feature, and classifies the first classification feature to obtain a first classification result. With the adaptation network shielded, the server inputs the sample image into the first classification model and extracts image features through the multi-level sub-networks to obtain the intermediate features respectively extracted by each sub-network. The server inputs the sample image into the second classification model, extracts image features through the multi-level sub-structures corresponding to the multi-level sub-networks to generate a second classification feature, obtains the intermediate features respectively extracted by each level of sub-structure, and classifies the second classification feature to obtain a second classification result. The server then generates at least one intermediate loss value based on the differences between the intermediate features respectively extracted by at least one level of sub-structure and the corresponding sub-network, generates a classification loss value of the second classification model based on the second classification result and the sample label of the sample image, generates a classification feature loss value according to the difference between the first classification feature and the second classification feature, and generates a classification loss value of the first classification model according to the first classification result and the sample label. Next, the server judges whether the current training progress reaches the first preset progress; if not, it trains the second classification model according to the at least one intermediate loss value and the classification loss value of the second classification model, trains the adaptation network in the first classification model according to the classification loss value of the first classification model, acquires a new sample image, and performs the next round of training; if yes, it judges whether the current training progress reaches the second preset progress. If the second preset progress is not reached, the server trains the second classification model according to the at least one intermediate loss value, the classification loss value of the second classification model and the classification feature loss value, trains the adaptation network in the first classification model according to the classification loss value of the first classification model, acquires a new sample image, and performs the next round of training. If the second preset progress is reached, the server trains the second classification model according to the at least one intermediate loss value, the classification loss value of the second classification model and the classification feature loss value, and trains the adaptation network in the first classification model based on the at least one intermediate loss value and the classification loss value of the first classification model; it then judges whether the training end condition is met; if not, it acquires a new sample image and performs the next round of training; if yes, it stops training to obtain the trained second classification model and the trained first classification model. The server can then acquire an image uploaded by a user, input the image into the trained second classification model, extract features through the multi-level sub-structures in the second classification model to obtain classification features of the image, classify based on those classification features to obtain the classification result of the image, and determine the image auditing result based on the classification result.
For another example, in the text classification scene, the server may obtain a sample text. With the adaptation network not shielded, the server inputs the sample text into the first classification model, extracts text features through the multi-level sub-networks in the first classification model to generate a first classification feature, and classifies the first classification feature to obtain a first classification result. With the adaptation network shielded, the server inputs the sample text into the first classification model and extracts text features through the multi-level sub-networks to obtain the intermediate features respectively extracted by each sub-network. The server inputs the sample text into the second classification model, extracts text features through the multi-level sub-structures corresponding to the multi-level sub-networks to generate a second classification feature, obtains the intermediate features respectively extracted by each level of sub-structure, and classifies the second classification feature to obtain a second classification result. The server then generates at least one intermediate loss value based on the differences between the intermediate features respectively extracted by at least one level of sub-structure and the corresponding sub-network, generates a classification loss value of the second classification model based on the second classification result and the sample label of the sample text, generates a classification feature loss value according to the difference between the first classification feature and the second classification feature, and generates a classification loss value of the first classification model according to the first classification result and the sample label. Next, the server judges whether the current training progress reaches the first preset progress; if not, it trains the second classification model according to the at least one intermediate loss value and the classification loss value of the second classification model, trains the adaptation network in the first classification model according to the classification loss value of the first classification model, acquires a new sample text, and performs the next round of training; if yes, it judges whether the current training progress reaches the second preset progress. If the second preset progress is not reached, the server trains the second classification model according to the at least one intermediate loss value, the classification loss value of the second classification model and the classification feature loss value, trains the adaptation network in the first classification model according to the classification loss value of the first classification model, acquires a new sample text, and performs the next round of training. If the second preset progress is reached, the server trains the second classification model according to the at least one intermediate loss value, the classification loss value of the second classification model and the classification feature loss value, and trains the adaptation network in the first classification model according to the at least one intermediate loss value and the classification loss value of the first classification model; it then judges whether the training end condition is met; if not, it acquires a new sample text and performs the next round of training; if yes, it stops training to obtain the trained second classification model and the trained first classification model. In the text classification scene, the server can then input the text to be classified into the trained second classification model, perform feature extraction through the multi-level sub-structures in the second classification model to obtain classification features of the text, and classify based on those classification features to obtain the classification result of the text.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with at least part of the other steps or stages.
Based on the same inventive concept, an embodiment of the application also provides a classification model processing apparatus for implementing the above classification model processing method. The implementation of the solution provided by the apparatus is similar to that described in the above method, so for specific limitations in the embodiments of the classification model processing apparatus provided below, reference may be made to the limitations of the classification model processing method above, which are not repeated here.
In some embodiments, as shown in fig. 7, there is provided a classification model processing apparatus including: a first classification feature generation module 702, an intermediate feature extraction module 704, a second classification feature generation module 706, a second classification model training module 708, and a first classification model training module 710, wherein:
the first classification feature generation module 702 is configured to obtain sample content, input the sample content into a first classification model, and perform feature extraction through a multi-level sub-network in the first classification model to generate a first classification feature; each level of subnetworks comprises a pre-training structure, and at least one level of subnetworks in the multi-level subnetworks further comprises an adaptation network to be trained.
The intermediate feature extraction module 704 is configured to input the sample content into the first classification model under the condition of shielding the adaptation network, and perform feature extraction through a multi-level sub-network that shields the adaptation network, so as to obtain intermediate features extracted by each level of sub-network.
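The shielding performed by this module can be sketched as a sub-network whose forward pass optionally bypasses its adaptation network, so the same weights serve both the shielded and unshielded passes. This is a minimal illustration; `SubNetwork`, the residual-style adapter, and the toy callables are assumptions, not the patent's implementation.

```python
# Hypothetical sketch: a sub-network containing a pre-training structure and
# an optional adaptation network that can be masked (shielded) per forward pass.

class SubNetwork:
    def __init__(self, pretrained, adapter=None):
        self.pretrained = pretrained   # pre-training structure
        self.adapter = adapter         # adaptation network to be trained (may be None)

    def forward(self, x, mask_adapter=False):
        h = self.pretrained(x)
        if self.adapter is not None and not mask_adapter:
            h = h + self.adapter(h)    # residual-style adapter contribution
        return h

# Toy usage with scalar "features": pretrained adds 1, adapter adds half of h.
net = SubNetwork(pretrained=lambda x: x + 1.0, adapter=lambda h: 0.5 * h)
print(net.forward(2.0))                     # unshielded: 3.0 + 1.5 = 4.5
print(net.forward(2.0, mask_adapter=True))  # shielded:   3.0
```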
The second classification feature generating module 706 is configured to input the sample content into a second classification model, perform feature extraction through a multi-level sub-structure corresponding to the multi-level sub-network in the second classification model to generate a second classification feature, and obtain an intermediate feature extracted by each level of sub-structure.
A second classification model training module 708, configured to train the second classification model according to the difference between the first classification feature and the second classification feature and the differences between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-networks.
The first classification model training module 710 is configured to obtain a classification result obtained by classifying according to the first classification feature, and train the adaptation network in the first classification model based on the classification result and the sample label of the sample content.
In some embodiments, the classification model processing device further includes a progress determination module, for determining a current training progress; the second classification model training module 708 is further configured to train the second classification model according to a difference between the first classification feature and the second classification feature and a difference between at least one level of sub-structure and intermediate features respectively extracted by the corresponding sub-network when the current training progress reaches the first preset progress.
In some embodiments, the second classification model training module 708 is further configured to train the second classification model based on a difference between the at least one level of sub-structure and the intermediate features respectively extracted by the corresponding sub-network if the current training progress does not reach the first preset progress; returning to the step of obtaining sample content to iteratively train the first classification model and the second classification model.
In some embodiments, the classification result obtained by classifying according to the first classification feature is a first classification result; the second classification model training module 708 is further configured to, if the current training progress does not reach the first preset progress, obtain the classification result of classifying according to the second classification feature as a second classification result; generate at least one intermediate loss value according to the differences between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network; generate a classification loss value of the second classification model according to the second classification result and the sample label of the sample content; and train the second classification model based on the at least one intermediate loss value and the classification loss value of the second classification model.
In some embodiments, the second classification model training module 708 is further configured to determine, for each level of the sub-structure, a difference between the sub-structure and a feature value at each location in the intermediate features respectively extracted by the corresponding sub-network; and obtaining an intermediate loss value corresponding to the substructure according to the difference value between the characteristic values at each position.
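The per-position difference described here can be sketched as a mean squared error over aligned feature positions. MSE is an assumption; the embodiment only specifies a difference between the feature values at each position.

```python
# Sketch of an intermediate loss between a sub-structure's and the
# corresponding sub-network's intermediate features: mean squared difference
# over per-position feature values. MSE is an illustrative assumption.

def intermediate_loss(student_feat, teacher_feat):
    assert len(student_feat) == len(teacher_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)

print(intermediate_loss([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # (0 + 0.25 + 1.0) / 3
```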
In some embodiments, the second classification model training module 708 is further configured to generate at least one intermediate loss value according to a difference between the at least one level of sub-structure and the intermediate feature respectively extracted by the corresponding sub-network when the current training progress reaches the first preset progress; obtaining a classification result obtained by classifying according to the second classification characteristics, obtaining a second classification result, and generating a classification loss value of a second classification model according to the second classification result and a sample label of sample content; generating a classification feature loss value according to the difference between the first classification feature and the second classification feature; the second classification model is trained based on the at least one intermediate loss value, the classification loss value and the classification feature loss value of the second classification model.
In some embodiments, the second classification model training module 708 is further configured to normalize each feature value in the first classification feature to obtain a first probability distribution; normalizing each characteristic value in the second classification characteristic to obtain a second probability distribution; a classification feature loss value is generated based on a difference between the first probability distribution and the second probability distribution.
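The normalization and comparison described here can be sketched as a softmax over each model's classification feature followed by a divergence between the two probability distributions. KL divergence is a common choice for such a distillation loss and is an assumption here; the embodiment only requires a difference between the two distributions.

```python
# Sketch of the classification-feature loss: softmax-normalize each model's
# classification feature into a probability distribution, then compare the
# distributions. KL divergence is an illustrative assumption.
import math

def softmax(xs):
    m = max(xs)                                # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

p = softmax([2.0, 1.0, 0.1])   # from the first classification feature
q = softmax([1.8, 1.1, 0.3])   # from the second classification feature
print(round(kl_divergence(p, q), 4))
```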
In some embodiments, the classification model processing device further includes a progress determination module, for determining a current training progress; the first classification model training module 710 is further configured to train the adaptation network in the first classification model based on the classification result and the sample label of the sample content if the current training progress does not reach the second preset progress; returning to the step of obtaining sample content to iteratively train the first classification model and the second classification model.
In some embodiments, the first classification model training module 710 is further configured to generate at least one intermediate loss value according to a difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network when the current training progress reaches the second preset progress; generating a classification loss value of the first classification model according to the classification result and the sample label of the sample content; training an adaptation network in the first classification model based on the at least one intermediate loss value and the classification loss value of the first classification model; and returning to the step of acquiring the sample content to iteratively train the first classification model and the second classification model until the training ending condition is met, and stopping iteration.
In some embodiments, the model complexity of the second classification model is less than the model complexity of the first classification model, the pre-training structure in the first classification model is used for respectively extracting features in a plurality of tasks, and the sample content and the sample label are used for training the second classification model for executing the preset classification task.
The respective modules in the above classification model processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
Based on the same inventive concept, the embodiment of the application also provides a content classification device for realizing the content classification method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of one or more content classification devices provided below may be referred to the limitation of the content classification method hereinabove, and will not be repeated here.
In some embodiments, as shown in fig. 8, there is provided a content classification apparatus including: a content acquisition module 802 and a classification result obtaining module 804, wherein:
The content acquisition module 802 is configured to acquire content to be classified.
The classification result obtaining module 804 is configured to input the content into the trained second classification model, perform feature extraction through a multi-level substructure in the second classification model to obtain classification features of the content, and perform classification based on the classification features of the content to obtain a classification result of the content; the second classification model is obtained through training by the classification model processing method.
In some embodiments, the sample content in the classification model processing method provided by the application is a sample image; the content acquisition module 802 is further configured to acquire an image to be audited; the classification result obtaining module 804 is further configured to input the image into a trained second classification model, extract image features through a multi-level substructure in the second classification model, extract classification features of the image based on the extracted image features, and classify the image based on the classification features of the image to obtain a classification result of the image; the content classification device further comprises an auditing result determining module which is used for determining the auditing result of the image based on the classification result of the image.
The modules in the above content classification apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware in, or independent of, a processor in the computer device, or may be stored as software in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In some embodiments, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data related to the classification model processing method and the content classification method provided by the present application. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a classification model processing method or a content classification method.
In some embodiments, a computer device is provided, which may be a terminal whose internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface, the display unit, and the input device are connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode may be implemented through WIFI, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a classification model processing method or a content classification method. The display unit of the computer device is used to form a visual picture and may be a display screen, a projection device, or a virtual reality imaging device. The display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touch pad provided on the housing of the computer device, or an external keyboard, touch pad, mouse, or the like.
It will be appreciated by those skilled in the art that the structures shown in fig. 9 and 10 are block diagrams of only some of the structures associated with the present application and are not intended to limit the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In some embodiments, a computer device is provided, comprising a memory having a computer program stored therein and a processor, which when executing the computer program implements the steps of the classification model processing method described above.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, implements the steps of the classification model processing method described above.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, implements the steps of the classification model processing method described above.
In some embodiments, a computer device is provided comprising a memory having a computer program stored therein and a processor that when executing the computer program performs the steps of the content classification method described above.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, implements the steps of the content classification method described above.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, implements the steps of the content classification method described above.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the related data must comply with the relevant regulations.
Those skilled in the art will appreciate that all or part of the flows in the above method embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-transitory computer-readable storage medium, and when executed, may include the flows of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors involved in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments represent only a few implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (17)

1. A classification model processing method, the method comprising:
acquiring sample content, inputting the sample content into a first classification model, and performing feature extraction through a multi-level sub-network in the first classification model to generate a first classification feature; each level of the sub-network comprises a pre-training structure, and at least one level of sub-network in the multi-level sub-network further comprises an adaptation network to be trained;
under the condition that the adaptation network is masked, inputting the sample content into the first classification model, and performing feature extraction through the multi-level sub-network with the adaptation network masked, to obtain an intermediate feature extracted by each level of sub-network;
inputting the sample content into a second classification model, performing feature extraction through a multi-level sub-structure corresponding to the multi-level sub-network in the second classification model to generate a second classification feature, and obtaining an intermediate feature extracted by each level of sub-structure;
training the second classification model according to the difference between the first classification feature and the second classification feature and the difference between the intermediate features respectively extracted by at least one level of sub-structure and the corresponding sub-network;
and acquiring a classification result obtained by classifying according to the first classification feature, and training an adaptation network in the first classification model based on the classification result and a sample label of the sample content.
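The dual forward pass of claim 1 can be sketched as follows: each level of the first (teacher) model's sub-network applies a pre-training structure and, at some levels, an adaptation network that can be masked out. The residual form of the adapter and the toy one-number "features" are illustrative assumptions, not the claimed implementation.

```python
# Minimal sketch of claim 1's two teacher passes: one with adaptation networks
# active (yielding the first classification feature) and one with them masked
# (yielding the intermediate features used for distillation).
def run_level(x, pretrain, adapter=None, mask_adapter=False):
    y = pretrain(x)
    if adapter is not None and not mask_adapter:
        y = y + adapter(y)  # assumed residual-style adaptation network
    return y

def forward(x, levels, mask_adapter=False):
    # Returns the intermediate feature of every level plus the final feature.
    intermediates = []
    for pretrain, adapter in levels:
        x = run_level(x, pretrain, adapter, mask_adapter)
        intermediates.append(x)
    return intermediates, x

levels = [(lambda v: v * 2, lambda v: v + 1),  # level with an adaptation network
          (lambda v: v - 3, None)]             # level with a pre-training structure only
_, first_cls_feature = forward(5, levels)                        # adapters active
masked_intermediates, _ = forward(5, levels, mask_adapter=True)  # adapters masked
```

The masked intermediates play the role of the per-level teacher targets that the second (student) model's sub-structures are trained to match.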
2. The method according to claim 1, wherein the method further comprises:
determining the current training progress;
training the second classification model according to the difference between the first classification feature and the second classification feature and the difference between the intermediate features respectively extracted by at least one level of sub-structure and the corresponding sub-network, comprising:
under the condition that the current training progress reaches a first preset progress, training the second classification model according to the difference between the first classification feature and the second classification feature and the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network.
3. The method according to claim 2, wherein the method further comprises:
under the condition that the current training progress does not reach the first preset progress, training the second classification model based on the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network;
and returning to the step of acquiring sample content to iteratively train the first classification model and the second classification model.
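The progress-gated schedule of claims 2 and 3 can be sketched as a simple selection of active loss terms: before the first preset progress the student is trained only on the intermediate-feature differences (plus its own classification loss, per claim 4); once the progress is reached, the classification-feature difference is added. The 0.3 threshold and the term names below are arbitrary illustrative assumptions.

```python
# Sketch of the progress-gated loss schedule (assumed form).
def student_loss_terms(progress, first_preset=0.3):
    # progress: fraction of training completed, in [0, 1].
    terms = ["intermediate_feature_loss", "student_classification_loss"]
    if progress >= first_preset:
        terms.append("classification_feature_loss")
    return terms

early = student_loss_terms(0.1)  # distillation of final features not yet active
late = student_loss_terms(0.6)   # all three loss terms active
```

Deferring the classification-feature term lets the student first align its per-level intermediate features with the teacher before matching the final classification feature.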
4. The method according to claim 3, wherein the classification result obtained by classifying according to the first classification feature is a first classification result;
under the condition that the current training progress does not reach the first preset progress, training the second classification model based on the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network, including:
under the condition that the current training progress does not reach the first preset progress, acquiring a classification result obtained by classifying according to the second classification feature, as a second classification result;
Generating at least one intermediate loss value according to the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network;
generating a classification loss value of a second classification model according to the second classification result and the sample label of the sample content;
training the second classification model based on the at least one intermediate loss value and a classification loss value of the second classification model.
5. The method of claim 4, wherein the generating at least one intermediate loss value according to the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network comprises:
for each level of sub-structure, determining a difference value between the feature values at each position in the intermediate features respectively extracted by the sub-structure and the corresponding sub-network;
and obtaining the intermediate loss value corresponding to the sub-structure according to the difference values between the feature values at the respective positions.
6. The method according to claim 2, wherein, under the condition that the current training progress reaches the first preset progress, the training the second classification model according to the difference between the first classification feature and the second classification feature and the difference between the intermediate features respectively extracted by at least one level of sub-structure and the corresponding sub-network comprises:
Generating at least one intermediate loss value according to the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network under the condition that the current training progress reaches the first preset progress;
acquiring a classification result obtained by classifying according to the second classification feature, as a second classification result, and generating a classification loss value of the second classification model according to the second classification result and the sample label of the sample content;
generating a classification feature loss value according to the difference between the first classification feature and the second classification feature;
training the second classification model based on the at least one intermediate loss value, the classification loss value of the second classification model, and the classification feature loss value.
7. The method of claim 6, wherein generating a classification feature loss value based on a difference between the first classification feature and the second classification feature comprises:
normalizing each feature value in the first classification feature to obtain a first probability distribution;
normalizing each feature value in the second classification feature to obtain a second probability distribution;
A classification feature loss value is generated based on a difference between the first probability distribution and the second probability distribution.
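Claim 7's normalization-then-compare step can be sketched as follows. Softmax is an assumed choice of normalization and KL divergence an assumed choice of difference measure; the claim only requires that each classification feature be normalized into a probability distribution and that the loss reflect the difference between the two distributions.

```python
import math

# Sketch of the claim-7 classification-feature loss (assumed softmax + KL form).
def softmax(values):
    # Normalize feature values into a probability distribution.
    m = max(values)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def classification_feature_loss(first_cls, second_cls, eps=1e-12):
    # KL divergence between the teacher's and student's distributions.
    p, q = softmax(first_cls), softmax(second_cls)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

same = classification_feature_loss([1.0, 2.0], [1.0, 2.0])   # identical features: loss ~ 0
apart = classification_feature_loss([3.0, 0.0], [0.0, 3.0])  # disagreeing features: loss > 0
```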
8. The method according to any one of claims 1 to 7, further comprising:
determining the current training progress;
the training the adaptation network in the first classification model based on the classification result and the sample label of the sample content comprises:
under the condition that the current training progress does not reach a second preset progress, training an adaptation network in the first classification model based on the classification result and the sample label of the sample content;
and returning to the step of acquiring sample content to iteratively train the first classification model and the second classification model.
9. The method of claim 8, wherein the method further comprises:
generating at least one intermediate loss value according to the difference between the intermediate features respectively extracted by the at least one level of sub-structure and the corresponding sub-network under the condition that the current training progress reaches a second preset progress;
generating a classification loss value of a first classification model according to the classification result and the sample label of the sample content;
Training an adaptation network in the first classification model based on the at least one intermediate loss value and a classification loss value of the first classification model;
and returning to the step of acquiring sample content to iteratively train the first classification model and the second classification model until the training ending condition is met, and stopping iteration.
10. The method of claim 1, wherein the model complexity of the second classification model is less than that of the first classification model; the pre-training structure in the first classification model is used for feature extraction in each of a plurality of tasks; and the sample content and the sample label are used to train the second classification model to perform a preset classification task.
11. A method of content classification, the method comprising:
acquiring content to be classified;
inputting the content into a trained second classification model, extracting features through a multi-level substructure in the second classification model to obtain classification features of the content, and classifying the content based on the classification features of the content to obtain a classification result of the content;
wherein the second classification model is trained by the method of any one of claims 1 to 10.
12. The method according to claim 11, wherein the sample content in the method according to any one of claims 1 to 10 is a sample image;
the obtaining the content to be classified comprises the following steps:
acquiring an image to be audited;
the inputting the content into a trained second classification model, performing feature extraction through a multi-level substructure in the second classification model to obtain classification features of the content, and classifying the content based on the classification features of the content to obtain a classification result of the content comprises:
inputting the image into a trained second classification model, extracting image features through a multi-level substructure in the second classification model, extracting the classification features of the image based on the extracted image features, and classifying the image based on the classification features of the image to obtain a classification result of the image;
the method further comprises the steps of:
and determining the auditing result of the image based on the classification result of the image.
13. A classification model processing apparatus, the apparatus comprising:
the first classification characteristic generation module is used for acquiring sample content, inputting the sample content into a first classification model, and generating a first classification characteristic by carrying out characteristic extraction through a multi-level sub-network in the first classification model; each level of the sub-network comprises a pre-training structure, and at least one level of sub-network in the multi-level sub-network further comprises an adaptive network to be trained;
The intermediate feature extraction module is used for inputting the sample content into the first classification model under the condition of shielding the adaptation network, and extracting features through the multi-stage sub-network shielding the adaptation network to obtain intermediate features extracted by each stage of sub-network;
the second classification characteristic generation module is used for inputting the sample content into a second classification model, carrying out characteristic extraction on a multi-level sub-structure corresponding to the multi-level sub-network in the second classification model to generate a second classification characteristic, and obtaining an intermediate characteristic extracted by each level of sub-structure;
the second classification model training module is used for training the second classification model according to the difference between the first classification characteristic and the second classification characteristic and the difference between the intermediate characteristics respectively extracted by at least one level of sub-structure and the corresponding sub-network;
the first classification model training module is used for acquiring a classification result obtained by classifying according to the first classification characteristic and training an adaptation network in the first classification model based on the classification result and the sample label of the sample content.
14. A content classification apparatus, the apparatus comprising:
The content acquisition module is used for acquiring the content to be classified;
the classification result obtaining module is used for inputting the content into a trained second classification model, extracting the characteristics through a multi-level substructure in the second classification model to obtain the classification characteristics of the content, and classifying the content based on the classification characteristics of the content to obtain the classification result of the content; wherein the second classification model is trained by the method of any one of claims 1 to 10.
15. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 12 when the computer program is executed.
16. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 12.
17. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 12.
CN202311375744.7A 2023-10-20 2023-10-20 Classification model processing method, content classification device and computer equipment Pending CN117540291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311375744.7A CN117540291A (en) 2023-10-20 2023-10-20 Classification model processing method, content classification device and computer equipment

Publications (1)

Publication Number Publication Date
CN117540291A true CN117540291A (en) 2024-02-09



Legal Events

Date Code Title Description
PB01 Publication