
CN114416989A - Text classification model optimization method and device - Google Patents


Info

Publication number
CN114416989A
Authority
CN
China
Prior art keywords
sample
sentences
classifier
label
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210051488.5A
Other languages
Chinese (zh)
Other versions
CN114416989B (en)
Inventor
曹磊
蒋宁
王洪斌
吴海英
李长林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Consumer Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Consumer Finance Co Ltd filed Critical Mashang Consumer Finance Co Ltd
Priority to CN202210051488.5A priority Critical patent/CN114416989B/en
Publication of CN114416989A publication Critical patent/CN114416989A/en
Application granted granted Critical
Publication of CN114416989B publication Critical patent/CN114416989B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/268Morphological analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text classification model optimization method and device for solving the problem of low multi-task recognition efficiency of a model. The scheme includes: acquiring a training sample of a text classification model; inputting a plurality of sample sentences and corresponding sample labels into a feature extraction layer of the text classification model to obtain feature vectors respectively corresponding to the sample sentences; inputting the feature vectors of the sample sentences into a plurality of classifiers of the text classification model to obtain classification labels corresponding to the feature vectors of the sample sentences, wherein the text classification model comprises a sensitive word classifier, an emotion classifier and a semantic classifier; and optimizing the text classification model according to the classification labels corresponding to the sample sentences and the sample labels. Because the scheme shares one feature extraction layer to obtain the feature vectors and outputs classification results through a plurality of different classifiers, text classification can be performed for multiple tasks at once, the overall size of the model is reduced, and the multi-task recognition efficiency of the model is effectively improved.

Description

Text classification model optimization method and device
Technical Field
The invention relates to the field of deep learning, in particular to a text classification model optimization method and device.
Background
In the field of model training for deep learning, training samples are generally needed to train a model so that the trained model can classify and recognize the content to be recognized. In practical applications, recognition requirements are diverse, and for different recognition tasks it is often necessary to classify and recognize the content from different aspects with several separate models. Although each of these models can produce a recognition result, their combined size is enormous, so recognition and classification take a long time and recognition efficiency is low.
In addition, when sample deviation is large and sample data is sparse, insufficient training samples often lead to poor model recognition accuracy.
How to improve the multi-task recognition efficiency of the model is a technical problem to be solved by the application.
Disclosure of Invention
The embodiment of the application aims to provide a text classification model optimization method and device, and aims to solve the problem that the multi-task recognition efficiency of a model is low.
In a first aspect, a text classification model optimization method is provided, including:
obtaining a training sample of a text classification model, wherein the training sample comprises a plurality of sample sentences and sample labels corresponding to the sample sentences, the sample labels in the training sample comprise sensitive word labels, emotion labels and semantic labels, and any sample sentence corresponds to at least one sample label;
inputting the plurality of sample sentences and the corresponding sample labels into a feature extraction layer of the text classification model for feature extraction to obtain feature vectors respectively corresponding to the sample sentences, wherein the feature vectors represent feature values of the corresponding sample sentences in at least one feature dimension of sensitive word dimensions, emotion dimensions and semantic dimensions;
inputting the feature vectors of the sample sentences into a plurality of classifiers of the text classification model for processing to obtain classification labels corresponding to the feature vectors of the sample sentences, wherein the text classification model comprises a sensitive word classifier, an emotion classifier and a semantic classifier;
and optimizing the text classification model according to the classification label corresponding to the sample statement and the sample label.
In a second aspect, an apparatus for optimizing a text classification model is provided, including:
an acquisition module, used for acquiring a training sample of a text classification model, wherein the training sample comprises a plurality of sample sentences and sample labels corresponding to the sample sentences, the sample labels in the training sample comprise sensitive word labels, emotion labels and semantic labels, and any sample sentence corresponds to at least one sample label;
a feature extraction module, used for inputting the plurality of sample sentences and the corresponding sample labels into a feature extraction layer of the text classification model for feature extraction to obtain feature vectors respectively corresponding to the sample sentences, wherein the feature vectors represent feature values of the corresponding sample sentences in at least one feature dimension of the sensitive word dimension, the emotion dimension and the semantic dimension;
a classification module, used for inputting the feature vectors of the sample sentences into a plurality of classifiers of the text classification model for processing to obtain classification labels corresponding to the feature vectors of the sample sentences, wherein the text classification model comprises a sensitive word classifier, an emotion classifier and a semantic classifier;
and an optimization module, used for optimizing the text classification model according to the classification labels corresponding to the sample sentences and the sample labels.
In a third aspect, an electronic device is provided, the electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, realizes the steps of the method as in the first aspect.
In the embodiment of the application, a training sample of a text classification model is obtained; a plurality of sample sentences and corresponding sample labels are input into a feature extraction layer of the text classification model for feature extraction to obtain feature vectors respectively corresponding to the sample sentences; the feature vectors of the sample sentences are input into a plurality of classifiers of the text classification model for processing to obtain classification labels corresponding to the feature vectors of the sample sentences, wherein the text classification model comprises a sensitive word classifier, an emotion classifier and a semantic classifier; and the text classification model is optimized according to the classification labels corresponding to the sample sentences and the sample labels. The text classification model obtained through the optimization of this scheme can perform multi-task recognition while having a small size and high recognition efficiency. In an application scenario requiring multiple recognition tasks, a text to be recognized can be input into one text classification model of this scheme to obtain multi-task output results, which effectively improves the multi-task recognition efficiency of the model and avoids inputting the text into different models for prediction multiple times. In addition, the scheme of the embodiment of the invention shares one feature extraction layer to obtain the feature vectors and then outputs classification results through a plurality of different classifiers, so that text classification can be performed for multiple tasks and the overall size of the model is reduced. Because the scheme involves multiple classifiers for multiple tasks, compared with a single-classifier model, the different types of tasks can introduce noise to one another during training, which improves the generalization ability of the model and optimizes its classification ability.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 is one of the flow diagrams of a text classification model optimization method according to an embodiment of the present invention.
Fig. 2 is a second flowchart illustrating a text classification model optimization method according to an embodiment of the present invention.
Fig. 3 is a third flowchart illustrating a text classification model optimization method according to an embodiment of the present invention.
Fig. 4 is a fourth flowchart illustrating a text classification model optimization method according to an embodiment of the present invention.
Fig. 5a is a fifth flowchart illustrating a text classification model optimization method according to an embodiment of the present invention.
Fig. 5b is a sixth flowchart illustrating a text classification model optimization method according to an embodiment of the present invention.
Fig. 6 is a seventh schematic flowchart of a text classification model optimization method according to an embodiment of the present invention.
Fig. 7 is an eighth flowchart illustrating a text classification model optimization method according to an embodiment of the present invention.
Fig. 8a is one of the structural diagrams of a text classification model according to an embodiment of the present invention.
Fig. 8b is a second schematic structural diagram of a text classification model according to an embodiment of the present invention.
Fig. 8c is a third schematic structural diagram of a text classification model according to an embodiment of the present invention.
Fig. 9 is a schematic structural diagram of a text classification model optimization apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. The reference numbers in the present application are only used for distinguishing the steps in the scheme and are not used for limiting the execution sequence of the steps, and the specific execution sequence is described in the specification.
In the field of deep learning, in order to meet different recognition requirements, a large number of sample training models are often required to be adopted for recognizing targets. In practical application, the recognition task is often various and complex, and a single model is difficult to meet the actual requirement. For example, if it is required to recognize whether a text contains a sensitive word, a sensitive word recognition model needs to be trained to complete a sensitive word recognition task. If the emotion expressed by the text is required to be recognized, the emotion recognition model is required to be trained to complete the emotion recognition task. If the semantics of the text expression need to be recognized, the semantic recognition model needs to be trained to complete the semantic recognition task. It follows that if multi-tasking is to be achieved, a plurality of different recognition models need to be used. In order to obtain a plurality of recognition models, a large amount of sample data is required to be used for iterative training, and the samples of different models are often marked by spending labor and time. Therefore, to realize multitask recognition, costs such as time and labor are high.
In a multi-task scenario, a large amount of manpower and time must be spent in advance to train the models, and the trained models are independent of one another, so their recognition results have no correlation and are often inaccurate. For example, suppose the first text is "Can this merchant be complained about?" and the second text is "I want to complain about you!". Taking the semantic recognition task as an example, since both sentences contain the keyword "complain", the semantic recognition results of both texts may be "complaint intention". However, considering the semantics actually expressed by the two texts, the first text merely asks about matters related to complaints and does not express negative emotion, so the semantics it expresses is not an intention to complain, whereas the second text expresses negative emotion and an intention to complain. As this example shows, when the semantic recognition result and the emotion recognition result are independent of each other it is difficult to determine the truly expressed intention, while the real content of a text can be determined by combining semantics and emotion. Therefore, the recognition result of a single recognition task is not accurate.
In order to solve the problems in the prior art, embodiments of the present application provide a text classification model optimization method, which is intended to solve the problems in the prior art that the accuracy of multi-task recognition is low, the number of samples required for training a plurality of models is large, and the number of models obtained through training is large.
The method provided by the embodiment of the application is shown in fig. 1 and comprises the following steps:
s11: the method comprises the steps of obtaining a training sample of a text classification model, wherein the training sample comprises a plurality of sample sentences and sample labels corresponding to the sample sentences, the sample labels in the training sample comprise sensitive word labels, emotion labels and semantic labels, and any sample sentence corresponds to at least one sample label.
The text classification model may be a pre-trained model, and is specifically configured to perform feature extraction and classification on an input text. The training sample is a sample used for training a text classification model, and the training sample specifically includes a plurality of sample sentences, and the sample sentences are used for inputting the text classification model for training. The sample label corresponding to the sample sentence may be a label manually labeled in advance to calibrate the classification of the corresponding sample sentence.
In this scheme, each sample sentence corresponds to at least one label. The sample labels in this embodiment include sensitive word labels, emotion labels and semantic labels. The sensitive word label can represent whether the corresponding sample sentence contains a preset sensitive word, or whether it contains a word or sentence close to a preset sensitive word. The preset sensitive words can be words or sentences that contain malicious content or involve user information and therefore pose security risks. The emotion label can characterize the emotion expressed by the corresponding sample sentence, for example whether the expressed emotion is positive, negative or neutral. The semantic label can characterize the primary intention expressed by the corresponding sample sentence.
Any sample sentence in the training sample corresponds to at least one sample label. For example, in the case where a sample sentence corresponds to one sample label, the sample sentence may be "That's great!", the emotion label corresponding to the sample sentence is "positive", and this label characterizes that the emotion expressed by the sample sentence is positive.
As another example, where a sample sentence corresponds to two sample labels, the sample sentence may be "I must complain about you!", its corresponding emotion label is "negative", and its corresponding semantic label is "complaint intention". The labels corresponding to the sample sentence characterize that the sample sentence expresses negative emotion and carries an intention to initiate a complaint.
S12: and inputting the plurality of sample sentences and the corresponding sample labels into a feature extraction layer of the text classification model for feature extraction to obtain feature vectors respectively corresponding to the sample sentences, wherein the feature vectors represent feature values of the corresponding sample sentences in at least one feature dimension of the sensitive word dimension, the emotion dimension and the semantic dimension.
The feature extraction layer is used to perform feature extraction, and the feature extraction in this embodiment refers to converting an input sample into a digital feature that can be used for machine learning. Specifically, the input sample in this example is a sample sentence and a corresponding sample label, and the feature extraction process is to convert the input sample sentence into a digital feature that can be used for machine learning, where the digital feature can be represented by a form of a feature value. The sample labels are labels corresponding to the sample sentences and used for representing the belonged classification of the corresponding sample sentences, and in the step, the sample sentences and the corresponding sample labels are input into the model to train the model.
In general, the digital features extracted by the feature extraction layer may have multiple dimensions, and each feature dimension corresponds to a feature value, and the feature value expresses the feature of the sample sentence in the feature dimension. And the characteristic values corresponding to the multiple dimensions can be represented in the form of a characteristic vector, wherein the dimensions of the characteristic vector are the extracted multiple dimensions, and the values of the vector in the dimensions are the characteristic values in the characteristic dimensions.
In this step, feature extraction is performed on the input sample sentence by the feature extraction layer, and the extracted feature vector is a digital feature that can be used for machine learning, and represents a feature value of the input sample sentence in each feature dimension.
Specifically, the feature vector in this scheme represents a feature value of the corresponding sample sentence in at least one feature dimension among the sensitive word dimension, the emotion dimension and the semantic dimension. The feature value of the sensitive word dimension expresses whether the sample sentence contains a preset sensitive word, or whether it contains a word or sentence close to a preset sensitive word. The feature value of the emotion dimension expresses the emotion conveyed by the sample sentence, for example whether the sentence expresses a positive or negative emotion. The feature value of the semantic dimension expresses the intention conveyed by the sample sentence.
In practical applications, a pre-trained model can be used in this step to perform feature extraction on the input sample sentences. The pre-trained model can specifically be a BERT model, which mainly adopts a Transformer structure; a pre-trained BERT model provides a powerful context-dependent sentence representation and can fully consider context information. Meanwhile, putting multiple tasks together allows each task to learn additional information brought by the other tasks. It should be understood that other types of models may also be selected in this step to implement feature extraction according to the characteristics of the input samples or the actual application scenario, and this scheme is not limited in this respect.
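As an illustration of how such a shared feature extraction step could look in practice, the sketch below uses the Hugging Face transformers library with a Chinese BERT checkpoint; the library, checkpoint name and pooling choice are assumptions made for illustration and are not prescribed by this scheme.

    # Illustrative sketch (not the patented implementation): extract shared
    # feature vectors for sample sentences with a pretrained BERT encoder.
    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    encoder = BertModel.from_pretrained("bert-base-chinese")

    sample_sentences = ["我要投诉你们！", "您好，请问有什么可以帮您？"]
    batch = tokenizer(sample_sentences, padding=True, truncation=True, return_tensors="pt")

    with torch.no_grad():
        outputs = encoder(**batch)

    token_features = outputs.last_hidden_state           # [batch, seq_len, hidden]: per-token states
    sentence_features = outputs.last_hidden_state[:, 0]  # vector at the [CLS] position per sentence
    print(sentence_features.shape)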
S13: and inputting the feature vectors of the sample sentences into a plurality of classifiers of the text classification model for processing to obtain classification labels corresponding to the feature vectors of the sample sentences, wherein the text classification model comprises a sensitive word classifier, an emotion classifier and a semantic classifier, and the classification label of each sample sentence comprises at least one of a sensitive word label, an emotion label and a semantic label.
Based on the feature vectors extracted from the sample sentences in the steps, multi-task output is realized through various classifiers in the step. In practical application, the feature vectors of the sample statements can be accessed to different downstream classifiers for processing. The classifier is used for classifying based on the input feature vector, and can also be understood as classifying according to the feature value of each feature dimension expressed by the feature vector. The classification results of the classifier are presented in the form of output classification labels that express the classification that the sample probably belongs to.
For example, the feature value of the feature vector of the sample sentence in the sensitive word dimension is 0.96, it is assumed that the feature value 1 indicates that the sample sentence includes the preset sensitive word, and the feature value 0 indicates that the sample sentence does not include the preset sensitive word. Then, the feature value of 0.96 can be understood as the sample sentence including the words close to the preset sensitive words. And after the characteristic vector is input into a sensitive word classifier, the sensitive word classifier judges classification based on the characteristic value of 0.96, and outputs a result that the sample statement contains the sensitive word. In practical applications, a classifier often determines a classification of a sample according to feature values of a plurality of feature dimensions of a feature vector, and may output a plurality of classification results, each of which expresses a probability that the sample belongs to the classification by a corresponding confidence.
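Purely to illustrate the relationship between a feature value, the classifier's confidences and the output label, the toy snippet below feeds a single sensitive-word feature value through a linear-plus-softmax head; the dimensions and the untrained weights are hypothetical and only show the mechanics.

    import torch
    import torch.nn.functional as F

    feature = torch.tensor([[0.96]])          # feature value in the sensitive-word dimension
    head = torch.nn.Linear(1, 2)              # two classes: contains / does not contain a sensitive word
    logits = head(feature)                    # in a trained model these weights would be learned
    probs = F.softmax(logits, dim=-1)         # confidence for each classification result
    label = probs.argmax(dim=-1)              # predicted classification label
    print(probs, label)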
S14: and optimizing the text classification model according to the classification label corresponding to the sample statement and the sample label.
For the sample statement, the classification label predicted by the model for the sample statement can be obtained through the steps, and the classification label expresses the classification result of the sample attribution predicted by the model. And for the sample statement, a sample label is calibrated in advance, and the sample label expresses a classification result of the true attribution of the sample statement. The difference between the classification result predicted by the model and the real classification result can be obtained based on the classification label and the sample label, and the difference expresses the inaccuracy of model prediction. Specifically, when the sample statement corresponds to a plurality of classification tags, the plurality of sample tags corresponding to the plurality of classification tags are compared, the classification tags and the sample tags are compared with each other in the same type, and the text classification model is optimized according to the comparison result between the tags. In the step, the text classification model is optimized based on the classification labels and the sample labels, namely, the model parameters are optimized according to the classification result predicted by the model and the real classification result, so that the model prediction result is closer to the real classification result, the purpose of optimizing the text classification model is achieved, and the prediction accuracy of the optimized text classification model is improved.
The text classification model obtained by optimizing the scheme provided by the embodiment of the application can realize multi-task identification, and in an application scene needing to execute multiple task identification, a text to be identified can be input into one text classification model of the scheme to obtain a multi-task output result, so that the multi-task identification efficiency of the model is effectively improved, and the text does not need to be input into different models for prediction for multiple times. In addition, the scheme of the embodiment of the invention can share one feature extraction layer to extract the feature vectors, and then output the classification result through a plurality of different classifiers, thereby being capable of executing text classification based on multiple tasks and reducing the whole capacity of the model. Because the scheme relates to multiple classifiers of multiple tasks, compared with a single classifier model, semantic information can be shared among different types of tasks in the training process, the generalization capability of the model is improved, and the classification capability of the model is optimized.
Based on the method provided by the foregoing embodiment, optionally, as shown in fig. 2, the step S14 includes:
s21: and determining a loss function of the text classification model according to the classification label and the sample label corresponding to each sample statement.
In the step of optimizing the text classification model, the present embodiment first determines a loss function according to the classification label and the sample label corresponding to the same sample sentence. The loss function expresses the deviation between the predicted classification result and the true classification result of the text classification model.
Specifically, the loss function may be determined in various ways, and if divided in the form of the loss function, the loss function may be, for example, an absolute value loss function, a log-log loss function, a square loss function, an exponential loss function, or the like. In practical application, a suitable loss function can be selected according to the characteristics of the sample or the model to express the difference between the predicted result and the actual result. For example, a cross entropy loss function may be used in this example. According to the scheme, the multi-task classification is involved, so that the corresponding loss functions can be determined according to the classification result of each classifier, and then the loss functions of the plurality of classifiers corresponding to various tasks are combined into the loss function of the text classification model in a weighted mode. Optionally, the weighting coefficient may be preset according to the number of samples, the application requirement of the model, and other factors.
S22: and optimizing the text classification model according to the loss function.
In this step, the text classification model is optimally trained based on the loss function to optimize parameters of the text classification model, so that the predicted classification result of the text classification model is closer to the real classification result. Specifically, the loss function is used as an objective function for model training, and if the loss function is still unsatisfactory after single training, the model can be subjected to iterative training to iteratively adjust model parameters, so that the predicted classification result of the model is continuously close to the real classification result along with the increase of the iteration times.
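A schematic sketch of this iterative optimization under the assumption that PyTorch is used; the stand-in linear model, the random feature vectors and the fixed number of iterations are placeholders, since the scheme only requires that the loss be reduced until the predicted labels approach the true labels.

    import torch
    import torch.nn as nn

    model = nn.Linear(768, 3)                   # stand-in for the text classification model
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    criterion = nn.CrossEntropyLoss()

    features = torch.randn(4, 768)              # feature vectors from the shared extraction layer
    sample_labels = torch.tensor([0, 2, 1, 0])  # pre-calibrated sample labels

    for step in range(100):                     # iterate until the loss is acceptable
        optimizer.zero_grad()
        logits = model(features)                # predicted classification labels (as logits)
        loss = criterion(logits, sample_labels) # deviation between prediction and truth
        loss.backward()
        optimizer.step()                        # adjust model parameters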
By the scheme provided by the embodiment of the application, the text classification model can be optimized based on the classification result predicted by the model and the real classification result, so that the optimized text classification model has more accurate classification performance.
Based on the method provided by the foregoing embodiment, optionally, as shown in fig. 3, the step S21 includes:
s31: a loss function is determined for each classifier.
In the embodiment of the application, the text classification model comprises a sensitive word classifier, an emotion classifier and a semantic classifier. In this step, a sensitive word loss function loss1, an emotion loss function loss2 and a semantic loss function loss3 are determined for the three classifiers respectively.
S32: and determining the weighted sum of the loss functions corresponding to the classifiers as the loss function of the text classification model according to a preset weight coefficient.
The preset weight coefficient may be preset according to a requirement, and the weight coefficient corresponds to the classifier. For example, the weight coefficient w1 is set for the sensitive word classifier, the weight coefficient w2 is set for the emotion classifier, and the weight coefficient w3 is set for the semantic classifier.
The loss function of the text classification model can then be determined by weighting the loss functions: loss_sum = w1*loss1 + w2*loss2 + w3*loss3.
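The weighted combination can be written directly in code; the concrete weight values below are arbitrary examples of preset coefficients, not values given by the scheme.

    def combined_loss(loss1, loss2, loss3, w1=0.4, w2=0.3, w3=0.3):
        """loss_sum = w1*loss1 + w2*loss2 + w3*loss3 for the sensitive word,
        emotion and semantic classifiers respectively (weights sum to 1)."""
        return w1 * loss1 + w2 * loss2 + w3 * loss3

    loss_sum = combined_loss(0.8, 0.5, 1.2)      # per-classifier losses from one training step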
According to the scheme provided by the embodiment of the application, the multi-task text classification model can be optimized, wherein different task classifiers can be optimized by setting different weights, and the requirement of classification tasks in practical application can be met.
Based on the method provided by the foregoing embodiment, optionally, as shown in fig. 4, the step S13 includes:
s41: and inputting the feature vector of the sample sentence into the sensitive word classifier for processing to obtain a sensitive word label output by the sensitive word classifier according to a sensitive word hidden vector, wherein the sensitive word hidden vector is obtained by decoding a hidden state output by the feature extraction layer by a sensitive word decoder.
S42: and inputting the feature vector of the sample statement into the emotion classifier for processing to obtain an emotion label output by the emotion classifier according to an emotion hidden vector, wherein the emotion hidden vector is obtained by decoding a hidden state output by the feature extraction layer by an emotion decoder.
S43: and inputting the feature vector of the sample sentence into the semantic classifier for processing to obtain a semantic label output by the semantic classifier according to a semantic hidden vector, wherein the semantic hidden vector is obtained by decoding a hidden state output by the feature extraction layer by a semantic decoder.
Optionally, the execution sequence of the steps S41 to S43 may be exchanged, or may be executed synchronously.
In the embodiment of the application, the training samples are input into the feature extraction layer for feature extraction, and the feature extraction layer can output feature vectors of all sample sentences. In this example, a standard multi-task pre-training model structure can be adopted according to actual requirements, using models such as BERT, RoBERTa (which generally gives better results), or XLNet (which performs better on very long text). The hidden states output by the pre-trained model can be followed by decoders for the different tasks, whose outputs are then fed to the respective classifiers. Optionally, the decoders of different tasks may be connected to different positions of the feature extraction layer's output, so that the output hidden vectors meet the requirements of the various classifiers. The decoders may include the following:
and (3) keyword recognition and decoding: based on the preset keyword recognition position, the highest states of the last layer corresponding to each token are connected with a classifier, and in the scheme, softmax can be used for sequence labeling tasks.
And (3) emotion recognition decoding: based on the preset emotion recognition position, the hierarchy states at the last layer of the feature extraction layer are followed by a classifier to perform a text classification task, and softmax can be used for text classification in the scheme.
Semantic point decoding for each dialog turn: based on the preset semantic recognition position, the hierarchy states at the last layer of the feature extraction layer are followed by a classifier, and softmax can be used for text classification in the scheme.
According to the scheme provided by the embodiment of the application, the decoding is performed on the feature vectors output by the feature extraction layer through the decoders of different types, the hidden vectors required by the classifiers of different tasks can be obtained, the classification effect of each classifier is optimized, and the recognition efficiency of the text classification model is improved.
Based on the method provided by the foregoing embodiment, optionally, as shown in fig. 5a, the foregoing step S11 includes:
s51: and acquiring the dialogue record of the text classification model.
The conversation record obtained in the scheme can be a record file for conversation among multiple persons. Dialogue recordings contain more information than text. The dialogue recording not only contains dialogue texts, but also can distinguish speaking roles according to tone colors. Furthermore, the emotion and the speaking habit of the speaker can be identified according to the conversation record to assist in calibrating the labels of the sentences, and the labels are used for optimizing the text classification model in the subsequent steps.
The conversation recording may be an integrated conversation recording that includes multiple sentences of conversation between the conversation characters. Alternatively, the conversation record may be a multi-segment conversation record sent by different account numbers on the social software.
S52: and recognizing the conversation sound record into a plurality of conversation text sentences based on time sequence and conversation role identifications corresponding to the conversation text sentences.
In this step, ASR (Automatic Speech Recognition) technology may be applied for speech recognition, converting the dialogue recording from audio form into text form, where a plurality of ordered dialogue text sentences are obtained in chronological order. Dialogue roles are distinguished according to the timbre in the dialogue recording, the account role that sent the recording, the speaking speed or other characteristics, so as to determine the dialogue role identification corresponding to each dialogue text sentence. The dialogue role identification is used to distinguish the different users in the dialogue, and can be used in subsequent steps to correlate the context and determine more accurately the emotion expressed by a sentence.
S53: and combining the plurality of conversation text sentences based on the time sequence into a plurality of ordered sample sentences according to the corresponding conversation role identifications, wherein the conversation sentences in the same sample sentence correspond to the same conversation role identification.
Typically, the recognition result of ASR is a discrete sentence, and in this step, sentences spoken continuously by the same dialog character are combined into a sample sentence based on the dialog character identification. A sample sentence corresponds to a unique character identifier and the dialog sentences in the sample sentence are spoken by the same character. Through the combination of the steps, sentences continuously spoken by one role can be combined together to obtain sample sentences, the combined sample sentences have certain continuity, the emotion and the expressed semantics of the conversation role can be better expressed, and the condition that the sentences are split due to the speech pause of the conversation role is avoided.
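A minimal sketch of this merging step, assuming the ASR output has already been converted into a time-ordered list of (dialogue role identification, text) pairs; this data layout is a simplification chosen for illustration.

    def merge_by_role(asr_sentences):
        """asr_sentences: time-ordered list of (role_id, text) tuples."""
        samples = []
        for role_id, text in asr_sentences:
            if samples and samples[-1][0] == role_id:
                # the same dialogue role keeps speaking: append to the current sample sentence
                samples[-1] = (role_id, samples[-1][1] + " " + text)
            else:
                samples.append((role_id, text))
        return samples

    dialogue = [("r1", "Hello,"), ("r1", "this is xxx company."), ("r2", "I want to complain.")]
    print(merge_by_role(dialogue))
    # [('r1', 'Hello, this is xxx company.'), ('r2', 'I want to complain.')]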
S54: and generating a training sample containing a plurality of sample sentences and corresponding role identifications.
In this step, the combined sample sentences and corresponding role identifiers are corresponded to generate training samples, and the training samples include role identifiers corresponding to the sample sentences and at least one sample label. The emotion tags in the sample tags can be determined from the emotion recognized in the ASR recognition process. In addition, a corresponding sample label can be added to the sample sentence in an artificial labeling mode.
To further explain the scheme with reference to fig. 5b, first, the original audio of the text classification model is obtained, and the original audio is the dialogue recording. The original audio is then converted to text form by speech recognition techniques. Subsequently, data preprocessing including the steps of this embodiment S52-S54 is performed based on the converted text. Optionally, optimization such as cleaning and gap filling can be performed on the text obtained by conversion, so that the quality of the training sample is improved. And then inputting the training samples into a feature extraction layer for text feature extraction. And inputting the extracted features into a classifier of the model to perform model prediction so as to obtain a classification label.
By the scheme provided by the embodiment of the application, the dialogue recording in the audio form can be converted into the sample sentence in the text form, and the dialogue role identification corresponding to the sample sentence is marked. The conversation character identification can assist in determining the emotion and the intention expressed by the sample sentence in the subsequent steps, and the accuracy of text classification is improved. In addition, the ordered sample sentences generated by the example have certain continuity, and the emotion and the intention of the text expression can be better determined based on the continuity in the model training process.
Based on the method provided by the foregoing embodiment, optionally, as shown in fig. 6, the step S13 includes:
s61: inputting a target feature vector of a sample sentence containing a target role identifier into the sensitive word classifier for processing, and obtaining a sensitive word label output by the sensitive word classifier according to a feature value of a sensitive word dimension of the target feature vector, wherein the sensitive word label represents whether the sample sentence corresponding to the target feature vector comprises a sensitive word.
Sensitive words in the example can refer to illegal words and sentences, and quality inspection of target role conversation contents can be realized in practical application. For example, in an application scenario in which an agent has a conversation with a client, whether statements of the agent are legal or not and whether irregular expressions exist or not in a conversation process can be detected through the scheme provided by the example, and the method can be used for optimizing the service quality of the agent.
For example, the agent should try to use written, formal expressions when providing services to customers, reducing colloquial or regional expressions, and should avoid counter-questions or blaming utterances during the conversation. Such words and sentences can be preset as sensitive words, and the feature extraction layer can determine the feature value of the sensitive word dimension based on the preset sensitive words, so as to express whether a sample sentence includes words or sentences close to the preset sensitive words. The classification labels are then determined by the sensitive word classifier from the feature vectors.
In the application scenario of this example, sensitive word classification may be performed only on the agent's sentences, and not on the customer's sentences. Specifically, the feature vectors of the sample sentences corresponding to the agent role identifier are input into the sensitive word classifier of the text classification model, and the obtained classification labels can express whether the agent's sample sentences include preset sensitive words.
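For illustration, a small sketch of selecting only the agent's sample sentences before sensitive word classification; the role identifier value "r1" and the parallel-list layout are assumptions.

    def select_agent_vectors(samples, feature_vectors, agent_role_id="r1"):
        """samples: list of (role_id, text); feature_vectors: parallel list of vectors."""
        return [vec for (role_id, _), vec in zip(samples, feature_vectors)
                if role_id == agent_role_id]

    samples = [("r1", "Hello, how can I help you?"), ("r2", "I want to complain!")]
    vectors = [[0.1, 0.9], [0.7, 0.2]]                       # toy feature vectors
    agent_vectors = select_agent_vectors(samples, vectors)   # only these go to the sensitive word classifier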
Through the scheme provided by the embodiment, sensitive word detection and classification can be performed on part of conversation roles in a conversation scene, the roles to be detected are selected to be classified based on actual requirements, the classification efficiency can be effectively improved, and unnecessary calculation power consumption is reduced.
S62: inputting a target feature vector of a sample statement containing a target role identification into the emotion classifier for processing, and obtaining an emotion label output by the emotion classifier according to the feature value of the emotion dimension of the target feature vector, wherein the emotion label represents the emotion of the role corresponding to the target role identification.
Since the sample sentence is associated with the dialog character identification, the emotion expressed by the sample sentence of the dialog character is determined in this step by making use of the dialog character identification. Generally, in a dialogue scene, the emotion of a character has a certain continuity. In this example, based on the target role identification, the emotion classifier can identify an emotion classification for the target role.
In a dialogue scenario, several dialogue roles may be involved, and through the scheme provided by this embodiment, emotion recognition can be performed on the sample sentences of a target role, improving the accuracy of emotion recognition. For example, if a BERT model is applied in the feature extraction layer, its Transformer network structure gives it a certain ability to understand context correlations and extract semantic features, so the scheme provided by this embodiment can extract features of the target role's sample sentences more accurately, and the extracted features better reflect the emotion the target role really wants to express.
S63: inputting a target feature vector of a sample sentence containing a target role identification into the semantic classifier for processing, and obtaining a semantic label output by the semantic classifier according to a feature value of a semantic dimension of the target feature vector, wherein the semantic label represents the semantic of a role corresponding to the target role identification.
Similar to the emotion classification described in the above example, in the dialog scenario, the semantics of the dialog role also have a certain continuity. In this example, for the target role identifier, the feature vector of the corresponding sample statement is input into the semantic classifier, and semantic classification can be identified for the target role.
For example, if a BERT model is applied in the feature extraction layer, its Transformer network structure gives it a certain ability to understand context correlations and extract semantic features, so the scheme provided by this embodiment can extract features of the target role's sample sentences more accurately, and the extracted features better reflect the semantics the target role really wants to express.
Based on the method provided in the foregoing embodiment, optionally, as shown in fig. 7, the sample labels in the training samples further include preset conversational labels, the feature vectors further represent feature values of corresponding sample sentences in preset conversational dimensions, the feature values of the preset conversational dimensions represent whether the sample sentences include preset conversational content, and the text classification model further includes a preset conversational classifier;
as shown in fig. 7, the step S13 includes:
s71: and inputting the feature vectors of the sample sentences into the preset conversational classifier for processing to obtain preset conversational labels output by the preset conversational classifier according to the feature values of the preset conversational dimensionality corresponding to the feature vectors of the sample sentences.
In this embodiment, the preset conversational content may be manually defined dialogue scripts, and it can differ across application scenarios. For example, in the scenario of a dialogue between an agent and a customer, the preset conversational content may include an opening script such as "Hello, this is xxx company, how can I help you?" and a closing script such as "This call is recorded throughout; please rate my service. Thank you for your patience; please hang up first. Thank you."
In the training sample, the sentences at the beginning and the end of the dialogue correspond to preset conversational labels. The preset conversational label can be calibrated manually, or the model can identify and calibrate it in the sample according to the preset conversational text.
After a sample sentence is input into the feature extraction layer, the feature extraction layer can identify whether the sample sentence is close to the preset conversational text, and the degree of closeness can be expressed by the magnitude of the feature value of the preset conversational dimension in the vector.
In this example, the text classification model further includes a preset utterance classifier, the preset utterance classifier is configured to perform preset utterance classification recognition according to the feature vector of the sample sentence, and the output classification label expresses whether the sample sentence includes the preset utterance.
Through the scheme provided by the embodiment of the application, the classification of the sample sentences can be realized based on the preset dialect, and the output classification labels can be used for realizing the dialog quality inspection. For example, in an application scenario of a conversation between an agent and a client, the scheme provided by the example can be used for detecting whether the agent uses a greeting, a closing word, and the like in the conversation according to the specification, so that the agent service quality inspection is realized, and the method can be used for assisting in improving the service quality.
The present solution is further illustrated below with reference to examples. Assuming that the training sample is a communication record in a dialogue scene between an agent and a client, the specific dialogue content is as follows:
a seat: hello, here xxx company asks what can help you?
Customer: i want to reserve the money owed by I in advance
A seat: good, we help you query this side
Customer: good taste
A seat: mr. good, through inquiring, you do not conform to the condition of payment in advance, and you can not have money at this moment
Customer: why is there no money? Your contract is a overlord, I complain you, you are fraudulent consumers
……
A seat: recording the whole call, and please evaluate the service, thank you for listening and speaking for patience, hang up on your side, and thank you for thank you.
Based on the above dialogue content, labels can first be added to each sentence to calibrate the emotion, semantics and script expressed by each sentence, and whether it contains sensitive words. Assuming that the calibrated labels are arranged in the order sensitive word label, emotion label, semantic label, and that the preset script label is connected to the other labels through the # symbol, the calibrated dialogue content is as follows:
Agent: Hello, this is xxx company. How can I help you? Label: [ ], [neutral], [ ]
Customer: I want to repay the money I owe in advance. Label: [ ], [neutral], [the customer intends to repay in advance]
Agent: OK, let me check that for you. Label: [ ], [neutral], [ ]
Customer: OK. Label: [ ], [neutral], [ ]
Agent: Sir, after checking, you do not meet the conditions for early repayment, so you cannot repay early for now. Label: [ ], [neutral], [the customer does not meet the conditions for early repayment]
Customer: Why can't I repay? This is an overlord contract. I will complain about you; you are defrauding consumers. Label: [(overlord contract), (complaint), (fraud)], [negative], [the customer has a tendency to complain]
……
Agent: This call is recorded throughout. Please rate my service. Thank you for your patience; please hang up first. Thank you. Label: [ ], [neutral], [ ] # the agent correctly delivered the closing script
In this example, to facilitate subsequent model processing, format conversion is performed based on the above labels. For example, each sentence is preceded by the corresponding role identifier; in this example [r1] represents the agent and [r2] represents the customer. Each dialogue turn is separated by [e]; the emotion, semantic and other labels are attached at the head of each sentence; the sensitive words are converted into sequence labels in BIO format; a special token [CLS] is added before the standard script; and the whole dialogue and the standard script are connected through the special token [SEP]. After this format conversion, the structure of the training sample is shown in fig. 8a.
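A rough sketch of this format conversion, showing only the role markers, turn separators and the [CLS]/[SEP] linkage to the standard script; the exact token order, the label placement and the BIO tagging are simplified assumptions rather than the precise format used by the scheme.

    def build_model_input(turns, standard_script):
        """turns: time-ordered list of (role_id, sentence), with role_id in {'r1', 'r2'}."""
        pieces = [f"[{role_id}] {sentence}" for role_id, sentence in turns]
        dialogue = " [e] ".join(pieces)                      # separate dialogue turns with [e]
        return f"{dialogue} [SEP] [CLS] {standard_script}"   # attach the standard script to be matched

    example = build_model_input(
        [("r1", "Hello, this is xxx company. How can I help you?"),
         ("r2", "I want to repay the money I owe in advance.")],
        "This call is recorded throughout. Please rate my service.")
    print(example)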
After the training samples are obtained, they are input into the feature extraction layer for feature extraction to obtain the feature vectors representing the various sample sentences, as shown in fig. 8b. In this example, a standard multi-task pre-training model structure can be adopted according to actual requirements, using models such as BERT, RoBERTa (which generally gives better results), or XLNet (which performs better on very long text). The hidden states output by the pre-trained model can be followed by decoders for the different tasks, whose outputs are then fed to the respective classifiers. The decoders may include the following:
and (3) keyword recognition and decoding: the hierarchy states of the last layer corresponding to each token are connected with a classifier, and softmax can be used for sequence labeling in the scheme.
And (3) emotion recognition decoding: the last layer of hidden states of the special separator "e" of different rounds is followed by classfier for text classification task, in this scheme, softmax can be used for text classification.
Semantic point decoding for each dialog turn: the character specials [ r1] and [ r2] at the beginning of each round are followed by classfier classifiers, and in the scheme, softmax can be used for text classification.
Standard speech detection decoding: and (3) performing text classification task by following classfier to recognize whether a certain type of dialect is spoken or not, wherein the hidden states of the last layer correspond to [ CLS ] separators between the conversation and the standard dialect.
Subsequently, the different classifiers connected after the feature extraction layer implement multi-task classification based on the feature vectors. Different types of classifiers may be employed for different tasks. Next, the classifiers for the different tasks are described with reference to fig. 8c.
For the emotion classifier and the semantic classifier, since emotion and semantics have a certain continuity, classification can be performed per dialogue turn in the emotion and semantic classification step. In this scheme, a softmax classifier is used for classification, as shown in formula (1-1) below:
y_i = softmax(W_i h_1 + b_i)    (1-1)
where h_1 denotes the top-layer hidden vector of [e], [r1] or [r2], b_i denotes the bias coefficient, and W_i denotes the weight coefficients.
For the sensitive word classifier, the task can be regarded as sequence labeling: the last-layer hidden states h_1, ..., h_T represent the final semantic information of each token, and the last-layer hidden vector of each token is passed through a softmax classification layer to obtain the classification information of that token, as shown in formula (1-2) below:
y_n = softmax(W_s h_n + b_s)    (1-2)
where h_n is the hidden state corresponding to each token, b_s denotes the bias coefficient, W_s denotes the weight coefficients, and n denotes the position in the sequence of length T.
For the preset speech-script detection task, it can be regarded as a text classification task. According to the model structure of the invention, the last-layer hidden state at the [CLS] position represents the semantic information of the standard script, and classification is performed with a softmax classifier, as shown in formula (1-3) below:
y_j = softmax(W_j h_cls + b_j)    (1-3)
where h_cls denotes the top-layer hidden vector of [CLS], b_j denotes the bias coefficient, and W_j denotes the weight coefficients.
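Formulas (1-1) to (1-3) are all the same linear-plus-softmax operation applied to different top-layer hidden vectors; the following small sketch, with illustrative dimensions, makes that explicit (the weights here are random placeholders, not trained parameters):

```python
import torch
import torch.nn.functional as F

def softmax_head(h, W, b):
    # y = softmax(W h + b), as in formulas (1-1), (1-2) and (1-3)
    # h: (..., hidden), W: (n_classes, hidden), b: (n_classes,)
    return F.softmax(h @ W.T + b, dim=-1)

hidden, n_classes = 768, 3
h_special = torch.randn(hidden)      # top-layer hidden vector of an [e], [r1]/[r2] or [CLS] position
h_tokens = torch.randn(50, hidden)   # top-layer hidden states h_1, ..., h_T for sequence labeling
W, b = torch.randn(n_classes, hidden), torch.zeros(n_classes)

y_turn = softmax_head(h_special, W, b)   # (n_classes,)    per-turn or per-dialogue label distribution
y_tokens = softmax_head(h_tokens, W, b)  # (50, n_classes) per-token sensitive word label distribution
```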
After each classifier is connected to the feature extraction layer, each classifier outputs the classification label of its classification task, and model optimization is then carried out based on the classification labels and the sample labels.
In this example, since multiple tasks are involved, a loss function may be determined for each task, and the loss function of the whole text classification model is then obtained by weighting the per-task loss functions with preset weights.
For example, the cross-entropy loss function is adopted in this scheme. For the emotion recognition task, the semantic recognition task, the sensitive word recognition task and the preset speech-script recognition task, the per-task loss functions are denoted loss1, loss2, loss3 and loss4 respectively, so the objective function loss_sum of the multi-task learning for the text classification model is as shown in formula (1-4) below:
loss_sum = w1*loss1 + w2*loss2 + w3*loss3 + w4*loss4    (1-4)
where w1, w2, w3 and w4 are the weighting coefficients of the loss functions of the four tasks, and w1 + w2 + w3 + w4 = 1.
Subsequently, model optimization is performed using the above-described loss function as an objective function.
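As a sketch of formula (1-4), assuming cross-entropy for each task (F.cross_entropy applies the softmax internally, matching the softmax classifiers above) and illustrative weights whose only constraint is that they sum to 1:

```python
import torch
import torch.nn.functional as F

def multitask_loss(outputs, targets, weights=(0.3, 0.3, 0.2, 0.2)):
    """loss_sum = w1*loss1 + w2*loss2 + w3*loss3 + w4*loss4 with cross-entropy per task."""
    loss1 = F.cross_entropy(outputs["emotion"].flatten(0, 1), targets["emotion"].flatten())
    loss2 = F.cross_entropy(outputs["semantic"].flatten(0, 1), targets["semantic"].flatten())
    loss3 = F.cross_entropy(outputs["sensitive"].flatten(0, 1), targets["sensitive"].flatten())
    loss4 = F.cross_entropy(outputs["script"], targets["script"])
    w1, w2, w3, w4 = weights
    return w1 * loss1 + w2 * loss2 + w3 * loss3 + w4 * loss4

# Typical optimization step with loss_sum as the objective function:
# loss_sum = multitask_loss(model(**batch), labels)
# loss_sum.backward(); optimizer.step(); optimizer.zero_grad()
```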
Based on the scheme provided by the embodiments of the present application, multiple tasks such as text classification tasks and a sequence labeling task can be integrated into one model, realizing sharing of network parameters and semantic information. The model input is the processed whole dialogue text together with the standard speech script to be matched; the feature extraction layer is a pre-training model structure (a Transformer network structure), and Bert, Roberta, XLNet and the like can be selected according to actual conditions. This part extracts the features of all tasks together, so network parameters and semantic information can be shared. The multiple tasks can be combined, or each task can constrain the others and provide them with additional information, which can enhance the generalization of the model. The downstream tasks are connected to different classifiers such as softmax and Sigmoid according to the hidden states at different positions of the last layer of the model structure, such as [r1], [r2], [e] and [CLS], and respectively realize emotion recognition, sensitive word detection, semantic detection and speech-script detection, so that a single input yields recognition results for multiple tasks.
In the scheme provided by the embodiments of the present application, multiple tasks are learned and trained together. Compared with single-task models, the tasks add noise to one another, which improves the generalization capability of the model; as long as the amount of labeled data is sufficient, the model can learn enough semantic information, giving it stronger generalization for words outside the sensitive-word lexicon. In the multi-task model, semantics are shared across tasks and the pre-trained feature extraction model is shared, so only one network structure needs to be trained and only one set of network parameters needs to be maintained. In addition, the scheme has advantages in model size, inference time and the like.
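To illustrate how a single input yields outputs for all tasks, the following usage sketch runs one forward pass of the model sketched earlier on a dummy batch; the position tensors and sequence length are placeholders standing in for the format-converted dialogue described above:

```python
import torch

batch, seq_len = 1, 64
model = MultiTaskDialogueModel()                     # the sketch defined earlier
input_ids = torch.randint(100, 5000, (batch, seq_len))
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)
e_pos = torch.tensor([[10, 25, 40, 55]])             # positions of the [e] separators
role_pos = torch.tensor([[0, 11, 26, 41]])           # positions of the [r1]/[r2] tokens
cls_pos = torch.tensor([[60]])                       # position of the [CLS] before the script

model.eval()
with torch.no_grad():
    out = model(input_ids, attention_mask, e_pos, role_pos, cls_pos)  # one shared forward pass

emotions = out["emotion"].argmax(-1)     # per-turn emotion labels
semantics = out["semantic"].argmax(-1)   # per-turn semantic labels
sensitive = out["sensitive"].argmax(-1)  # per-token BIO sensitive word labels
script_ok = out["script"].argmax(-1)     # whether the preset script was spoken
```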
In order to solve the problems in the prior art, the present application further provides a text classification model optimization apparatus 90, as shown in fig. 9, including:
the acquiring module 91 is configured to acquire a training sample of a text classification model, where the training sample includes a plurality of sample sentences and sample labels corresponding to the sample sentences, and the sample labels in the training sample include a sensitive word label, an emotion label and a semantic label, where any sample sentence corresponds to at least one sample label;
a feature extraction module 92, configured to perform feature extraction on the input multiple sample sentences and corresponding sample tags to obtain feature vectors respectively corresponding to the multiple sample sentences, where the feature vectors represent feature values of the corresponding sample sentences in at least one feature dimension of a sensitive word dimension, an emotion dimension, and a semantic dimension;
the classification module 93 is configured to process the feature vectors of the sample sentences to obtain classification labels corresponding to the feature vectors of the sample sentences, where the classification module includes a sensitive word classifier, an emotion classifier and a semantic classifier;
an optimizing module 94, configured to optimize the text classification model by using the classification label and the sample label corresponding to the sample sentence.
The device provided by the embodiments of the present application can realize multi-task recognition. In an application scenario requiring multiple recognition tasks, the text to be recognized can be input into the text classification model trained by this scheme to obtain multi-task output results, which effectively improves the multi-task recognition efficiency of the model; the text does not need to be input into different models for prediction multiple times. In addition, according to the scheme of the embodiments, a single shared feature extraction layer produces the feature vectors, and the classification results are then output by a plurality of different classifiers, so text classification can be executed for multiple tasks and the overall size of the model is reduced. Because this scheme involves multiple classifiers for multiple tasks, compared with a single-classifier model, the different tasks add noise to one another during training, which improves the generalization capability of the model and optimizes its classification capability.
The modules in the device provided by the embodiment of the present application may also implement the method steps provided by the above method embodiment. Alternatively, the apparatus provided in the embodiment of the present application may further include other modules besides the modules described above, so as to implement the method steps provided in the foregoing method embodiment. The device provided by the embodiment of the application can achieve the technical effects achieved by the method embodiment.
Preferably, an embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, and when the computer program is executed by the processor, the computer program implements each process of the foregoing text classification model optimization method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not described here again.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the foregoing text classification model optimization method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A text classification model optimization method is characterized by comprising the following steps:
obtaining a training sample of a text classification model, wherein the training sample comprises a plurality of sample sentences and sample labels corresponding to the sample sentences, the sample labels in the training sample comprise sensitive word labels, emotion labels and semantic labels, and any sample sentence corresponds to at least one sample label;
inputting the plurality of sample sentences and the corresponding sample labels into a feature extraction layer of the text classification model for feature extraction to obtain feature vectors respectively corresponding to the sample sentences, wherein the feature vectors represent feature values of the corresponding sample sentences in at least one feature dimension of sensitive word dimensions, emotion dimensions and semantic dimensions;
inputting the feature vectors of the sample sentences into a plurality of classifiers of the text classification model for processing to obtain classification labels corresponding to the feature vectors of the sample sentences, wherein the text classification model comprises a sensitive word classifier, an emotion classifier and a semantic classifier, and the classification label of each sample sentence comprises at least one of a sensitive word label, an emotion label and a semantic label;
and optimizing the text classification model according to the classification label corresponding to the sample statement and the sample label.
2. The method of claim 1, wherein optimizing the text classification model based on the classification labels and the sample labels corresponding to the sample sentences comprises:
determining a loss function of the text classification model according to the classification label and the sample label corresponding to each sample statement;
and optimizing the text classification model according to the loss function.
3. The method of claim 2, wherein determining the penalty function for the text classification model based on the classification label and the exemplar label corresponding to each exemplar statement comprises:
respectively determining a loss function corresponding to each classifier according to the classification label and the sample label corresponding to each sample statement;
and determining the weighted sum of the loss functions corresponding to the classifiers as the loss function of the text classification model according to a preset weight coefficient.
4. The method of claim 1, wherein inputting the feature vectors of the sample sentences into a plurality of classifiers of the text classification model for processing to obtain classification labels corresponding to the feature vectors of the sample sentences, the text classification model comprising a sensitive word classifier, an emotion classifier and a semantic classifier, comprises:
inputting the feature vector of the sample sentence into the sensitive word classifier for processing to obtain a sensitive word label output by the sensitive word classifier according to a sensitive word implicit vector, wherein the sensitive word implicit vector is obtained by decoding a hidden state output by the feature extraction layer by a sensitive word decoder;
inputting the feature vector of the sample statement into the emotion classifier for processing to obtain an emotion label output by the emotion classifier according to an emotion hidden vector, wherein the emotion hidden vector is obtained by decoding a hidden state output by the feature extraction layer by an emotion decoder;
and inputting the feature vector of the sample sentence into the semantic classifier for processing to obtain a semantic label output by the semantic classifier according to a semantic hidden vector, wherein the semantic hidden vector is obtained by decoding a hidden state output by the feature extraction layer by a semantic decoder.
5. The method of claim 1, wherein the sample labels in the training samples further include a preset conversational label, the feature vector further characterizes a feature value of the corresponding sample sentence in a preset conversational dimension, the feature value of the preset conversational dimension characterizes whether the sample sentence includes preset conversational content, and the text classification model further includes a preset conversational classifier;
inputting the feature vectors of the sample sentences into a plurality of classifiers of the text classification model for processing to obtain classification labels corresponding to the feature vectors of the sample sentences, wherein the classification labels comprise:
and inputting the feature vectors of the sample sentences into the preset conversational classifier for processing to obtain preset conversational labels output by the preset conversational classifier according to the feature values of the preset conversational dimensionality corresponding to the feature vectors of the sample sentences.
6. The method of any one of claims 1 to 5, wherein obtaining training samples of a text classification model comprises:
obtaining a dialogue record of the text classification model;
recognizing the conversation sound record into a plurality of conversation text sentences based on time sequence and conversation role identifications corresponding to the conversation text sentences;
combining the plurality of dialogue text sentences based on the time sequence into a plurality of ordered sample sentences according to corresponding dialogue role identifications, wherein the dialogue sentences in the same sample sentence correspond to the same dialogue role identification;
and generating a training sample containing a plurality of sample sentences and corresponding role identifications.
7. The method of claim 6, wherein inputting the feature vectors of the sample sentences into a plurality of classifiers of the text classification model for processing to obtain the classification labels corresponding to the feature vectors of the sample sentences, comprises:
inputting a target feature vector of a sample sentence containing a target role identifier into the sensitive word classifier for processing to obtain a sensitive word label output by the sensitive word classifier according to a feature value of a sensitive word dimension of the target feature vector, wherein the sensitive word label represents whether the sample sentence corresponding to the target feature vector comprises a sensitive word or not;
inputting a target feature vector of a sample statement containing a target role identifier into the emotion classifier for processing to obtain an emotion label output by the emotion classifier according to a feature value of an emotion dimension of the target feature vector, wherein the emotion label represents the emotion of a role corresponding to the target role identifier;
inputting a target feature vector of a sample sentence containing a target role identification into the semantic classifier for processing, and obtaining a semantic label output by the semantic classifier according to a feature value of a semantic dimension of the target feature vector, wherein the semantic label represents the semantic of a role corresponding to the target role identification.
8. A text classification model optimization apparatus, comprising:
the system comprises an acquisition module, a classification module and a classification module, wherein the acquisition module is used for acquiring a training sample of a text classification model, the training sample comprises a plurality of sample sentences and sample labels corresponding to the sample sentences, the sample labels in the training sample comprise sensitive word labels, emotion labels and semantic labels, and any sample sentence corresponds to at least one sample label;
the characteristic extraction module is used for extracting characteristics of the input sample sentences and the corresponding sample labels to obtain characteristic vectors respectively corresponding to the sample sentences, and the characteristic vectors represent characteristic values of the corresponding sample sentences in at least one characteristic dimension of sensitive word dimensions, emotion dimensions and semantic dimensions;
the classification module is used for processing the feature vectors of the sample sentences to obtain classification labels corresponding to the feature vectors of the sample sentences, and comprises a sensitive word classifier, an emotion classifier and a semantic classifier;
and the optimization module is used for optimizing the text classification model by using the classification label corresponding to the sample statement and the sample label.
9. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, carries out the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202210051488.5A 2022-01-17 2022-01-17 Text classification model optimization method and device Active CN114416989B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210051488.5A CN114416989B (en) 2022-01-17 2022-01-17 Text classification model optimization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210051488.5A CN114416989B (en) 2022-01-17 2022-01-17 Text classification model optimization method and device

Publications (2)

Publication Number Publication Date
CN114416989A true CN114416989A (en) 2022-04-29
CN114416989B CN114416989B (en) 2024-08-09

Family

ID=81272545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210051488.5A Active CN114416989B (en) 2022-01-17 2022-01-17 Text classification model optimization method and device

Country Status (1)

Country Link
CN (1) CN114416989B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070226160A1 (en) * 2006-03-22 2007-09-27 Sony Corporation Method and system for transitioning from a case-based classifier system to a rule-based classifier system
CN110249341A (en) * 2017-02-03 2019-09-17 皇家飞利浦有限公司 Classifier training
CN110147445A (en) * 2019-04-09 2019-08-20 平安科技(深圳)有限公司 Intension recognizing method, device, equipment and storage medium based on text classification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MD SHAD AKHTAR: "Multi-task learning for aspect term extraction and aspect sentiment classification", 《NEUROCOMPUTING》, vol. 398, 20 July 2020 (2020-07-20), pages 247 - 256 *
吕晓宁: "多分类器选择性集成方法研究及其应用", 《中国优秀硕士学位论文全文数据库 信息科技》, 15 June 2020 (2020-06-15), pages 140 - 56 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115392218A (en) * 2022-07-15 2022-11-25 哈尔滨工业大学 Method and system for constructing pre-training language model
WO2024027552A1 (en) * 2022-08-03 2024-02-08 马上消费金融股份有限公司 Text classification method and apparatus, text recognition method and apparatus, electronic device and storage medium
CN115620699A (en) * 2022-12-19 2023-01-17 深圳元象信息科技有限公司 Speech synthesis method, speech synthesis system, speech synthesis apparatus, and storage medium
CN116244440A (en) * 2023-02-28 2023-06-09 深圳市云积分科技有限公司 Text emotion classification method, device, equipment and medium
CN116244440B (en) * 2023-02-28 2024-02-13 深圳市云积分科技有限公司 Text emotion classification method, device, equipment and medium

Also Published As

Publication number Publication date
CN114416989B (en) 2024-08-09

Similar Documents

Publication Publication Date Title
CN114416989B (en) Text classification model optimization method and device
CN111883115B (en) Voice flow quality inspection method and device
CN112530408A (en) Method, apparatus, electronic device, and medium for recognizing speech
CN110706690A (en) Speech recognition method and device
CN106875936B (en) Voice recognition method and device
US11574637B1 (en) Spoken language understanding models
US11615787B2 (en) Dialogue system and method of controlling the same
US20190035389A1 (en) Hierarchical speech recognition decoder
CN111739537B (en) Semantic recognition method and device, storage medium and processor
JP2024502946A (en) Punctuation and capitalization of speech recognition transcripts
Kopparapu Non-linguistic analysis of call center conversations
CN104750677A (en) Speech translation apparatus, speech translation method and speech translation program
Kumar et al. Machine learning based speech emotions recognition system
Chakroun et al. New approach for short utterance speaker identification
CN110930975A (en) Method and apparatus for outputting information
Bhatia et al. Speech-to-text conversion using GRU and one hot vector encodings
CN114694637A (en) Hybrid speech recognition method, device, electronic equipment and storage medium
Bhowmick et al. Identification/segmentation of indian regional languages with singular value decomposition based feature embedding
CN118072734A (en) Speech recognition method, device, processor, memory and electronic equipment
CN112489646B (en) Speech recognition method and device thereof
US20230317066A1 (en) Shared encoder for natural language understanding processing
CN116051151A (en) Customer portrait determining method and system based on machine reading understanding and electronic equipment
CN113033160B (en) Method and device for classifying intention of dialogue and method for generating intention classification model
CN113299271B (en) Speech synthesis method, speech interaction method, device and equipment
Tripathi et al. CycleGAN-Based Speech Mode Transformation Model for Robust Multilingual ASR

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant