CN118132687A - Sentence processing and category model training method, sentence processing and category model training device, sentence processing equipment and category model training medium
- Publication number
- CN118132687A (application number CN202211535212.0A)
- Authority
- CN
- China
- Prior art keywords
- character
- sentence
- target
- information
- category
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The disclosure provides a sentence processing and category model training method, device, equipment and medium, wherein the method comprises the following steps: encoding each character in a target sentence to obtain an initial character feature of each character, and generating a sentence feature of the target sentence according to the initial character features of the characters; fusing the sentence feature with the initial character feature of each character respectively to obtain a target character feature of each character; performing slot recognition and intention recognition on the target sentence respectively according to the target character features of the characters to obtain predicted slot information and predicted intention information; and processing the target sentence according to the predicted slot information and the predicted intention information. In this way, the global sentence feature is fused into the single-character dimension, which enhances each character's awareness of global sentence information and improves the accuracy of intention and slot recognition, so that the target sentence can be processed accurately according to the correctly identified user intention and slot information, thereby meeting the actual business requirements of users.
Description
Technical Field
The disclosure relates to the technical field of artificial intelligence, and in particular relates to a training method, device, equipment and medium for sentence processing and category models.
Background
An intelligent customer service system, as an important link in the customer service scenario of a network sales platform or online shopping platform, can help the platform, the mall and merchants solve pre-sale, in-sale and after-sale problems for users, saving a great deal of labor cost for directly-operated malls and third-party merchants. A task-oriented multi-round dialogue system, as a sub-module of the intelligent customer service system, understands the user's intention and the key information exchanged during multiple rounds of dialogue interaction with the user, and thereby finally helps the user solve the business problem.
In a task-oriented multi-round dialogue system, accurately identifying the intention (intent) information and slot (slot) information in a query sentence (query) input by the user is crucial, so that the query sentence can be processed accurately according to the intention information and the slot information to meet the actual business requirements of the user.
Disclosure of Invention
The present disclosure aims to solve, at least to some extent, one of the technical problems in the related art.
The present disclosure proposes a sentence processing and category model training method, apparatus, device and medium, so as to fuse the global sentence feature (i.e., the sentence vector representation) into the single-character dimension, thereby enhancing each character's awareness of global sentence information and improving the accuracy of intention and slot recognition, so that the target sentence can be processed accurately according to the correctly identified user intention and slot information and the actual business requirements of users can be met.
An embodiment of a first aspect of the present disclosure provides a sentence processing method, including:
Acquiring a target sentence, and encoding each character in the target sentence to obtain initial character characteristics of each character;
generating sentence characteristics of the target sentence according to the initial character characteristics of each character;
Respectively fusing the sentence characteristics with initial character characteristics of each character to obtain target character characteristics of each character;
respectively carrying out slot recognition and intention recognition on the target sentence according to the target character characteristics of each character to obtain predicted slot information and predicted intention information;
And processing the target sentence according to the prediction slot information and the prediction intention information.
An embodiment of a second aspect of the present disclosure provides a training method for a category model, including:
acquiring a sample sentence, wherein the labeling information of the sample sentence comprises labeling intention information and labeling slot position information;
Determining an initial category model matched with the first category according to the first category to which the sample sentence belongs;
Encoding each character in the sample sentence to obtain initial character characteristics of each character, and generating sentence characteristics of the sample sentence according to the initial character characteristics of each character;
respectively carrying out slot recognition and intention recognition on the sample sentences by adopting an initial category model according to the initial character characteristics and the sentence characteristics of each character so as to obtain predicted slot information and predicted intention information;
and training the initial category model according to the difference between the predicted intention information and the labeling intention information and the difference between the predicted slot position information and the labeling slot position information to obtain a target category model.
An embodiment of a third aspect of the present disclosure provides another sentence processing method, including:
Acquiring a target sentence;
Determining a target category model matched with the second category from a plurality of target category models according to the second category to which the target sentence belongs; wherein the target category model is trained by a method according to an embodiment of the second aspect of the disclosure;
Performing slot recognition and intention recognition on the target sentence by adopting a target category model matched with the second category to obtain slot recognition information and intention recognition information;
And processing the target sentence according to the slot identification information and the intention identification information.
An embodiment of a fourth aspect of the present disclosure provides a sentence processing apparatus, including:
The acquisition module is used for acquiring the target statement;
The coding module is used for coding each character in the target sentence to obtain initial character characteristics of each character;
The generation module is used for generating sentence characteristics of the target sentence according to initial character characteristics of each character;
the fusion module is used for respectively fusing the sentence characteristics with the initial character characteristics of each character to obtain target character characteristics of each character;
The recognition module is used for respectively carrying out slot recognition and intention recognition on the target sentence according to the target character characteristics of each character so as to obtain predicted slot information and predicted intention information;
And the processing module is used for processing the target sentence according to the prediction slot position information and the prediction intention information.
An embodiment of a fifth aspect of the present disclosure proposes a training apparatus for a class model, including:
The acquisition module is used for acquiring sample sentences, wherein the labeling information of the sample sentences comprises labeling intention information and labeling slot position information;
the determining module is used for determining an initial category model matched with the first category according to the first category to which the sample sentence belongs;
the processing module is used for encoding each character in the sample sentence to obtain the initial character characteristics of each character, and generating the sentence characteristics of the sample sentence according to the initial character characteristics of each character;
the recognition module is used for carrying out slot recognition and intention recognition on the sample sentences respectively by adopting an initial category model according to the initial character characteristics and the sentence characteristics of each character so as to obtain predicted slot information and predicted intention information;
And the training module is used for training the initial category model according to the difference between the predicted intention information and the labeling intention information and the difference between the predicted slot position information and the labeling slot position information so as to obtain a target category model.
An embodiment of a sixth aspect of the present disclosure proposes another sentence processing apparatus, including:
The acquisition module is used for acquiring the target statement;
The determining module is used for determining a target category model matched with the second category from a plurality of target category models according to the second category to which the target sentence belongs; wherein the target category model is trained using the apparatus according to the embodiment of the fifth aspect of the present disclosure;
The recognition module is used for recognizing the slot position and the intention of the target sentence by adopting a target category model matched with the second category so as to obtain slot position recognition information and intention recognition information;
And the processing module is used for processing the target sentence according to the slot identification information and the intention identification information.
An embodiment of a seventh aspect of the present disclosure proposes an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, so that the at least one processor can execute the sentence processing method set forth in the embodiment of the first aspect of the disclosure, or execute the training method of the category model set forth in the embodiment of the second aspect of the disclosure, or execute the sentence processing method set forth in the embodiment of the third aspect of the disclosure.
An eighth aspect embodiment of the present disclosure proposes a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the sentence processing method proposed by the embodiment of the first aspect of the present disclosure, or to execute the training method of the category model proposed by the embodiment of the second aspect of the present disclosure, or to execute the sentence processing method proposed by the embodiment of the third aspect of the present disclosure.
An embodiment of a ninth aspect of the present disclosure proposes a computer program product, including a computer program which, when executed by a processor, implements the sentence processing method according to the embodiment of the first aspect of the present disclosure, or implements the training method of the category model according to the embodiment of the second aspect of the present disclosure, or implements the sentence processing method according to the embodiment of the third aspect of the present disclosure.
One embodiment of the present disclosure described above has at least the following advantages or benefits:
Each character in the target sentence is encoded to obtain an initial character feature of each character, and a sentence feature of the target sentence is generated according to the initial character features of the characters; the sentence feature is fused with the initial character feature of each character respectively to obtain a target character feature of each character; slot recognition and intention recognition are performed on the target sentence respectively according to the target character features of the characters to obtain predicted slot information and predicted intention information; and the target sentence is processed according to the predicted slot information and the predicted intention information. In this way, the global sentence feature (i.e., the sentence vector representation) is fused into the single-character dimension, which enhances each character's awareness of global sentence information and improves the accuracy of intention and slot recognition, so that the target sentence can be processed accurately according to the correctly identified user intention and slot information, thereby meeting the actual business requirements of users.
Additional aspects and advantages of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a sentence processing method according to an embodiment of the present disclosure;
fig. 2 is a flow chart of a sentence processing method according to a second embodiment of the disclosure;
Fig. 3 is a flow chart of a sentence processing method according to a third embodiment of the present disclosure;
FIG. 4 is a flow chart of a training method for category models provided in a fourth embodiment of the present disclosure;
FIG. 5 is a flow chart of a training method for category models provided in a fifth embodiment of the present disclosure;
FIG. 6 is a flow chart of a training method for category models provided in a sixth embodiment of the present disclosure;
FIG. 7 is a flow chart of a training method for category models provided in a seventh embodiment of the present disclosure;
FIG. 8 is a flowchart of a sentence processing method according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of an offline process flow for intent and slot joint identification provided by an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a model overall architecture provided by an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a category model provided by an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of an online process flow for intent and slot joint identification provided by an embodiment of the present disclosure;
Fig. 13 is a schematic structural diagram of a sentence processing device according to a ninth embodiment of the present disclosure;
FIG. 14 is a schematic view of a training device for category models provided in accordance with an embodiment of the present disclosure;
fig. 15 is a schematic structural diagram of a sentence processing device according to an eleventh embodiment of the present disclosure;
Fig. 16 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present disclosure and are not to be construed as limiting the present disclosure.
In a task-oriented multi-round dialogue system, the NLU (Natural Language Understanding) module is a very important module, whose main role is to identify the intention (intent) information and slot (slot) information in a query sentence (query) input by the user.
For example, assuming that the query is "please help me book a flight ticket from Beijing to Shanghai", the NLU module needs to recognize that the user's intention information is "book a flight ticket" and the slot information is "departure = Beijing, destination = Shanghai". For another example, in the customer service scenario of a network sales platform or online shopping platform, when a user inputs "help me issue an electronics invoice from October 1 to October 10", the NLU module needs to identify that the user's intention information is "issue an invoice" and the slot information is "invoice time period = October 1 to October 10, invoice category = electronic".
At present, for NLU tasks, intention recognition and slot extraction of a query are mainly realized in the following two ways:
First, the intention recognition and the slot extraction (or slot recognition) are modeled as two independent tasks, the intention of the user is recognized using an intention model, and the slot is extracted using a slot extraction model.
Secondly, the intention recognition and the slot extraction are modeled jointly, namely, the loss functions of the intention recognition task and the slot extraction task are combined and optimized. Because intent and slot position typically have strong correlations, joint modeling approaches typically perform better than independent modeling approaches.
Although joint modeling of intention recognition and slot extraction performs better than independent modeling, current joint modeling methods only combine and optimize the loss functions of the two tasks and cannot associate intention and slots well. For example, such joint modeling methods focus on using intention information to assist slot recognition while ignoring the effect of slot information on intention recognition, which tends to cause intention recognition errors when the slot information is expressed in varied ways in sentences.
For example, assume that a sample sentence or sample corpus in the training data is "please help me book a flight ticket from city A to city B", while in the actual online prediction process the model receives "please help me book a flight ticket from city C to city D". The user has only changed the departure and destination cities, and the changed city names do not appear in the training data. If BERT (Bidirectional Encoder Representations from Transformers) is used as the backbone encoder to encode the query, the obtained sentence feature (i.e., the vector representation of the sentence) is usually the average of the vectors of the characters (tokens, such as Chinese characters or subwords) output by the last Transformer layer. When many tokens irrelevant to intention recognition appear in the query, the sentence feature may be biased, resulting in a final intention recognition error.
In addition, models are usually trained independently for different scenarios. However, in the customer service scenario of a network sales platform or online shopping platform, the task-oriented multi-round dialogue scenarios have a certain business knowledge correlation, and modeling each scenario independently cannot exploit information shared across different multi-round scenarios. For example, the following two scenarios, invoice application and invoice issuing progress query, are both related to invoices, but each scenario model only uses the training data of its own scenario and does not learn a vector representation of "invoice" in the global business scenario.
In view of at least one problem presented above, the present disclosure proposes a method, apparatus, device, and medium for training sentence processing and category models.
Methods, apparatuses, devices, and media for sentence processing and training of category models in accordance with embodiments of the present disclosure are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a sentence processing method according to an embodiment of the disclosure.
The sentence processing method is described by taking its configuration in a sentence processing apparatus as an example, and the sentence processing apparatus can be applied to any electronic device, so that the electronic device can perform the sentence processing function.
The electronic device may be any device with computing capability, for example, a computer, a mobile terminal, a server, etc., and the mobile terminal may be, for example, a vehicle-mounted device, a mobile phone, a tablet computer, a personal digital assistant, a wearable device, etc., which has various operating systems, a touch screen, and/or a display screen.
As shown in fig. 1, the sentence processing method may include the steps of:
step 101, obtaining a target sentence, and encoding each character in the target sentence to obtain initial character characteristics of each character.
In the embodiment of the disclosure, the target sentence may be a sentence or a question input by a user, and the input manner includes, but is not limited to, touch input (such as sliding, clicking, etc.), keyboard input, voice input, and the like.
As an application scenario, the method is applied to an intelligent customer service scenario of a network purchase and sales platform or a network shopping platform for illustration, and the target sentence can be a sentence (query) input by a customer in an intelligent customer service system.
In the embodiment of the disclosure, a character (token) refers to the minimum granularity of text input to the model; for example, a character may be a single Chinese character for Chinese and a subword for English.
In the embodiment of the disclosure, each character in the target sentence may be encoded to obtain an initial character feature of each character.
As an example, each character in the target sentence may be encoded based on a text encoding algorithm or a text feature extraction algorithm to obtain an initial character feature for each character. For example, each character in the target sentence may be encoded based on the BERT model to obtain initial character features for each character.
For example, the number of characters included in the target sentence is N, and the initial character feature of the i-th character in the target sentence is T_i, where i is a positive integer not greater than N.
Step 102, generating sentence characteristics of the target sentence according to the initial character characteristics of each character.
In the embodiment of the disclosure, the sentence feature of the target sentence can be generated according to the initial character features of the characters. For example, the initial character features of the characters may be averaged element-wise to obtain the sentence feature of the target sentence.
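As an illustration of steps 101 and 102, a minimal sketch of the encoding and pooling is given below; the use of PyTorch and the HuggingFace transformers library, and the "bert-base-chinese" checkpoint, are illustrative assumptions, since the text only requires some character-level encoder such as BERT.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

def encode_sentence(query: str):
    """Return the initial character features T_i and the sentence feature I'."""
    inputs = tokenizer(query, return_tensors="pt")      # adds [CLS]/[SEP]; kept for simplicity
    with torch.no_grad():
        outputs = encoder(**inputs)
    char_features = outputs.last_hidden_state[0]        # (seq_len, hidden): initial character features T_i
    sentence_feature = char_features.mean(dim=0)        # element-wise average: sentence feature I'
    return char_features, sentence_feature
```

The sketches that follow reuse these per-character features and the pooled sentence feature.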
And 103, respectively fusing sentence characteristics with initial character characteristics of each character to obtain target character characteristics of each character.
In the embodiment of the disclosure, for any one character in the target sentence, the sentence feature may be fused with the initial character feature of the character to obtain the target character feature of the character.
Step 104, respectively carrying out slot recognition and intention recognition on the target sentence according to the target character characteristics of each character to obtain predicted slot information and predicted intention information.
In the embodiment of the disclosure, the slot recognition and the intention recognition can be respectively performed on the target sentence according to the target character characteristics of each character, so as to obtain the predicted slot information and the predicted intention information.
For example, taking the target sentence "help me issue an electronic invoice from Monday to Friday" as an example, the identified predicted intention information may be "issue an invoice", and the predicted slot information may be "invoice time period = Monday to Friday, invoice category = electronic".
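Purely for illustration, the recognition result for this example might be organized as follows; the field names and values are hypothetical, since the disclosure does not prescribe an output schema.

```python
prediction = {
    "intent": "issue_invoice",                    # predicted intention information
    "slots": {
        "invoice_period": "Monday to Friday",     # predicted slot information
        "invoice_category": "electronic",
    },
}
```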
And 105, processing the target sentence according to the predicted slot information and the predicted intention information.
In the embodiment of the disclosure, the target sentence can be processed according to the predicted slot information and the predicted intention information.
Continuing the above example, an electronic invoice covering Monday to Friday may be issued for the user based on the predicted slot information and the predicted intention information.
According to the sentence processing method of the embodiments of the disclosure, each character in the target sentence is encoded to obtain an initial character feature of each character, and a sentence feature of the target sentence is generated according to the initial character features; the sentence feature is fused with the initial character feature of each character respectively to obtain a target character feature of each character; slot recognition and intention recognition are performed on the target sentence according to the target character features to obtain predicted slot information and predicted intention information; and the target sentence is processed according to the predicted slot information and the predicted intention information. In this way, the global sentence feature (i.e., the sentence vector representation) is fused into the single-character dimension, which enhances each character's awareness of global sentence information, improves the accuracy of intention and slot recognition, and allows the target sentence to be processed accurately according to the identified user intention and slot information, thereby meeting the actual business requirements of users.
It should be noted that, in the technical solution of the present disclosure, the collection, updating, analysis, processing, use, transmission, storage and other handling of users' personal information all comply with relevant laws and regulations, are carried out for legitimate purposes, and do not violate public order and good morals. Necessary measures are taken for users' personal information to prevent illegal access to users' personal information data and to maintain users' personal information security, network security and national security.
In order to clearly explain how sentence features are fused with initial character features of characters to obtain target character features of the characters in the above embodiments of the disclosure, the disclosure further provides a sentence processing method.
Fig. 2 is a flowchart of a sentence processing method according to a second embodiment of the disclosure.
As shown in fig. 2, the sentence processing method may include the steps of:
step 201, obtaining a target sentence, and encoding each character in the target sentence to obtain an initial character characteristic of each character.
Step 202, generating sentence characteristics of the target sentence according to the initial character characteristics of each character.
The explanation of steps 201 to 202 may be referred to the relevant descriptions in any embodiment of the present disclosure, and will not be repeated here.
Step 203, for any character in the target sentence, determining a first weight of the any character to each character.
In the embodiment of the present disclosure, for any character in the target sentence, the weight of the any character to each character may be determined, which is denoted as the first weight in the present disclosure.
As one possible implementation, the first weight may be calculated as follows: the importance degree of each character in the target sentence is determined, and then, for the i-th character in the target sentence, the first weight of the i-th character to the j-th character is determined according to the initial character feature of the i-th character and the importance degree of the j-th character in the target sentence; where i and j are positive integers not greater than T, and T is the number of characters contained in the target sentence.
As an example, the importance degree of each character in the target sentence may be determined by a deep learning technique; for example, the importance degree of each character may be characterized by parameters of the attention mechanism in the model. For instance, the importance degree of the j-th character may be characterized by a vector w_j, where w_j is a parameter of the attention mechanism.
The first weight α_{i,j} of the i-th character to the j-th character may be determined as:
α_{i,j} = σ(w_j · T_i); (1)
where T_i is the initial character feature of the i-th character and σ is the activation function.
As a possible implementation manner, the weight of the i-th character to the j-th character may be normalized. For example, the initial weight of the i-th character to the j-th character may be determined according to the initial character feature of the i-th character and the importance degree of the j-th character; that is, the initial weight e_{i,j} of the i-th character to the j-th character is:
e_{i,j} = σ(w_j · T_i); (2)
Thereafter, a first coefficient may be determined according to the initial weights of the i-th character to the characters, and the first weight α_{i,j} of the i-th character to the j-th character can then be determined according to the initial weight e_{i,j} and the first coefficient; that is, the initial weights of the i-th character are normalized by the first coefficient to obtain the first weights.
Step 204, weighting and summing the initial character features of the characters according to the first weights of the any character to each character, so as to obtain an intermediate character feature of the any character.
In the embodiment of the disclosure, for any character, the initial character features of the characters may be weighted and summed according to the first weights of that character to each character, so as to obtain the intermediate character feature of that character.
As an example, the intermediate character feature of the i-th character is denoted u_i, and then:
u_i = Σ_{j=1}^{T} α_{i,j} · T_j; (4)
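A minimal PyTorch sketch of the attention described in steps 203 and 204 follows; the fixed maximum length for the importance vectors w_j and the softmax normalization of the initial weights are assumptions, since the text only states that the initial weights are normalized by the first coefficient.

```python
import torch
import torch.nn as nn

class CharacterAttention(nn.Module):
    """Compute intermediate character features u_i from initial features T_i."""

    def __init__(self, hidden_size: int, max_len: int = 128):
        super().__init__()
        # importance vectors w_j, one per character position (assumption: fixed max length)
        self.w = nn.Parameter(torch.randn(max_len, hidden_size) * 0.02)

    def forward(self, T: torch.Tensor) -> torch.Tensor:
        # T: (seq_len, hidden) initial character features
        seq_len = T.size(0)
        w = self.w[:seq_len]                      # (seq_len, hidden)
        e = torch.sigmoid(T @ w.t())              # e_{i,j} = sigma(w_j · T_i), eq. (2)
        alpha = torch.softmax(e, dim=-1)          # normalized first weights (assumed softmax)
        return alpha @ T                          # u_i = sum_j alpha_{i,j} T_j, eq. (4)
```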
step 205, fusing the intermediate character feature and sentence feature of the arbitrary character to obtain the target character feature of the arbitrary character.
In the embodiment of the disclosure, the intermediate character feature and the sentence feature of any character may be fused to obtain the target character feature of the any character.
As an example, the target character feature of the i-th character is denoted v_i, and:
v_i = tanh(W_1 · u_i + W_2 · I'); (5)
where I' is the sentence feature, W_1 is the correlation matrix of u_i, which is a trainable parameter, and W_2 is the correlation matrix of I', which is also a trainable parameter.
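The fusion of equation (5) can be sketched as follows, assuming W_1 and W_2 are realized as bias-free linear layers:

```python
import torch
import torch.nn as nn

class CharSentenceFusion(nn.Module):
    """Fuse intermediate character features u_i with the sentence feature I' (eq. (5))."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.W1 = nn.Linear(hidden_size, hidden_size, bias=False)  # correlation matrix of u_i
        self.W2 = nn.Linear(hidden_size, hidden_size, bias=False)  # correlation matrix of I'

    def forward(self, u: torch.Tensor, sentence_feature: torch.Tensor) -> torch.Tensor:
        # u: (seq_len, hidden) intermediate character features; sentence_feature: (hidden,) I'
        return torch.tanh(self.W1(u) + self.W2(sentence_feature))  # target character features v_i
```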
Step 206, according to the target character characteristics of each character, respectively carrying out slot recognition and intention recognition on the target sentence to obtain predicted slot information and predicted intention information.
Step 207, processing the target sentence according to the predicted slot information and the predicted intention information.
The explanation of steps 206 to 207 may be referred to the relevant descriptions in any embodiment of the disclosure, and are not repeated here.
According to the sentence processing method of the embodiments of the disclosure, by fusing the sentence feature (the overall sentence representation) with the character feature of each single character, the global information of the target sentence is incorporated into the character features of the individual characters, which enhances the awareness of the slot-related characters of the global information and improves the accuracy of slot recognition.
In order to clearly explain how to perform intention recognition and slot recognition on a target sentence according to target character features of each character in any embodiment of the disclosure, the disclosure further provides a sentence processing method.
Fig. 3 is a flowchart illustrating a sentence processing method according to a third embodiment of the present disclosure.
As shown in fig. 3, the sentence processing method may include the steps of:
step 301, obtaining a target sentence, and encoding each character in the target sentence to obtain an initial character feature of each character.
Step 302, generating sentence characteristics of the target sentence according to the initial character characteristics of each character.
Step 303, respectively fusing the sentence characteristics with the initial character characteristics of each character to obtain target character characteristics of each character.
Step 304, according to the target character characteristics of each character, the target sentence is subjected to slot recognition to obtain predicted slot information.
The explanation of steps 301 to 304 may be referred to the relevant descriptions in any embodiment of the disclosure, and are not repeated here.
In any of the embodiments of the present disclosure, the target character features of the characters may be input into a first prediction network (such as a CRF (Conditional Random Field) network or CRF layer) for slot recognition to obtain the predicted slot information, where the first prediction network (e.g., the CRF network) has learned the correspondence between features and slots. In this way, the slot information is identified based on a deep learning technique, which can improve the accuracy of the recognition result.
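For illustration, a slot-recognition head of this kind might look as follows; the use of the third-party pytorch-crf package and the batch-first tensor layout are implementation assumptions, since the text only calls for a CRF network or CRF layer.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf (assumed dependency)

class SlotHead(nn.Module):
    """Map target character features v_i to slot tags via a CRF layer."""

    def __init__(self, hidden_size: int, num_slot_tags: int):
        super().__init__()
        self.emission = nn.Linear(hidden_size, num_slot_tags)
        self.crf = CRF(num_slot_tags, batch_first=True)

    def decode(self, v: torch.Tensor):
        # v: (batch, seq_len, hidden) target character features
        return self.crf.decode(self.emission(v))             # predicted slot tag sequences

    def loss(self, v: torch.Tensor, slot_tags: torch.Tensor) -> torch.Tensor:
        # negative log-likelihood, later reused as the slot (second) loss value
        return -self.crf(self.emission(v), slot_tags, reduction="mean")
```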
Step 305, for any character in the target sentence, determining a second weight of the target sentence on the any character.
In the embodiment of the disclosure, for any one character in the target sentence, a second weight of the target sentence on the character may be determined.
As one possible implementation, the second weight may be calculated as follows: for any one character in the target sentence, the sentence feature and the target character feature of the character can be fused to obtain a fusion feature of the character, which is denoted as the second fusion feature in the disclosure. For example, the second fusion feature of the i-th character may be I' · W_S · v_i, where W_S is a correlation matrix, which is a trainable parameter.
Thereafter, the second fusion feature of the character may be input into an activation function to determine the second weight of the target sentence on the character based on the output of the activation function; for example, the second weight of the target sentence on the i-th character may be σ(I' · W_S · v_i), where σ is the activation function.
And 306, fusing the sentence characteristics and the target character characteristics of any character according to the second weight to obtain first fused characteristics of any character.
In the embodiment of the disclosure, the sentence feature and the target character feature of any character may be fused according to the second weight of the target sentence on the any character, so as to obtain the first fused feature of the any character.
As an example, the first fusion feature of the i-th character may be obtained by combining the sentence feature I' and the target character feature v_i under the second weight through a correlation matrix W_I, where W_I is a trainable parameter.
Step 307, generating the intention feature of any character according to the first fusion feature of any character.
In the embodiment of the disclosure, the intention feature of any character may be generated according to the first fusion feature of any character.
As an example, the intention feature of the i-th character is denoted w_i and is generated from the first fusion feature of the i-th character.
Step 308, according to the intention characteristics of each character, carrying out intention recognition on the target sentence to obtain predicted intention information.
In the embodiment of the disclosure, the intention recognition can be performed on the target sentence according to the intention characteristics of each character to obtain the predicted intention information.
As one possible implementation, the intention features of the characters may be average-pooled, for example averaged element-wise, to obtain a target intention feature, and the target intention feature may then be input into a second prediction network (such as a fully connected network or fully connected layer) for intention recognition, so as to obtain the predicted intention information. In this way, the intention information is identified based on a deep learning technique, which can improve the accuracy of the recognition result.
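A sketch of the intent branch in steps 305 to 308 is given below. The gate follows the stated form σ(I' · W_S · v_i); how the gated sentence and character features are combined into the intention feature is not fully reproduced above, so the combination using W_I below is an illustrative assumption rather than the patent's exact formula.

```python
import torch
import torch.nn as nn

class IntentHead(nn.Module):
    """Predict the sentence intention from target character features and the sentence feature."""

    def __init__(self, hidden_size: int, num_intents: int):
        super().__init__()
        self.W_S = nn.Linear(hidden_size, hidden_size, bias=False)  # correlation matrix W_S
        self.W_I = nn.Linear(hidden_size, hidden_size, bias=False)  # correlation matrix W_I
        self.classifier = nn.Linear(hidden_size, num_intents)       # fully connected prediction layer

    def forward(self, v: torch.Tensor, sentence_feature: torch.Tensor) -> torch.Tensor:
        # v: (seq_len, hidden) target character features; sentence_feature: (hidden,) I'
        gate = torch.sigmoid(self.W_S(v) @ sentence_feature)        # second weights, sigma(I' · W_S · v_i)
        fused = gate.unsqueeze(-1) * self.W_I(v) \
            + (1.0 - gate).unsqueeze(-1) * sentence_feature         # assumed fusion into intention features
        intent_feature = fused.mean(dim=0)                          # average pooling over characters
        return self.classifier(intent_feature)                      # intention logits
```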
And 309, processing the target sentence according to the predicted slot information and the predicted intention information.
The explanation of step 309 may be referred to the relevant descriptions in any embodiment of the disclosure, and will not be repeated here.
According to the sentence processing method of the embodiments of the disclosure, the character features of the individual characters are fused into the sentence feature (the overall sentence representation, i.e., the global sentence vector) of the target sentence, which strengthens the ability of the sentence feature to represent the intention category and improves the accuracy of intention recognition.
Fig. 4 is a flowchart of a training method of a category model according to a fourth embodiment of the present disclosure.
As shown in fig. 4, the training method of the category model may include the following steps:
Step 401, obtaining a sample sentence, wherein the labeling information of the sample sentence comprises labeling intention information and labeling slot position information.
In the disclosed embodiments, the sample statement may be a statement or question related to a task-type multi-round dialog. The sample sentence may be a sentence manually input by a user, or the sample sentence may be an online obtained sentence, such as a sentence collected online by a web crawler technology, or the sample sentence may be a sentence obtained from an existing training set, or the like.
Wherein, the number of the sample sentences can be at least one.
In the embodiment of the disclosure, the labeling information of the sample sentence can also be obtained, for example, the sample sentence can be labeled in a manual labeling manner to obtain the labeling information of the sample sentence, or the sample sentence can be labeled in a machine labeling manner to obtain the labeling information of the sample sentence. The annotation information comprises annotation intention information and annotation slot position information.
Step 402, determining an initial category model matched with the first category according to the first category to which the sample sentence belongs.
In the embodiment of the disclosure, the first category may be, for example, a category at the lowest level of the hierarchy. For example, taking the application of the method to the intelligent customer service system of a network sales platform or online shopping platform as an example, the first category may be a third-level category of the platform (such as meat products, bean products, puffed food, etc.).
In the embodiment of the present disclosure, an initial category model that matches a first category (such as a third category) may be determined according to the category to which the sample sentence belongs.
And step 403, carrying out slot recognition and intention recognition on the sample sentence by adopting the initial category model so as to obtain predicted slot information and predicted intention information.
In the embodiment of the disclosure, the initial category model may be used to perform slot recognition and intention recognition on the sample sentence, respectively, so as to obtain predicted slot information and predicted intention information.
As an example, each character in the sample sentence may be encoded to obtain an initial character feature of each character, and a sentence feature of the sample sentence may be generated according to the initial character features; then the sentence feature may be fused with the initial character feature of each character respectively by using the initial category model to obtain a target character feature of each character; and finally, slot recognition and intention recognition may be performed on the sample sentence respectively by using the initial category model according to the target character features, to obtain the predicted slot information and the predicted intention information. The implementation principle is similar to steps 101 to 104, or steps 201 to 206, or steps 301 to 308, and will not be described here again.
Step 404, training the initial category model according to the difference between the predicted intention information and the labeling intention information and the difference between the predicted slot position information and the labeling slot position information to obtain the target category model.
In the embodiment of the disclosure, the initial category model may be trained according to the difference between the predicted intent information and the labeling intent information and the difference between the predicted slot information and the labeling slot information to obtain the target category model.
According to the training method of the category model of the embodiments of the disclosure, a sample sentence is obtained, wherein the labeling information of the sample sentence includes labeling intention information and labeling slot information; an initial category model matched with the first category is determined according to the first category to which the sample sentence belongs; slot recognition and intention recognition are performed on the sample sentence respectively by using the initial category model to obtain predicted slot information and predicted intention information; and the initial category model is trained according to the difference between the predicted intention information and the labeling intention information and the difference between the predicted slot information and the labeling slot information to obtain the target category model. In this way, the losses of the intention recognition task and the slot recognition task are combined to jointly optimize the initial category model, improving the prediction performance of the model.
To clearly illustrate how the initial category model is trained to obtain the target category model in any of the embodiments of the present disclosure, the present disclosure also proposes a training method for the category model.
Fig. 5 is a flowchart of a training method of a category model according to a fifth embodiment of the present disclosure.
As shown in fig. 5, the training method of the category model may include the following steps:
step 501, obtaining a sample sentence, wherein the labeling information of the sample sentence comprises labeling intention information and labeling slot position information.
Step 502, determining an initial category model matched with the first category according to the first category to which the sample sentence belongs.
Step 503, performing slot recognition and intention recognition on the sample sentence by adopting the initial category model to obtain predicted slot information and predicted intention information.
The explanation of steps 501 to 503 may be referred to the relevant description in any embodiment of the present disclosure, and will not be repeated here.
Step 504, generating a first loss value according to the difference between the predicted intent information and the labeling intent information.
Here, the difference refers to the degree of deviation between the predicted intention information and the labeling intention information. The closer the predicted intention information is to the labeling intention information, the smaller the difference; for example, the difference is smallest when they are identical or match. Conversely, the further apart the predicted intention information and the labeling intention information are (e.g., the labeling intention information is "issue an invoice" while the predicted intention information is "book a ticket"), the larger the difference.
In the embodiment of the present disclosure, the calculated value of the first loss function may be determined according to the difference between the predicted intent information and the labeling intent information, which is denoted as the first loss value in the present disclosure. For example, the first loss function may include, but is not limited to, a cross entropy loss function, where the cross entropy loss function is used to characterize a distance between labeling information corresponding to the sample and prediction information output by the model.
The first loss value and the difference are in positive correlation, namely the smaller the difference is, the smaller the first loss value is, and conversely, the larger the difference is, the larger the first loss value is.
Step 505, generating a second loss value according to the difference between the predicted slot information and the labeled slot information.
The difference refers to the degree of deviation between the predicted slot information and the labeling slot information. The closer the predicted slot information is to the labeling slot information, the smaller the difference between them; conversely, the further apart they are, the larger the difference.
In the embodiment of the present disclosure, the calculated value of a second loss function may be determined according to the difference between the predicted slot information and the labeling slot information, which is denoted as the second loss value in the present disclosure. The second loss function may include, but is not limited to, a negative log-likelihood loss function.
The second loss value and the difference are in positive correlation, namely the smaller the difference is, the smaller the second loss value is, and conversely, the larger the difference is, the larger the second loss value is.
Step 506, generating a first target loss value according to the first loss value and the second loss value.
In the embodiment of the disclosure, the first target loss value may be generated according to the first loss value and the second loss value. The first target loss value and the first loss value are in positive correlation, and the first target loss value and the second loss value are also in positive correlation.
As one example, the first loss value and the second loss value may be weighted summed to obtain a first target loss value.
As another example, the first loss value and the second loss value may be added to obtain the first target loss value.
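A sketch of combining the two losses into the first target loss value (steps 504 to 506): cross-entropy for the intention branch, negative log-likelihood for the slot branch, summed with an optional weight. The lambda_slot hyper-parameter is hypothetical; the text allows either a plain or a weighted sum.

```python
import torch
import torch.nn.functional as F

def first_target_loss(intent_logits: torch.Tensor,
                      intent_label: torch.Tensor,
                      slot_nll: torch.Tensor,
                      lambda_slot: float = 1.0) -> torch.Tensor:
    # first loss value: difference between predicted and labeled intention
    intent_loss = F.cross_entropy(intent_logits.unsqueeze(0), intent_label.view(1))
    # second loss value: slot negative log-likelihood (e.g. from a CRF layer)
    return intent_loss + lambda_slot * slot_nll       # first target loss value
```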
And step 507, adjusting model parameters in the initial category model according to the first target loss value to obtain a target category model.
In the embodiment of the disclosure, the model parameters in the initial category model may be adjusted according to the first target loss value to obtain the trained target category model.
As one possible implementation, the model parameters in the initial category model may be adjusted according to a first target loss value to minimize the first target loss value.
It should be noted that the foregoing example only takes minimization of the first target loss value as the termination condition of model training; other termination conditions may be set in practice, for example, the number of training iterations reaches a set number, the training duration reaches a set duration, or the first target loss value converges, which is not limited by the present disclosure.
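A minimal sketch of the parameter adjustment in step 507 follows; the Adam optimizer, learning rate and epoch budget are illustrative choices standing in for the termination conditions discussed above.

```python
import torch

def train(model: torch.nn.Module, batches, compute_loss, epochs: int = 3, lr: float = 2e-5):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                       # termination condition: fixed number of passes
        for batch in batches:
            loss = compute_loss(model, batch)     # first target loss value for this batch
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```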
According to the training method for the category model, the initial category model is subjected to joint optimization by respectively calculating the loss value of the intention recognition task and the loss value of the slot recognition task and combining the loss value of the intention recognition task and the loss value of the slot recognition task, so that the prediction effect of the model can be improved.
To clearly illustrate how the initial category model is trained to obtain the target category model in any of the embodiments of the present disclosure, the present disclosure also proposes a training method for the category model.
Fig. 6 is a flowchart of a training method of a category model according to a sixth embodiment of the present disclosure.
As shown in fig. 6, the training method of the category model may include the following steps:
in step 601, a plurality of sample sentences are obtained, wherein the labeling information of each sample sentence comprises labeling intention information and labeling slot position information.
In an embodiment of the present disclosure, a plurality of sample sentences may be obtained, wherein each sample sentence belongs to one category, at least one sample sentence in the plurality of sample sentences belongs to the same category, and the plurality of sample sentences relate to a plurality of categories.
Step 602, for any sample sentence in a plurality of sample sentences, determining an initial category model matched with the first category of the any sample sentence according to the first category to which the any sample sentence belongs.
The explanation of steps 601 to 602 may be referred to the relevant description in any embodiment of the present disclosure, and will not be repeated here.
Step 603, adopting an initial category model matched with the first category to which the arbitrary sample sentence belongs, and respectively carrying out slot recognition and intention recognition on the arbitrary sample sentence according to the initial character characteristics of each character in the arbitrary sample sentence and the sentence characteristics of the arbitrary sample sentence so as to obtain prediction slot information and prediction intention information corresponding to the arbitrary sample sentence.
In the embodiment of the disclosure, for any one of a plurality of sample sentences, an initial category model matched with a first category to which the any one of the sample sentences belongs may be adopted, and according to initial character features of each character in the any one of the sample sentences and sentence features of the any one of the sample sentences, slot recognition and intention recognition are respectively performed on the any one of the sample sentences, so as to obtain prediction slot information and prediction intention information corresponding to the any one of the sample sentences. The implementation principle is similar to that of step 403, and will not be described here again.
Step 604, generating a target sub-loss value corresponding to the any sample sentence according to the difference between the predicted slot information and the labeling slot information corresponding to the any sample sentence and the difference between the predicted intention information and the labeling intention information corresponding to the any sample sentence.
In the embodiment of the disclosure, for any one of a plurality of sample sentences, the target sub-loss value corresponding to the any one sample sentence may be generated according to a difference between prediction slot information and labeling slot information corresponding to the any one sample sentence and a difference between prediction intention information and labeling intention information corresponding to the any one sample sentence.
As an example, the first sub-loss value corresponding to the arbitrary sample sentence may be generated according to the difference between the predicted slot information and the labeled slot information corresponding to the arbitrary sample sentence, and the implementation principle is similar to that of step 505, which is not described herein. In addition, the second sub-loss value corresponding to the arbitrary sample sentence may be generated according to the difference between the prediction intention information and the labeling intention information corresponding to the arbitrary sample sentence, and the implementation principle is similar to that of step 504, which is not described herein. Therefore, the target sub-loss value corresponding to the any sample statement can be generated according to the first sub-loss value and the second sub-loss value, and the implementation principle is similar to that of step 506, and will not be described herein.
Step 605, generating a second target loss value according to the target sub-loss value of each sample statement.
In the embodiment of the disclosure, the second target loss value may be generated according to the target sub-loss value of each sample statement. Wherein the second target loss value is in positive correlation with each target sub-loss value.
As one example, the target sub-loss values may be weighted summed to obtain a second target loss value.
As another example, the target sub-loss values may be added to obtain a second target loss value.
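A minimal sketch of aggregating the per-sample target sub-loss values into the second target loss value; the default equal weights are an assumption.

```python
import torch

def second_target_loss(sub_losses: list[torch.Tensor],
                       weights: list[float] | None = None) -> torch.Tensor:
    # Weighted sum of the target sub-loss values of all sample sentences;
    # with the default weights this reduces to a plain sum.
    if weights is None:
        weights = [1.0] * len(sub_losses)
    return torch.stack([w * l for w, l in zip(weights, sub_losses)]).sum()
```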
Step 606, performing joint training on the initial category models matched with the first categories to which the sample sentences belong according to the second target loss value to obtain target category models under the first categories.
In the embodiment of the disclosure, the initial category models under each first category may be jointly trained according to the second target loss value, so as to obtain the trained target category models under each first category.
As one possible implementation, the initial category models may be jointly trained based on a second target loss value to minimize the second target loss value.
It should be noted that the foregoing example only takes minimization of the second target loss value as the termination condition of model training; other termination conditions may be set in practical applications, for example, the number of training iterations reaching a set number, the training duration reaching a set duration, or the second target loss value converging, and the disclosure is not limited in this respect.
According to the training method for the category models of this embodiment, joint training of the category models under a plurality of categories can be performed according to the sample sentences under each category, thereby improving the training effect of the category models.
In order to clearly illustrate how sample sentences are obtained in any embodiment of the present disclosure, the present disclosure also proposes a training method for a category model.
Fig. 7 is a flowchart of a training method of a category model according to a seventh embodiment of the present disclosure.
As shown in fig. 7, the training method of the category model may include the following steps:
Step 701, a history dialogue log is obtained, and a plurality of candidate sentences are obtained from the history dialogue log.
In embodiments of the present disclosure, a historical dialog log may be obtained, such as a historical dialog log of an intelligent customer service system.
In an embodiment of the present disclosure, a plurality of candidate sentences entered by users may be obtained from the historical dialog log. For example, reply sentences answered by customer service may be filtered out of the historical dialog log, and the candidate sentences may be determined from the remaining sentences.
Step 702, a sample sentence is determined from a plurality of candidate sentences.
In embodiments of the present disclosure, sample sentences may be determined from a plurality of candidate sentences, for example, sample sentences related to a multitasking dialog may be determined from a plurality of candidate sentences.
As one possible implementation, multiple candidate sentences may be filtered based on set rules (e.g., regular expressions, JSGF (JSpeech Grammar Format) grammars, etc.) to preserve sample sentences that match the set rules.
As another possible implementation manner, the multiple candidate sentences may be classified respectively to obtain classes of the multiple candidate sentences, and the candidate sentences with the classes matching the set classes are taken as sample sentences.
As an example, the candidate sentences may be classified based on a binary text classification model to obtain categories of the candidate sentences, wherein the binary text classification model may output two categories, a first category and a second category, the first category indicating that an input sentence of the model is related to the task-oriented multi-round dialog and the second category indicating that an input sentence of the model is not related to the task-oriented multi-round dialog.
The set category may be the first category described above, and when the category of the candidate sentence is the first category, the candidate sentence may be regarded as a sample sentence, and when the category of the candidate sentence is the second category, the candidate sentence may be filtered.
As yet another possible implementation, candidate sentences may be simultaneously screened based on the set rules and classified to preserve candidate sentences whose categories match the set categories and match the set rules.
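The screening described above might be sketched as follows; the rule patterns and the `is_task_related` classifier interface are illustrative assumptions rather than the disclosed rule set.

```python
import re
from typing import Callable, Iterable

# Illustrative rule set; real rules would be sorted out by analyzing the
# business (regular expressions, JSGF grammars, etc.).
TASK_PATTERNS = [
    re.compile(r"(invoice|refund|order|deliver)", re.IGNORECASE),
]

def screen_candidates(candidates: Iterable[str],
                      is_task_related: Callable[[str], bool] | None = None) -> list[str]:
    samples = []
    for sentence in candidates:
        rule_hit = any(p.search(sentence) for p in TASK_PATTERNS)
        # Optionally require the binary text classifier to also label the
        # sentence as related to the task-oriented multi-round dialog.
        clf_hit = is_task_related(sentence) if is_task_related else True
        if rule_hit and clf_hit:
            samples.append(sentence)
    return samples
```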
Step 703, obtaining labeling intention information corresponding to the sample sentence.
In the embodiment of the disclosure, the intention labeling can be performed on the sample sentence in a manual labeling mode to obtain labeling intention information of the sample sentence, or the intention labeling can be performed on the sample sentence in a machine labeling mode to obtain labeling intention information of the sample sentence.
Step 704, obtaining labeling slot information corresponding to the sample sentence.
In the embodiment of the disclosure, the sample sentence can be labeled in a slot position by a manual labeling mode to obtain labeled slot position information of the sample sentence, or the sample sentence can be labeled in a slot position by a machine labeling mode to obtain labeled slot position information of the sample sentence.
Step 705, determining an initial category model matched with the first category according to the first category to which the sample sentence belongs.
Step 706, performing slot recognition and intention recognition on the sample sentence by using the initial category model to obtain predicted slot information and predicted intention information.
Step 707, training the initial category model according to the difference between the predicted intent information and the labeling intent information and the difference between the predicted slot information and the labeling slot information to obtain the target category model.
The explanation of steps 705 to 707 may be referred to the relevant description in any embodiment of the present disclosure, and will not be repeated here.
According to the training method for the category model of this embodiment, by filtering the sentences in the historical dialog log, only sample sentences that meet the business requirements are retained, so that the model is trained on the retained sample sentences, and both the training effect and the prediction effect of the model can be improved.
The above embodiments correspond to the training method of the category model, and the disclosure further provides an application method of the category model, namely a sentence processing method.
Fig. 8 is a flowchart of a sentence processing method according to an embodiment of the present disclosure.
As shown in fig. 8, the sentence processing method may include the steps of:
Step 801, a target sentence is acquired.
The explanation of step 801 may be referred to the relevant descriptions in any embodiment of the disclosure, and will not be repeated here.
Step 802, determining a target category model matched with the second category from a plurality of target category models according to the second category to which the target sentence belongs.
Wherein the target category model is trained by the method as set forth in any one of the embodiments of fig. 4-7.
In the embodiment of the disclosure, according to the second category to which the target sentence belongs, a target category model matched with the second category can be determined from a plurality of trained target category models.
Step 803, performing slot recognition and intention recognition on the target sentence by using the target category model matched with the second category to obtain slot recognition information and intention recognition information.
In the embodiment of the disclosure, the target category model matched with the second category may be used to perform slot recognition and intention recognition on the target sentence, so as to obtain slot recognition information and intention recognition information. The implementation principle is similar to that of step 403, and will not be described here again.
Step 804, processing the target sentence according to the slot identification information and the intention identification information.
In the embodiment of the disclosure, the target sentence can be processed according to the slot identification information and the intention identification information.
For example, taking the target sentence as "help me make an electronic invoice from Monday to Friday", the recognized intention identification information may be "make an invoice", and the slot identification information may be "invoice time period = Monday to Friday, invoice category = electronic". An electronic invoice for the period from Monday to Friday can then be issued for the user according to the slot identification information and the intention identification information.
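A minimal sketch of how the recognized intention and slots might drive the subsequent processing; the handler logic and slot names follow the example above and are otherwise assumptions.

```python
def process_target_sentence(intent: str, slots: dict[str, str]) -> str:
    # Dispatch the recognized intention to a business handler and pass the
    # recognized slot values as its arguments (illustrative only).
    if intent == "make an invoice":
        period = slots.get("invoice time period", "")
        category = slots.get("invoice category", "")
        return f"Issuing a {category} invoice for {period}."
    return "No handler registered for this intention."

# Example: "help me make an electronic invoice from Monday to Friday"
print(process_target_sentence(
    "make an invoice",
    {"invoice time period": "Monday to Friday", "invoice category": "electronic"},
))
```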
According to the statement processing method, according to the second category to which the target statement belongs, a target category model matched with the second category is determined from a plurality of target category models; carrying out slot recognition and intention recognition on the target sentence by adopting a target category model matched with the second category so as to obtain slot recognition information and intention recognition information; and processing the target sentence according to the slot identification information and the intention identification information. Therefore, based on the category model matched with the category (or service scene) to which the target sentence belongs, the target sentence is subjected to slot recognition and intention recognition, and the accuracy of the recognition result can be improved, so that the target sentence can be accurately processed according to accurate user intention and slot information, and the actual service requirement of a user can be met.
In any embodiment of the disclosure, the intention information and the slot information can be linked, so that the intention information assists in improving the accuracy of slot recognition, the slot information assists in improving the accuracy of intention recognition, and the negative influence of the diverse expressions of slot values on sentence intention recognition is avoided. In addition, global business data is utilized even in single-category task-oriented dialog scenarios; that is, the category models under a plurality of categories are jointly trained according to the sample sentences under the plurality of categories, which promotes the accuracy of intention and slot recognition.
As one example, the process flow of intent and slot joint identification may include the following two parts:
First, the offline part is mainly responsible for the flow from data processing to model production.
The process flow of the offline part may be as shown in fig. 9, and mainly includes the following steps:
1. Dialog log filtering.
The training data of the model can be customer service log data or dialog log data of an online trading platform or an online shopping platform. A large amount of information irrelevant to task-oriented dialog is mixed into such logs, so the log data can be filtered in the following two ways to keep only the logs relevant to the user's task-oriented multi-round dialogs and use them as sample sentences:
In the first way, a set of rules is sorted out by analyzing the business, and each sentence in the log is filtered against it, where the rules include regular expressions, JSGF grammars, and the like.
In the second way, a simple binary text classification model is trained to classify the sentences in the log. Compared with the rule system, this approach has a relatively high recall rate for sentences associated with the task-oriented multi-round dialog, but it requires additional annotated training data and separate training of the binary text classification model on that annotated data.
2. Data labeling. Intention labeling and slot labeling are performed on the sample sentences.
The customer service problems of the online trading platform or online shopping platform are divided into multiple levels of problems or scenes, and the labels can be distinguished at the granularity of the third-level customer service category; that is, for each sample sentence, the third-level category of the sample sentence can be determined, the intention of the sample sentence under that third-level category is labeled, and the specific slots involved under that intention are labeled.
For example, assuming that the sample sentence is "help me make an electronic invoice for Monday to Friday", it may be labeled with the intention "make an invoice" and the slots "invoice time period = Monday to Friday, invoice category = electronic".
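One possible machine-readable form of such a labeled sample, assuming character/token-level BIO slot tags and the field names shown (both are illustrative, not prescribed by the disclosure):

```python
# A labeled sample sentence with intention and slot annotations (illustrative).
labeled_sample = {
    "category": "invoice",                       # assumed third-level customer service category
    "text": "help me make an electronic invoice from Monday to Friday",
    "intent": "make an invoice",                 # labeling intention information
    "slots": {                                   # labeling slot information
        "invoice time period": "Monday to Friday",
        "invoice category": "electronic",
    },
    # Equivalent token-level BIO tags for training the CRF slot layer.
    "bio_tags": ["O", "O", "O", "O", "B-invoice_category", "O", "O",
                 "B-invoice_time", "I-invoice_time", "I-invoice_time"],
}
```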
3. Scene data.
The basic BERT model is pre-trained and/or fine-tuned using sample sentences associated with task-style multi-round conversations (user-to-human conversational sentences), i.e., model weights are adjusted to accommodate customer service scenarios.
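A hedged sketch of this scene adaptation as masked-language-model fine-tuning with the Hugging Face transformers library; the checkpoint name, hyperparameters, and in-memory dataset are assumptions.

```python
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")

# `scene_sentences` stands in for the task-related sample sentences kept in step 1.
scene_sentences = ["help me make an electronic invoice from Monday to Friday"]
encodings = tokenizer(scene_sentences, truncation=True, max_length=128)
dataset = [{"input_ids": ids} for ids in encodings["input_ids"]]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-customer-service",
                         num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()   # adapts the weights to the customer service scenario
```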
4. Multi-task training.
The overall architecture of the model can be as shown in fig. 10. A hierarchical design is adopted: the pre-trained BERT model produced in step 3 is used as the business model to learn information in the customer service scenario, and this BERT model is more suitable for the tasks of intention classification and slot extraction than the basic BERT model in step 3.
A dedicated category model for intra-category knowledge is trained separately for each third-level customer service category (hereinafter referred to as category). Because the total number of sample sentences across all categories is large while the number of sample sentences under each single category is small, the business model is designed as a large model and each category model can be a small model, and the input of every category model is the output of the business model. The sample sentences under all the categories are used to jointly train the multiple small category models end to end, and the final loss of the model is the sum of the losses of all the category models.
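A minimal sketch, under the stated architecture, of the end-to-end joint training objective: a shared business model feeds one small head per category, and the final loss is the sum of the per-category losses. `CategoryHead` is a stand-in for the category model sketched after the per-character computations below.

```python
import torch
from torch import nn

class JointModel(nn.Module):
    def __init__(self, business_model: nn.Module, category_heads: dict[str, nn.Module]):
        super().__init__()
        # Shared business model (large), e.g. the scene-adapted BERT from step 3.
        self.business = business_model
        # One small category model per third-level category.
        self.heads = nn.ModuleDict(category_heads)

    def forward(self, batches_by_category: dict[str, dict]) -> torch.Tensor:
        total_loss = torch.zeros(())
        for category, batch in batches_by_category.items():
            out = self.business(input_ids=batch["input_ids"],
                                attention_mask=batch["attention_mask"])
            token_feats = out.last_hidden_state        # initial character features T_i
            # Each category head is assumed to return its own loss value
            # (slot loss + intention loss) for the batch of that category.
            total_loss = total_loss + self.heads[category](token_feats, batch)
        # Final loss of the whole model: sum of the losses of all category models.
        return total_loss
```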
In the training process, there are a plurality of category models, and for convenience of description, the following description will be given by taking the number of category models as 1, and the structure of the category models can be shown in fig. 11. When the number of category models is plural, the input of each category model is the output of the business model (BERT model).
The input of the business model is the character (token) sequence obtained after the sample sentence is tokenized; that is, token 1, token 2, …, token N in fig. 11 refer to the respective character tokens in the sample sentence, and N is the number of characters contained in the sample sentence. The [CLS] mark is placed at the beginning of the sentence, and its representation vector C obtained through BERT can be used for the subsequent classification task.
Specifically, in the training stage, the character sequence corresponding to the sample sentence may be input into the business model for encoding to obtain the initial character feature T_i of each character, and the sentence feature I' of the whole sample sentence is defined as the element-wise average of all T_i.
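A hedged sketch of this encoding step using the Hugging Face BERT interface; masking padded positions when averaging is an assumption.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
business_model = BertModel.from_pretrained("bert-base-chinese")

enc = tokenizer(["help me make an electronic invoice from Monday to Friday"],
                return_tensors="pt", padding=True)
out = business_model(**enc)
T = out.last_hidden_state                          # initial character features T_i, shape (B, N, H)
mask = enc["attention_mask"].unsqueeze(-1)         # ignore padded positions in the average
I_prime = (T * mask).sum(dim=1) / mask.sum(dim=1)  # sentence feature I' = element-wise mean of T_i
```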
The role of the category model is:
1) The initial character features of the characters are weighted and averaged to obtain u_i for the ith character:
u_i = Σ_k (e_{i,k} / Σ_m e_{i,m}) · T_k;  (4)
wherein e_{i,k} = σ(w_k · T_i), σ(·) is an activation function, w_k denotes the importance degree of the kth character, e_{i,k} is the initial weight of the ith character to the kth character, Σ_m e_{i,m} is the first coefficient, and their quotient is the first weight of the ith character to the kth character.
2) The u_i of the ith character is combined with the sentence feature to obtain v_i of the ith character. The objective is to combine the overall representation of the sentence with the vector representation of each individual character, so that the vector representation of each character can make better use of the global information of the sentence than the attention layer in the Transformer. The specific operation is as follows:
v_i = tanh(W_1·u_i + W_2·I');  (5)
wherein W_1 is the correlation matrix of u_i and W_2 is the correlation matrix of I', both of which are trainable parameters in the category model. v_i can be regarded as the representation of a single character after it has been combined with the overall representation of the sentence.
3) For the slot recognition task, each v_i vector is input into a CRF layer (not shown in fig. 11) in the category model for slot recognition to obtain the predicted slot information, and the value of a negative log-likelihood loss function is computed according to the difference between the predicted slot information and the labeled slot information and used as the loss value of the slot recognition task.
4) For the intention recognition task, slot information can be introduced so that the intention-related information carried by the slots is utilized while its interference with intention recognition is avoided. To this end, the correlation coefficient of the sentence feature to v_i of the ith character (noted as the second weight in this disclosure) is computed from the sentence feature and v_i through a correlation matrix W_S, which is a trainable parameter of the category model, followed by an activation function.
The intention feature w_i of the ith character is then obtained by fusing the sentence feature and v_i according to this second weight.
An average pooling operation is performed on all w_i to obtain the final intention representation of the sentence (noted as the target intention feature in this disclosure). Then, the target intention feature may be input into a fully connected layer (not shown in fig. 11) in the category model for intention recognition to obtain the predicted intention information, and the value of the cross entropy loss function is computed according to the difference between the predicted intention information and the labeling intention information and used as the loss value of the intention recognition task.
Finally, the sum of the loss value of the intention recognition task and the loss value of the slot recognition task can be optimized as the loss value of the final model.
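A hedged sketch of one category model implementing the operations above in PyTorch. The CRF layer assumes the third-party pytorch-crf package, and the exact forms of the importance projection, the second weight, and the fusion producing w_i are assumptions reconstructed from the claims.

```python
import torch
from torch import nn
from torchcrf import CRF   # assumed third-party package: pytorch-crf

class CategoryHead(nn.Module):
    """One category model operating on the business-model outputs T_i."""

    def __init__(self, hidden: int, num_slot_tags: int, num_intents: int):
        super().__init__()
        self.importance = nn.Linear(hidden, hidden, bias=False)  # produces w_k from T_k (assumed)
        self.W1 = nn.Linear(hidden, hidden, bias=False)
        self.W2 = nn.Linear(hidden, hidden, bias=False)
        self.WS = nn.Linear(2 * hidden, 1, bias=False)           # correlation matrix W_S (assumed form)
        self.emission = nn.Linear(hidden, num_slot_tags)         # v_i -> slot tag scores for the CRF
        self.crf = CRF(num_slot_tags, batch_first=True)
        self.intent_fc = nn.Linear(hidden, num_intents)          # fully connected layer for intents
        self.ce = nn.CrossEntropyLoss()

    def forward(self, T: torch.Tensor, batch: dict) -> torch.Tensor:
        # T: initial character features, shape (B, N, H); I': sentence feature.
        I_prime = T.mean(dim=1)
        # Initial weights e_{i,k} = sigmoid(w_k . T_i); w_k is assumed to be a
        # learned projection of T_k. Normalizing yields the first weights.
        w = self.importance(T)                                   # (B, N, H)
        e = torch.sigmoid(torch.matmul(T, w.transpose(1, 2)))    # e[:, i, k]
        a = e / e.sum(dim=-1, keepdim=True)
        u = torch.matmul(a, T)                                   # u_i, eq. (4)
        v = torch.tanh(self.W1(u) + self.W2(I_prime).unsqueeze(1))  # v_i, eq. (5)

        # Slot branch: CRF negative log-likelihood loss.
        mask = batch["attention_mask"].bool()
        slot_loss = -self.crf(self.emission(v), batch["slot_tags"], mask=mask)

        # Intention branch: second weight and gated fusion of I' and v_i (assumed form).
        sentence = I_prime.unsqueeze(1).expand_as(v)
        alpha = torch.sigmoid(self.WS(torch.cat([v, sentence], dim=-1)))  # second weight
        w_i = alpha * v + (1 - alpha) * sentence                 # intention feature w_i
        target_intent = w_i.mean(dim=1)                          # average pooling
        intent_loss = self.ce(self.intent_fc(target_intent), batch["intent_label"])

        return slot_loss + intent_loss
```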
After model training is completed, a business model and a plurality of category models can be derived and used for carrying out intention prediction and slot prediction on target sentences of a user in an online process.
Second, the online part is responsible for loading the offline-trained models and parsing online service requests.
The processing flow of the online part may be as shown in fig. 12. Specifically, after a user inputs a target sentence (query), the target sentence is input into the business model to obtain the initial character features of each character in the target sentence and the sentence features, and then intention recognition and slot recognition are performed on the target sentence based on the initial character features of each character and the sentence features through the target category model matched with the category to which the target sentence belongs, so as to obtain the predicted slot information and the predicted intention information.
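A minimal sketch of this online flow; the category-routing function and the head's `predict` interface are assumptions.

```python
import torch

@torch.no_grad()
def analyze_query(query: str, tokenizer, business_model, category_heads, resolve_category):
    # Encode the target sentence with the business model to get the initial
    # character features and the sentence feature.
    enc = tokenizer([query], return_tensors="pt")
    T = business_model(**enc).last_hidden_state
    sentence_feat = T.mean(dim=1)

    # Route to the target category model matched with the query's category.
    head = category_heads[resolve_category(query)]
    intent, slots = head.predict(T, sentence_feat)   # assumed inference interface
    return intent, slots
```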
In summary, the technical scheme provided by the present disclosure has at least the following advantages:
The intention recognition task and the slot recognition task (or slot extraction task) are fused: the global sentence representation vector (namely the sentence features) is fused into the single-character dimension, which promotes the characters' perception of global sentence information for slot recognition, and the single-character information is fused into the global sentence representation vector, which promotes the representation of intention categories by the global sentence representation vector.
In addition, multi-task training is performed on the plurality of category models and the business model using sample sentences under a plurality of category scenarios; the business model learns knowledge shared across the business while the category models learn intra-category knowledge, so that model performance can be effectively improved.
Corresponding to the sentence processing method provided in the embodiments of fig. 1 to 3, the present disclosure also provides a sentence processing apparatus, and since the sentence processing apparatus provided in the embodiments of the present disclosure corresponds to the sentence processing method provided in the embodiments of fig. 1 to 4, the implementation of the sentence processing method is also applicable to the sentence processing apparatus provided in the embodiments of the present disclosure, and will not be described in detail in the embodiments of the present disclosure.
Fig. 13 is a schematic structural diagram of a sentence processing device according to a ninth embodiment of the disclosure.
As shown in fig. 13, the sentence processing apparatus 1300 may include: acquisition module 1301, encoding module 1302, generation module 1303, fusion module 1304, identification module 1305, and processing module 1306.
The acquiring module 1301 is configured to acquire a target sentence.
The encoding module 1302 is configured to encode each character in the target sentence to obtain an initial character feature of each character.
The generating module 1303 is configured to generate sentence characteristics of the target sentence according to initial character characteristics of each character.
The fusion module 1304 is configured to fuse the sentence feature with the initial character feature of each character, so as to obtain a target character feature of each character.
And the recognition module 1305 is used for respectively carrying out slot recognition and intention recognition on the target sentence according to the target character characteristics of each character so as to obtain predicted slot information and predicted intention information.
The processing module 1306 is configured to process the target sentence according to the predicted slot information and the predicted intent information.
In one possible implementation of the embodiments of the present disclosure, the fusion module 1304 is specifically configured to: determining a first weight of any character to each character according to any character in the target sentence; according to the first weight of each character of any character, carrying out weighted summation on the initial character characteristics of any character to obtain the intermediate character characteristics of any character; and fusing the intermediate character features and sentence features of any character to obtain target character features of any character.
In one possible implementation of the embodiments of the present disclosure, the fusion module 1304 is specifically configured to: determining the importance degree of each character in the target sentence; aiming at the ith character in the target sentence, determining a first weight of the ith character to the jth character according to the initial character characteristic of the ith character and the importance degree of the jth character in the target sentence; wherein i and j are positive integers less than or equal to T, and T is the number of characters contained in the target sentence.
In one possible implementation of the embodiments of the present disclosure, the fusion module 1304 is specifically configured to: determining the initial weight of the ith character to the jth character according to the initial character characteristics of the ith character and the importance degree of the jth character; determining a first coefficient according to the initial weight of the ith character to each character; and determining the first weight of the ith character to the jth character according to the initial weight of the ith character to the jth character and the first coefficient.
In one possible implementation of the embodiment of the disclosure, the identification module 1305 is specifically configured to: inputting target character characteristics of each character into a conditional random field CRF network; determining predicted slot information according to the output of the CRF network; wherein, the CRF network has learned the correspondence between the features and the slots.
In one possible implementation of the embodiment of the disclosure, the identification module 1305 is specifically configured to: determining a second weight of the target sentence to any character aiming at any character in the target sentence; according to the second weight, fusing the sentence characteristics and the target character characteristics of any character to obtain first fusion characteristics of any character; generating the intention characteristic of any character according to the first fusion characteristic of any character; and carrying out intention recognition on the target sentence according to the intention characteristics of each character to obtain predicted intention information.
In one possible implementation of the embodiment of the disclosure, the identification module 1305 is specifically configured to: the intention characteristic of each character is subjected to average pooling to obtain a target intention characteristic; inputting the target intention characteristic into a fully connected network; and determining prediction intention information according to the output of the fully connected network.
In one possible implementation of the embodiment of the disclosure, the identification module 1305 is specifically configured to: fusing sentence characteristics and target character characteristics of any character aiming at any character in the target sentence to obtain second fusion characteristics of any character; and inputting the second fusion characteristic of any character into the activation function, so as to determine the second weight of the target sentence on any character according to the output of the activation function.
The sentence processing device of the embodiment of the disclosure obtains initial character characteristics of each character by encoding each character in a target sentence, and generates sentence characteristics of the target sentence according to the initial character characteristics of each character; respectively fusing sentence characteristics with initial character characteristics of each character to obtain target character characteristics of each character; respectively carrying out slot recognition and intention recognition on the target sentence according to the target character characteristics of each character to obtain predicted slot information and predicted intention information; and processing the target sentence according to the predicted slot information and the predicted intention information. Therefore, global sentence characteristics (namely sentence vector representation) are fused into a single character dimension, so that the perception of characters to global sentence information can be promoted, the accuracy of intention and slot identification is improved, and the accurate processing of target sentences according to accurate user intention and slot information can be realized, so that the actual business requirements of users are met.
Corresponding to the training method of the category model provided by the embodiments of fig. 4 to 7, the present disclosure further provides a training device of the category model, and since the training device of the category model provided by the embodiments of the present disclosure corresponds to the training method of the category model provided by the embodiments of fig. 4 to 7, the implementation of the training method of the category model is also applicable to the training device of the category model provided by the embodiments of the present disclosure, which is not described in detail in the embodiments of the present disclosure.
Fig. 14 is a schematic structural view of a training device of a category model provided in the tenth embodiment of the present disclosure.
As shown in fig. 14, the training apparatus 1400 of the category model may include: acquisition module 1401, determination module 1402, processing module 1403, identification module 1404, and training module 1405.
The obtaining module 1401 is configured to obtain a sample sentence, where labeling information of the sample sentence includes labeling intention information and labeling slot information.
A determining module 1402, configured to determine an initial category model that matches the first category according to the first category to which the sample sentence belongs.
A processing module 1403 is configured to encode each character in the sample sentence to obtain an initial character feature of each character, and generate a sentence feature of the sample sentence according to the initial character feature of each character.
The identifying module 1404 is configured to perform slot recognition and intention recognition on the sample sentence by using the initial category model, so as to obtain predicted slot information and predicted intention information.
The training module 1405 is configured to train the initial category model to obtain the target category model according to the difference between the predicted intent information and the labeling intent information and the difference between the predicted slot information and the labeling slot information.
In one possible implementation of the embodiments of the present disclosure, the identifying module 1404 is specifically configured to: fusing the sentence characteristics with the initial character characteristics of each character by adopting an initial category model so as to obtain target character characteristics of each character; and respectively carrying out slot recognition and intention recognition on the sample sentences by adopting the initial category model according to the target character characteristics of each character so as to obtain predicted slot information and predicted intention information.
In one possible implementation of the embodiments of the present disclosure, the training module 1405 is specifically configured to: generating a first loss value according to the difference between the predicted intention information and the labeling intention information; generating a second loss value according to the difference between the predicted slot position information and the marked slot position information; generating a first target loss value according to the first loss value and the second loss value; and adjusting model parameters in the initial category model according to the first target loss value to obtain a target category model.
In one possible implementation of the embodiments of the present disclosure, the sample sentence is a plurality of; the identification module 1404 is specifically configured to: for any sample sentence in the plurality of sample sentences, an initial category model matched with a first category to which the any sample sentence belongs is adopted, and according to initial character features of each character in the any sample sentence and sentence features of the any sample sentence, slot recognition and intention recognition are respectively carried out on the any sample sentence, so that prediction slot information and prediction intention information corresponding to the any sample sentence are obtained.
In one possible implementation of the embodiments of the present disclosure, the training module 1405 is specifically configured to: generating a target sub-loss value corresponding to any sample statement according to the difference between the prediction slot information and the labeling slot information corresponding to any sample statement and the difference between the prediction intention information and the labeling intention information corresponding to any sample statement; generating a second target loss value according to the target sub-loss value of each sample statement; and carrying out joint training on the initial category models matched with the first categories to which each sample sentence belongs according to the second target loss value so as to obtain target category models under each first category.
In one possible implementation of the embodiments of the present disclosure, the obtaining module 1401 is specifically configured to: acquiring a history dialogue log, and acquiring a plurality of candidate sentences from the history dialogue log; determining a sample sentence from a plurality of candidate sentences; acquiring labeling intention information corresponding to a sample sentence; and obtaining the labeling slot information corresponding to the sample sentence.
In one possible implementation of the embodiments of the present disclosure, the obtaining module 1401 is specifically configured to: screening the candidate sentences based on the set rules to reserve sample sentences matched with the set rules; and/or classifying the plurality of candidate sentences respectively to obtain categories of the plurality of candidate sentences; and taking the candidate sentences with the category matched with the set category as sample sentences.
According to the training device of the category model, sample sentences are obtained, wherein the labeling information of the sample sentences comprises labeling intention information and labeling slot position information; determining an initial category model matched with the first category according to the first category to which the sample sentence belongs; respectively carrying out slot recognition and intention recognition on the sample sentences by adopting an initial category model to obtain predicted slot information and predicted intention information; and training the initial category model according to the difference between the predicted intention information and the labeling intention information and the difference between the predicted slot position information and the labeling slot position information to obtain a target category model. Therefore, the combination of the loss of the intention recognition task and the loss of the slot recognition task can be realized, the initial category model is subjected to combined optimization, and the prediction effect of the model is improved.
Corresponding to the sentence processing method provided in the embodiment of fig. 8, the present disclosure further provides a sentence processing apparatus, and since the sentence processing apparatus provided in the embodiment of the present disclosure corresponds to the sentence processing method provided in the embodiment of fig. 8, the implementation of the sentence processing method is also applicable to the sentence processing apparatus provided in the embodiment of the present disclosure, and will not be described in detail in the embodiment of the present disclosure.
Fig. 15 is a schematic structural diagram of a sentence processing device according to an eleventh embodiment of the present disclosure.
As shown in fig. 15, the sentence processing apparatus 1500 may include: an acquisition module 1501, a determination module 1502, an identification module 1503 and a processing module 1504.
Wherein, the obtaining module 1501 is configured to obtain a target sentence.
A determining module 1502, configured to determine, according to a second category to which the target sentence belongs, a target category model that matches the second category from the plurality of target category models; wherein the target category model is trained using the apparatus shown in fig. 14.
The recognition module 1503 is configured to perform slot recognition and intention recognition on the target sentence by using the target category model matched with the second category, so as to obtain slot recognition information and intention recognition information.
The processing module 1504 is configured to process the target sentence according to the slot identification information and the intention identification information.
According to the statement processing device, a target category model matched with a second category is determined from a plurality of target category models according to the second category to which the target statement belongs; carrying out slot recognition and intention recognition on the target sentence by adopting a target category model matched with the second category so as to obtain slot recognition information and intention recognition information; and processing the target sentence according to the slot identification information and the intention identification information. Therefore, based on the category model matched with the category (or service scene) to which the target sentence belongs, the target sentence is subjected to slot recognition and intention recognition, and the accuracy of the recognition result can be improved, so that the target sentence can be accurately processed according to accurate user intention and slot information, and the actual service requirement of a user can be met.
In order to achieve the above embodiments, the present disclosure further proposes an electronic device including: the system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the sentence processing method or the training method of the category model according to any one of the previous embodiments of the disclosure when the processor executes the program.
To achieve the above embodiments, the present disclosure further proposes a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a sentence processing method or a training method of a category model as proposed in any of the foregoing embodiments of the present disclosure.
To achieve the above embodiments, the present disclosure also proposes a computer program product which, when executed by a processor, performs a sentence processing method or a training method of a category model as proposed in any of the previous embodiments of the present disclosure.
Fig. 16 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure. The electronic device 12 shown in fig. 16 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 16, the electronic device 12 is in the form of a general purpose computing device. Components of the electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (hereinafter ISA) bus, the Micro Channel Architecture (hereinafter MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (hereinafter VESA) local bus, and the Peripheral Component Interconnect (hereinafter PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (Random Access Memory; hereinafter: RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 16, commonly referred to as a "hard disk drive"). Although not shown in fig. 16, a disk drive for reading from and writing to a removable nonvolatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable nonvolatile optical disk (e.g., a compact disk read only memory (Compact Disc Read Only Memory; hereinafter, "CD-ROM"), digital versatile read only optical disk (Digital Video Disc Read Only Memory; hereinafter, "DVD-ROM"), or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the various embodiments of the disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods in the embodiments described in this disclosure.
The electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the electronic device 12, and/or any devices (e.g., network card, modem, etc.) that enable the electronic device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks, such as a local area network (Local Area Network; hereinafter: LAN), a wide area network (Wide Area Network; hereinafter: WAN), and/or a public network, such as the Internet, through the network adapter 20. As shown in fig. 16, the network adapter 20 communicates with other modules of the electronic device 12 over the bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the methods mentioned in the foregoing embodiments.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, the meaning of "a plurality" is at least two, such as two, three, etc., unless explicitly specified otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present disclosure.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., a ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. As with the other embodiments, if implemented in hardware, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
Furthermore, each functional unit in the embodiments of the present disclosure may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like. Although embodiments of the present disclosure have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the present disclosure, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the present disclosure.
Claims (21)
1. A sentence processing method, the method comprising:
Acquiring a target sentence, and encoding each character in the target sentence to obtain initial character characteristics of each character;
generating sentence characteristics of the target sentence according to the initial character characteristics of each character;
Respectively fusing the sentence characteristics with initial character characteristics of each character to obtain target character characteristics of each character;
respectively carrying out slot recognition and intention recognition on the target sentence according to the target character characteristics of each character to obtain predicted slot information and predicted intention information;
And processing the target sentence according to the prediction slot information and the prediction intention information.
2. The method of claim 1, wherein the fusing the sentence feature with the initial character feature of each of the characters to obtain the target character feature of each of the characters, respectively, comprises:
Determining a first weight of any character to each character aiming at any character in the target sentence;
according to the first weight of each character of the arbitrary character, carrying out weighted summation on the initial character characteristics of the arbitrary character to obtain the intermediate character characteristics of the arbitrary character;
And fusing the intermediate character features of any character with the sentence features to obtain target character features of any character.
3. The method of claim 2, wherein the determining, for any character in the target sentence, a first weight of the any character to each of the characters comprises:
Determining the importance degree of each character in the target sentence;
for an ith character in the target sentence, determining a first weight of the ith character to the jth character according to initial character characteristics of the ith character and importance degrees of the jth character in the target sentence;
wherein i and j are positive integers less than or equal to T, and T is the number of characters contained in the target sentence.
4. The method of claim 3, wherein the determining the first weight of the ith character to the jth character according to the initial character feature of the ith character and the importance level of the jth character in the target sentence comprises:
determining the initial weight of the ith character to the jth character according to the initial character characteristics of the ith character and the importance degree of the jth character;
Determining a first coefficient according to the initial weight of the ith character to each character;
and determining the first weight of the ith character to the jth character according to the initial weight of the ith character to the jth character and the first coefficient.
5. The method according to any one of claims 1-4, wherein the performing slot recognition on the target sentence according to the target character feature of each character includes:
inputting target character characteristics of the characters into a conditional random field CRF network;
Determining the predicted slot information according to the output of the CRF network;
wherein, the CRF network has learned the correspondence between the features and the slots.
6. The method according to any one of claims 1-4, wherein the performing intent recognition on the target sentence according to the target character feature of each character to obtain predicted intent information includes:
determining a second weight of the target sentence on any character in the target sentence;
According to the second weight, fusing the sentence characteristics and the target character characteristics of any character to obtain first fusion characteristics of any character;
generating the intention characteristic of any character according to the first fusion characteristic of any character;
And carrying out intention recognition on the target sentence according to the intention characteristics of each character so as to obtain the predicted intention information.
7. The method of claim 6, wherein the performing intent recognition on the target sentence according to the intent characteristics of each character to obtain predicted intent information comprises:
the intention characteristic of each character is subjected to average pooling to obtain a target intention characteristic;
inputting the target intention characteristic into a fully connected network;
And determining the prediction intention information according to the output of the fully-connected network.
8. The method of claim 6, wherein the determining, for any character in the target sentence, a second weight of the target sentence on the any character comprises:
fusing the sentence characteristics and target character characteristics of any character aiming at any character in the target sentence to obtain second fused characteristics of any character;
and inputting the second fusion characteristic of any character into an activation function, so as to determine the second weight of the target sentence on any character according to the output of the activation function.
9. A method for training a class model, the method comprising:
acquiring a sample sentence, wherein the labeling information of the sample sentence comprises labeling intention information and labeling slot position information;
Determining an initial category model matched with the first category according to the first category to which the sample sentence belongs;
Encoding each character in the sample sentence to obtain initial character characteristics of each character, and generating sentence characteristics of the sample sentence according to the initial character characteristics of each character;
Respectively carrying out slot recognition and intention recognition on the sample sentences by adopting the initial category model according to the initial character characteristics and the sentence characteristics of each character so as to obtain predicted slot information and predicted intention information;
and training the initial category model according to the difference between the predicted intention information and the labeling intention information and the difference between the predicted slot position information and the labeling slot position information to obtain a target category model.
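A rough sketch of the training-time flow of claim 9, assuming one small joint model per first category, mean pooling for the sentence characteristics, and fusion by addition; the JointModel class, the category names, and the dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class JointModel(nn.Module):
    """Minimal stand-in for one category's joint slot/intention model (hypothetical)."""
    def __init__(self, dim=128, num_slot_tags=7, num_intents=10):
        super().__init__()
        self.slot_head = nn.Linear(dim, num_slot_tags)
        self.intent_head = nn.Linear(dim, num_intents)

    def forward(self, char_feats, sent_feat):
        # Fuse the sentence characteristics into every character (addition is an assumption).
        target_char_feats = char_feats + sent_feat.unsqueeze(0)
        slot_logits = self.slot_head(target_char_feats)                   # per-character slot scores
        intent_logits = self.intent_head(target_char_feats.mean(dim=0))   # pooled intention scores
        return slot_logits, intent_logits

# One initial category model per first category (category names are hypothetical).
category_models = {"navigation": JointModel(), "music": JointModel()}

def training_forward(init_char_feats: torch.Tensor, first_category: str):
    """init_char_feats: (seq_len, dim) initial character characteristics of one sample sentence."""
    model = category_models[first_category]        # initial category model matched with the first category
    sent_feat = init_char_feats.mean(dim=0)        # sentence characteristics (mean pooling is an assumption)
    return model(init_char_feats, sent_feat)       # -> predicted slot info, predicted intention info
```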
10. The method of claim 9, wherein the performing slot recognition and intention recognition on the sample sentence respectively by using the initial category model according to the initial character characteristics of each character and the sentence characteristics, so as to obtain the predicted slot information and the predicted intention information, comprises:
fusing the sentence characteristics with the initial character characteristics of each character by using the initial category model, so as to obtain target character characteristics of each character;
and performing slot recognition and intention recognition on the sample sentence respectively by using the initial category model according to the target character characteristics of each character, so as to obtain the predicted slot information and the predicted intention information.
11. The method of claim 9, wherein the training the initial category model according to the difference between the predicted intention information and the labeling intention information and the difference between the predicted slot position information and the labeling slot position information to obtain a target category model comprises:
generating a first loss value according to the difference between the predicted intention information and the labeling intention information;
generating a second loss value according to the difference between the predicted slot position information and the labeling slot position information;
generating a first target loss value according to the first loss value and the second loss value;
and adjusting model parameters in the initial category model according to the first target loss value to obtain the target category model.
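Continuing the JointModel sketch above, claim 11's loss combination could look as follows, with cross-entropy for both the first (intention) and second (slot) loss, an unweighted sum as the first target loss, and a standard optimizer step to adjust the model parameters; each of these choices is an assumption.

```python
import torch.nn as nn

intent_criterion = nn.CrossEntropyLoss()   # measures the intention difference (first loss)
slot_criterion = nn.CrossEntropyLoss()     # measures the slot difference (second loss); a CRF NLL could be used instead

def train_step(model, optimizer, char_feats, gold_slots, gold_intent):
    """model follows the JointModel interface sketched above; gold_slots is a
    (seq_len,) tag tensor and gold_intent a scalar label tensor."""
    slot_logits, intent_logits = model(char_feats, char_feats.mean(dim=0))
    first_loss = intent_criterion(intent_logits.unsqueeze(0), gold_intent.unsqueeze(0))
    second_loss = slot_criterion(slot_logits, gold_slots)
    first_target_loss = first_loss + second_loss        # unweighted sum (an assumption)
    optimizer.zero_grad()
    first_target_loss.backward()
    optimizer.step()                                     # adjusts the model parameters
    return first_target_loss.item()
```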
12. The method of claim 9, wherein there are a plurality of sample sentences;
the performing slot recognition and intention recognition on the sample sentences respectively by using the initial category model according to the initial character characteristics of each character and the sentence characteristics, so as to obtain the predicted slot information and the predicted intention information, comprises:
for any sample sentence among the plurality of sample sentences, performing slot recognition and intention recognition on the sample sentence respectively by using the initial category model matched with the first category to which the sample sentence belongs, according to the initial character characteristics of each character in the sample sentence and the sentence characteristics of the sample sentence, so as to obtain predicted slot information and predicted intention information corresponding to the sample sentence.
13. The method of claim 12, wherein the training the initial category model according to the difference between the predicted intention information and the labeling intention information and the difference between the predicted slot position information and the labeling slot position information to obtain a target category model comprises:
generating, for any sample sentence, a target sub-loss value corresponding to the sample sentence according to the difference between the predicted slot position information and the labeling slot position information corresponding to the sample sentence and the difference between the predicted intention information and the labeling intention information corresponding to the sample sentence;
generating a second target loss value according to the target sub-loss values of the sample sentences;
and jointly training the initial category models matched with the first categories to which the sample sentences belong according to the second target loss value, so as to obtain a target category model under each first category.
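For claims 12 and 13, a hypothetical joint training step over a mixed-category batch, reusing the category_models registry and model interface from the sketches above: every sample sentence contributes a target sub-loss, the sub-losses are summed into the second target loss (the summation is an assumption), and a single backward pass updates every category model that appears in the batch.

```python
import torch
import torch.nn as nn

intent_criterion = nn.CrossEntropyLoss()
slot_criterion = nn.CrossEntropyLoss()

def joint_train_step(batch, category_models, optimizer):
    """batch: list of (char_feats, first_category, gold_slots, gold_intent) tuples,
    possibly spanning several first categories; category_models maps each first
    category to its initial category model (e.g. the hypothetical JointModel above);
    optimizer is assumed to cover the parameters of every category model."""
    sub_losses = []
    for char_feats, first_category, gold_slots, gold_intent in batch:
        model = category_models[first_category]              # model matched with this sample's category
        slot_logits, intent_logits = model(char_feats, char_feats.mean(dim=0))
        sub_loss = (intent_criterion(intent_logits.unsqueeze(0), gold_intent.unsqueeze(0))
                    + slot_criterion(slot_logits, gold_slots))   # target sub-loss for this sample sentence
        sub_losses.append(sub_loss)
    second_target_loss = torch.stack(sub_losses).sum()           # second target loss over the batch
    optimizer.zero_grad()
    second_target_loss.backward()
    optimizer.step()                                             # jointly updates the matched models
    return second_target_loss.item()
```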
14. The method of any one of claims 9-13, wherein the acquiring a sample sentence comprises:
acquiring a historical dialogue log, and acquiring a plurality of candidate sentences from the historical dialogue log;
determining a sample sentence from the plurality of candidate sentences;
acquiring the labeling intention information corresponding to the sample sentence;
and acquiring the labeling slot position information corresponding to the sample sentence.
15. The method of claim 14, wherein the determining a sample sentence from the plurality of candidate sentences comprises:
screening the plurality of candidate sentences based on a set rule to retain sample sentences matched with the set rule;
and/or,
classifying the plurality of candidate sentences to obtain the category of each candidate sentence;
and taking the candidate sentences whose categories match a set category as the sample sentences.
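A small illustration of the rule-based and/or category-based screening of claims 14 and 15; the length bounds, the URL rule, the set categories, and the classify callable are all hypothetical.

```python
import re

MIN_LEN, MAX_LEN = 2, 64                      # hypothetical set rule: length bounds
SET_CATEGORIES = {"navigation", "music"}      # hypothetical set categories

def rule_filter(candidates):
    """Retains candidate sentences matched with the set rule
    (here: a length range and no URLs, both assumptions)."""
    return [c for c in candidates
            if MIN_LEN <= len(c) <= MAX_LEN and not re.search(r"https?://", c)]

def category_filter(candidates, classify):
    """classify: any callable mapping a sentence to a category label.
    Retains candidates whose category matches one of the set categories."""
    return [c for c in candidates if classify(c) in SET_CATEGORIES]

# The filters can be applied individually or chained, matching the "and/or" of claim 15:
# samples = category_filter(rule_filter(candidate_sentences), classify)
```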
16. A sentence processing method, the method comprising:
acquiring a target sentence;
determining, according to a second category to which the target sentence belongs, a target category model matched with the second category from a plurality of target category models, wherein the target category models are trained using the method of any one of claims 9-15;
performing slot recognition and intention recognition on the target sentence by using the target category model matched with the second category to obtain slot recognition information and intention recognition information;
and processing the target sentence according to the slot recognition information and the intention recognition information.
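An illustrative inference path for claim 16, reusing the model interface from the training sketches above: the second category routes the target sentence to its trained target category model, and the decoded slot and intention information is returned for downstream processing; the encoder and the model dictionary are assumptions.

```python
def process_sentence(target_sentence, second_category, encoder, target_category_models):
    """encoder maps a sentence to (seq_len, dim) character characteristics;
    target_category_models maps each second category to a trained target
    category model following the JointModel interface above."""
    model = target_category_models[second_category]    # target category model matched with the second category
    char_feats = encoder(target_sentence)
    slot_logits, intent_logits = model(char_feats, char_feats.mean(dim=0))
    slot_info = slot_logits.argmax(dim=-1).tolist()    # slot recognition information (tag indices)
    intent_info = int(intent_logits.argmax())          # intention recognition information (label index)
    return slot_info, intent_info                      # used to fill slots and dispatch the intention
```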
17. A sentence processing apparatus, the apparatus comprising:
an acquisition module, used for acquiring a target sentence;
an encoding module, used for encoding each character in the target sentence to obtain initial character characteristics of each character;
a generation module, used for generating sentence characteristics of the target sentence according to the initial character characteristics of each character;
a fusion module, used for respectively fusing the sentence characteristics with the initial character characteristics of each character to obtain target character characteristics of each character;
a recognition module, used for respectively performing slot recognition and intention recognition on the target sentence according to the target character characteristics of each character, so as to obtain predicted slot information and predicted intention information;
and a processing module, used for processing the target sentence according to the predicted slot information and the predicted intention information.
18. A training device for a category model, the device comprising:
an acquisition module, used for acquiring a sample sentence, wherein labeling information of the sample sentence comprises labeling intention information and labeling slot position information;
a determining module, used for determining, according to a first category to which the sample sentence belongs, an initial category model matched with the first category;
a processing module, used for encoding each character in the sample sentence to obtain initial character characteristics of each character, and generating sentence characteristics of the sample sentence according to the initial character characteristics of each character;
a recognition module, used for performing slot recognition and intention recognition on the sample sentence respectively by using the initial category model according to the initial character characteristics of each character and the sentence characteristics, so as to obtain predicted slot information and predicted intention information;
and a training module, used for training the initial category model according to the difference between the predicted intention information and the labeling intention information and the difference between the predicted slot position information and the labeling slot position information, so as to obtain a target category model.
19. A sentence processing apparatus, the apparatus comprising:
an acquisition module, used for acquiring a target sentence;
a determining module, used for determining, according to a second category to which the target sentence belongs, a target category model matched with the second category from a plurality of target category models, wherein the target category models are trained using the apparatus of claim 18;
a recognition module, used for performing slot recognition and intention recognition on the target sentence by using the target category model matched with the second category, so as to obtain slot recognition information and intention recognition information;
and a processing module, used for processing the target sentence according to the slot recognition information and the intention recognition information.
20. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8, or to perform the method of any one of claims 9-15, or to perform the method of claim 16.
21. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8, or to perform the method of any one of claims 9-15, or to perform the method of claim 16.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211535212.0A CN118132687A (en) | 2022-12-02 | 2022-12-02 | Sentence processing and category model training method, sentence processing and category model training device, sentence processing equipment and category model training medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118132687A true CN118132687A (en) | 2024-06-04 |
Family
ID=91238626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211535212.0A Pending CN118132687A (en) | 2022-12-02 | 2022-12-02 | Sentence processing and category model training method, sentence processing and category model training device, sentence processing equipment and category model training medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118132687A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118467570A (en) * | 2024-07-10 | 2024-08-09 | 浪潮通用软件有限公司 | Data post-processing method, system and equipment for service data query |
Similar Documents
Publication | Title |
---|---|
CN112084337B (en) | Training method of text classification model, text classification method and equipment |
CN109101537B (en) | Multi-turn dialogue data classification method and device based on deep learning and electronic equipment |
CN108536679B (en) | Named entity recognition method, device, equipment and computer readable storage medium |
CN108062388B (en) | Reply generation method and device for man-machine conversation |
CN109697282B (en) | Sentence user intention recognition method and device |
CN111738016B (en) | Multi-intention recognition method and related equipment |
WO2022142041A1 (en) | Training method and apparatus for intent recognition model, computer device, and storage medium |
CN112528637B (en) | Text processing model training method, device, computer equipment and storage medium |
CN113223509B (en) | Fuzzy statement identification method and system applied to multi-person mixed scene |
CN111626362B (en) | Image processing method, device, computer equipment and storage medium |
CN112270379A (en) | Training method of classification model, sample classification method, device and equipment |
CN111651996A (en) | Abstract generation method and device, electronic equipment and storage medium |
CN113268609A (en) | Dialog content recommendation method, device, equipment and medium based on knowledge graph |
CN110222330B (en) | Semantic recognition method and device, storage medium and computer equipment |
JP2021106016A (en) | Dialog generation method, device, electronic equipment, and medium |
CN111538809B (en) | Voice service quality detection method, model training method and device |
CN111368066B (en) | Method, apparatus and computer readable storage medium for obtaining dialogue abstract |
CN116361442B (en) | Business hall data analysis method and system based on artificial intelligence |
CN117436438A (en) | Emotion analysis method, training method and device for large language model |
CN118132687A (en) | Sentence processing and category model training method, sentence processing and category model training device, sentence processing equipment and category model training medium |
CN113486174A (en) | Model training, reading understanding method and device, electronic equipment and storage medium |
CN116524937A (en) | Speaker conversion point detection method, method and device for training detection model |
CN116304014A (en) | Method for training entity type recognition model, entity type recognition method and device |
CN115689603A (en) | User feedback information collection method and device and user feedback system |
CN115270719A (en) | Text abstract generating method, training method and device based on multi-mode information |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |