
CN118673141A - Statement category determination method and device, storage medium and program product - Google Patents

Statement category determination method and device, storage medium and program product

Info

Publication number: CN118673141A
Application number: CN202310288345.0A
Authority: CN (China)
Prior art keywords: preset, loss function, feature vector, category, processed
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 黄剑辉
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by: Tencent Technology Shenzhen Co Ltd
Priority: CN202310288345.0A
Publication: CN118673141A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a sentence category determination method and apparatus, a device, a storage medium and a program product. The method comprises the following steps: inputting a sentence to be processed into a plurality of network layers, and selecting a plurality of target network layers from the plurality of network layers; obtaining the feature vector of the sentence to be processed output by each target network layer, so as to obtain a plurality of feature vectors; calculating the matching probability between the sentence to be processed and each preset category based on the plurality of feature vectors, so as to obtain a plurality of probability parameters; and determining, based on the plurality of probability parameters, the target preset category to which the sentence to be processed belongs. Because the feature vectors are extracted from different target network layers, feature differences exist among them, and the matching probabilities between the sentence to be processed and the preset categories calculated from these feature vectors are therefore more accurate, so that the target preset category to which the sentence to be processed belongs can be determined accurately.

Description

Statement category determination method and device, storage medium and program product
Technical Field
The present application relates to the field of data processing, and in particular, to a sentence category determination method and apparatus, a device, a storage medium, and a program product.
Background
The topic category characterizes the core subject of an article. It is generally the basic basis for organizing and classifying articles, and it also underlies numerous business applications. In search service scenarios in particular, related nouns in a sentence are identified in order to provide the user with articles on the corresponding category of subject for searching.
However, when the category is determined only by identifying the related nouns in the sentence, the resulting classification is often wrong or inaccurate.
Disclosure of Invention
To solve the above technical problems, embodiments of the present application respectively provide a sentence category determination method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product, so as to improve the accuracy of determining sentence categories.
Other features and advantages of the application will be apparent from the following detailed description, or may be learned by the practice of the application.
According to an aspect of the embodiments of the present application, there is provided a sentence category determination method, including: inputting a sentence to be processed into a plurality of network layers, and selecting a plurality of target network layers from the plurality of network layers; obtaining the feature vector of the sentence to be processed output by each target network layer, so as to obtain a plurality of feature vectors; calculating the matching probability between the sentence to be processed and each preset category based on the plurality of feature vectors, so as to obtain a plurality of probability parameters; and determining, based on the plurality of probability parameters, the target preset category to which the sentence to be processed belongs.
According to an aspect of the embodiments of the present application, there is provided a sentence category determining apparatus, including: an extraction module configured to input a sentence to be processed into a plurality of network layers, and to select a plurality of target network layers from the plurality of network layers; an acquisition module configured to acquire the feature vector of the sentence to be processed output by each target network layer, so as to obtain a plurality of feature vectors; a computing module configured to compute the matching probability between the sentence to be processed and each preset category based on the plurality of feature vectors, so as to obtain a plurality of probability parameters; and a determining module configured to determine, based on the plurality of probability parameters, the target preset category to which the sentence to be processed belongs.
In another exemplary embodiment, the determining module includes: a probability distribution matrix construction unit configured to construct, from the plurality of probability parameters, a probability distribution matrix of the sentence to be processed corresponding to all preset categories; and a target preset category determining unit configured to select the probability parameter with the largest value from the probability distribution matrix as the target probability parameter, and to take the preset category corresponding to the target probability parameter as the target preset category to which the sentence to be processed belongs.
In another exemplary embodiment, the computing module includes: a feature vector obtaining unit configured to obtain the feature vector of each preset category; a first computing unit configured to, for each preset category, respectively perform quotient operations between the plurality of feature vectors and the feature vector of the preset category to obtain a plurality of operation results, and to take the plurality of operation results as the matching probabilities between the plurality of feature vectors and the preset category; and a second computing unit configured to calculate the matching probability between the sentence to be processed and each preset category from the matching probabilities between the plurality of feature vectors and each preset category, so as to obtain the plurality of probability parameters.
In another exemplary embodiment, the extraction module includes: a selection rule acquisition unit configured to acquire a preset network layer interval selection rule, the network layer interval selection rule indicating how many network layers are spaced between one selected target network layer and the next among the plurality of network layers; and a selection unit configured to select the plurality of target network layers from the plurality of network layers based on the preset network layer interval selection rule.
In another exemplary embodiment, the plurality of network layers are arranged in sequence, and the extraction module includes: a feature vector output unit configured to input the sentence to be processed into the first of the sequentially arranged network layers and to perform feature extraction through the plurality of network layers in turn, so as to obtain the feature vector output by each network layer.
In another exemplary embodiment, the sentence category determining apparatus further includes: an input module configured to input sample sentences into a model to be trained containing a plurality of network layers, and to acquire the sample feature vectors output by a plurality of target network layers among the plurality of network layers; a loss function value calculation module configured to calculate, from the plurality of sample feature vectors, the loss function value of each sample feature vector corresponding to all preset categories; an objective loss function value calculation module configured to calculate the objective loss function value from the loss function values of all preset categories corresponding to each sample feature vector; and a training module configured to train the model to be trained based on the objective loss function value, so as to obtain the category classification model.
In another exemplary embodiment, the loss function value calculation module includes: a computing unit configured to, for each preset category, perform calculations on the plurality of sample feature vectors and the preset category to obtain the matching probability of each sample feature vector for the preset category; and a loss function value calculation unit configured to calculate the loss function value of each sample feature vector corresponding to all preset categories based on the matching probability of each sample feature vector for each preset category.
In another exemplary embodiment, the loss function value calculation unit includes: a first calculating subunit configured to calculate the loss function value of each sample feature vector corresponding to each preset category based on the matching probability of each sample feature vector for that preset category; and a second calculating subunit configured to calculate the loss function value of each sample feature vector corresponding to all preset categories based on the loss function value of each sample feature vector corresponding to each preset category.
In another exemplary embodiment, the objective loss function value calculation module includes: a preset weight value unit configured to acquire the preset weight value corresponding to each loss function value; and an objective loss function value calculation unit configured to calculate the objective loss function value from the loss function values of all preset categories corresponding to each sample feature vector and the preset weight value corresponding to each loss function value.
In another exemplary embodiment, the objective loss function value calculation unit includes: a product operation subunit configured to multiply the loss function value of each sample feature vector corresponding to all preset categories by the weight value corresponding to that loss function value, so as to obtain a product result corresponding to each sample feature vector; and an objective loss function value subunit configured to sum the product results corresponding to the sample feature vectors and to take the obtained sum as the objective loss function value.
According to an aspect of an embodiment of the present application, there is provided an electronic apparatus including: a controller; and a memory for storing one or more programs which, when executed by the controller, perform the method of determining the sentence categories described above.
According to an aspect of the embodiments of the present application, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the above-described method of determining a category of sentences.
According to an aspect of the embodiments of the present application, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the above-mentioned method of determining a class of sentences.
In the technical scheme provided by the embodiments of the application, a sentence to be processed is input into a plurality of network layers, and a plurality of target network layers are selected from the plurality of network layers; the feature vector of the sentence to be processed output by each target network layer is obtained, so as to obtain a plurality of feature vectors; the matching probability between the sentence to be processed and each preset category is calculated based on the plurality of feature vectors, so as to obtain a plurality of probability parameters; and the target preset category to which the sentence to be processed belongs is determined based on the plurality of probability parameters. Because the feature vectors are extracted from different target network layers, feature differences exist among them; introducing feature vectors with such feature differences represents the semantic features of the sentence to be processed more comprehensively, so the matching probabilities between the sentence to be processed and the preset categories calculated from these feature vectors are more accurate, and the target preset category to which the sentence to be processed belongs can therefore be determined accurately.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is evident that the drawings in the following description are only some embodiments of the present application and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 is a schematic illustration of an implementation environment to which the present application relates;
FIG. 2 is a flow chart of a method of determining a category of sentences shown in an exemplary embodiment of the present application;
FIG. 3 is a flow diagram of another method of determining a category of sentences shown based on the embodiment of FIG. 2;
FIG. 4 is a flow diagram of another method of determining a category of sentences shown based on the embodiment of FIG. 2;
FIG. 5 is a flow diagram of another method of determining a category of sentences shown based on the embodiment of FIG. 2;
FIG. 6 is a flow diagram of another method of determining a category of sentences shown based on the embodiment of FIG. 2;
FIG. 7 is a schematic diagram of the processing flow of a sentence to be processed input into a plurality of sequentially arranged network layers, according to an exemplary embodiment of the application;
FIG. 8 is a flow diagram of another method of determining a category of sentences shown based on the embodiment shown in any of FIGS. 2-6;
FIG. 9 is a flow chart of another method of determining a category of sentences shown based on the embodiment of FIG. 8;
FIG. 10 is a flow chart of another method of determining a category of sentences shown based on the embodiment of FIG. 9;
FIG. 11 is a flow chart of another method of determining a category of sentences shown based on the embodiment of FIG. 8;
FIG. 12 is a flow chart of another method of determining a category of sentences shown based on the embodiment of FIG. 11;
FIG. 13 is a schematic diagram showing a process of calculating a target loss function value according to an exemplary embodiment of the present application;
fig. 14 is a schematic diagram showing the structure of a sentence category determining apparatus according to an exemplary embodiment of the present application;
fig. 15 is a schematic diagram of a computer system of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
In the present application, the term "plurality" means two or more. "And/or" describes an association relationship of associated objects, meaning that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Firstly, it should be noted that cloud technology refers to a hosting technology that unifies resources such as hardware, software and networks in a wide area network or a local area network, so as to implement the calculation, storage, processing and sharing of data. The method of the application can perform its processing in the cloud, store the processing results in a remote database, and send them directly to other ports.
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like that are applied based on the cloud computing business model; it can form a resource pool, which is used on demand and is flexible and convenient. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites and other portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each article may in the future carry its own identification mark, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately; and all kinds of industry data need strong system backing support, which can only be realized through cloud computing.
The method and apparatus for determining the category of sentences, the electronic device, the computer readable storage medium and the computer program product according to the embodiments of the present application relate to the cloud technology described above, and the embodiments will be described in detail below.
In a search scenario, related nouns in the sentence input by the operating object are generally extracted in order to display category information matched with those nouns, and the matched category information is often wrong or inaccurate. Therefore, an embodiment of the present application provides a sentence category determination method, so as to improve the accuracy of determining sentence categories.
Referring to fig. 1, fig. 1 is a schematic diagram of an implementation environment according to the present application. As shown in fig. 1, the implementation environment includes a client 110 and a server 120, and communication between the client 110 and the server 120 is performed through a wired or wireless network. The related statement class purpose determination process is exemplified as follows:
Illustratively, the client 110 is configured to collect a to-be-processed sentence input by an operation object, send the collected to-be-processed sentence to the server 120 for determining a sentence class purpose, and the server 120 inputs the to-be-processed sentence into a plurality of network layers and selects a plurality of target network layers from the plurality of network layers; obtaining feature vectors of the sentences to be processed output by each target network layer to obtain a plurality of feature vectors; calculating the matching probability of the sentence to be processed and each preset category based on a plurality of feature vectors to obtain a plurality of probability parameters; determining a target preset category to which the sentence to be processed belongs based on a plurality of probability parameters; the target preset categories are sent to the client 110 so that the client 110 displays the target preset categories.
The client 110 is a device with a data collection function, which may be a smart phone, a notebook computer, a smart tablet computer, or the like, and is not limited herein. The server 120 may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, where the plurality of servers may form a blockchain with each server as a node on the blockchain; the server 120 may also be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Networks), and basic cloud computing services such as big data and artificial intelligence platforms, which is not limited here.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for determining a category of sentences according to an exemplary embodiment of the present application. The method may be specifically performed by the server 120 in the implementation environment shown in fig. 1. Of course, the method may also be applied to other implementation environments and executed by a server device in other implementation environments, which is not limited by the present embodiment.
The determination method of the sentence category provided in the present embodiment will be described below with a server as an exemplary execution body. As shown in fig. 2, the method at least includes S210 to S240, which are described in detail as follows:
S210: The sentence to be processed is input into a plurality of network layers, and a plurality of target network layers are selected from the plurality of network layers.
The sentence to be processed is a sentence containing a plurality of nouns, verbs, adverbs and the like; for example, the sentence to be processed is: "This novel was originally written by the famous writer A and was eventually brought to the screen by the director B."
The network layers are feature extraction layers for extracting feature vectors corresponding to the sentences to be processed, and each network layer can extract the feature vectors of the sentences to be processed.
In this embodiment, the feature vectors output by the target network layers are selected and obtained for the subsequent determination of the sentence category. The selection of the target network layers may be random or may follow a certain selection rule.
S220: and obtaining the feature vectors of the sentences to be processed output by each target network layer to obtain a plurality of feature vectors.
For example, the network layers are 12 layers in total, the 4 th network layer, the 8 th network layer and the 12 th network layer are respectively selected as target network layers, and the feature vectors of the sentences to be processed output by the three network layers are respectively obtained.
In another example, the network layers are 12 layers in total, the 3 rd network layer, the 8 th network layer and the 10 th network layer are respectively selected as target network layers, and the feature vectors of the sentences to be processed output by the three network layers are respectively obtained.
S230: and calculating the matching probability of the sentence to be processed and each preset category based on the plurality of feature vectors to obtain a plurality of probability parameters.
The matching probability is a numerical value representing the matching degree of the sentence to be processed and the preset category, and the larger the matching probability is, the higher the matching degree of the sentence to be processed and the preset category is, and the lower the matching degree is otherwise.
The process of calculating the matching probability is described by way of example: the plurality of feature vectors and the preset feature vector corresponding to each preset category are respectively converted into numerical values; quotient operations are performed between the numerical values corresponding to the plurality of feature vectors and the numerical value corresponding to each preset feature vector; the obtained quotients are taken as the matching probabilities between the plurality of feature vectors and each preset category; and the probability parameters are then obtained from these matching probabilities.
Illustratively, the plurality of feature vectors includes a first feature vector, a second feature vector and a third feature vector, which are respectively input into different classifiers to perform classification fitting, and the specific process is as follows:
Logits1=Classifier1(L1-emb);
The Classifier1 is composed of a full-connection layer and a softmax layer, and the input dimension of the full-connection layer is the number of categories.
Logits2=Classifier2(L2-emb);
The Classifier2 is composed of a full-connection layer and a softmax layer, and the input dimension of the full-connection layer is the number of categories.
Logits3=Classifier3(L3-emb);
The Classifier3 is composed of a full-connection layer and a softmax layer, and the input dimension of the full-connection layer is the number of categories.
The Logits are input into softmax to obtain the final multi-category probability distribution: A = softmax(Logits); where A = [a_1, a_2, …, a_n], and a_i represents the matching probability predicted for the i-th preset category, i.e., the probability parameter in this embodiment.
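As a concrete illustration of the classifier structure just described, the following PyTorch sketch attaches one fully connected head plus softmax to each of the three target layers' feature vectors. It is a minimal sketch under assumed sizes (hidden size 768, four preset categories, random embeddings standing in for L1-emb, L2-emb and L3-emb), not the patent's reference implementation.

```python
import torch
import torch.nn as nn

hidden_size, num_categories = 768, 4  # assumed sizes for illustration

class LayerClassifier(nn.Module):
    """One fully connected layer followed by softmax, as described above."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(hidden_size, num_categories)

    def forward(self, emb):                    # emb: (batch, hidden_size)
        logits = self.fc(emb)                  # Logits_k = Classifier_k(Lk-emb)
        return torch.softmax(logits, dim=-1)   # A = softmax(Logits)

classifiers = nn.ModuleList([LayerClassifier() for _ in range(3)])

# Random stand-ins for the feature vectors output by the three target layers.
embs = [torch.randn(1, hidden_size) for _ in range(3)]
probs = [clf(e) for clf, e in zip(classifiers, embs)]  # one distribution per layer
```

Each row of `probs` sums to 1 and plays the role of A = [a_1, a_2, …, a_n] for its layer.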
S240: and determining a target preset category to which the statement to be processed belongs based on the probability parameters.
The target preset category refers to the preset category to which the sentence to be processed belongs, and is one or more of a plurality of preset categories; a preset category may be, for example, literature, philosophy, physics, engineering or another category, and the application does not limit the specific content of the categories. For example, the sentence to be processed may belong to the literature category and/or the philosophy category.
S240 is described by way of example: the values of the plurality of probability parameters are acquired, the preset category corresponding to the probability parameter with the largest value is determined as the target preset category to which the sentence to be processed belongs, and the related articles, information and other data under the target preset category are displayed.
In the method, a sentence to be processed is input into a plurality of network layers, and a plurality of target network layers are selected from the plurality of network layers. The feature vector of the sentence to be processed output by each target network layer is obtained, yielding a plurality of feature vectors with feature differences, since they are extracted from different target network layers. The matching probability between the sentence to be processed and each preset category is calculated based on the plurality of feature vectors to obtain a plurality of probability parameters; that is, introducing the plurality of feature vectors with feature differences represents the semantic features of the sentence to be processed more comprehensively, so the calculated matching probabilities are more accurate. The target preset category to which the sentence to be processed belongs is then determined based on the plurality of probability parameters, accurately and rapidly.
How to determine the target preset category to which the sentence to be processed belongs based on the multiple probability parameters is described in detail in another exemplary embodiment of the present application, referring specifically to fig. 3, fig. 3 is a flow chart of another method for determining the category of the sentence based on the embodiment shown in fig. 2. The method in this embodiment further includes at least S310 to S320 in S240 shown in fig. 2, and is described in detail as follows:
S310: A probability distribution matrix of the sentence to be processed corresponding to all preset categories is constructed from the plurality of probability parameters.
Illustratively, the plurality of probability parameters of the present embodiment include a first probability parameter, a second probability parameter and a third probability parameter. The first probability parameter is the matching probability calculated from the plurality of feature vectors and the first preset category, and characterizes the matching degree between the sentence to be processed and the first preset category; the second probability parameter is the matching probability calculated from the plurality of feature vectors and the second preset category, and characterizes the matching degree between the sentence to be processed and the second preset category; the third probability parameter is the matching probability calculated from the plurality of feature vectors and the third preset category, and characterizes the matching degree between the sentence to be processed and the third preset category. A probability distribution matrix [first probability parameter, second probability parameter, third probability parameter] is then constructed from the first, second and third probability parameters.
S320: and selecting the probability parameter with the largest numerical value from the probability distribution matrix as a target probability parameter, and taking the preset category corresponding to the target probability parameter as the target preset category to which the sentence to be processed belongs.
Illustratively, the probability distribution matrix is (0.45,0.9,0.12), where 0.45 characterizes a degree of matching of the sentence to be processed with a first preset category, 0.9 characterizes a degree of matching of the sentence to be processed with a second preset category, and 0.12 characterizes a degree of matching of the sentence to be processed with a third preset category. Obviously, 0.9 is a probability parameter with the largest value, the probability parameter is taken as a target probability parameter, and the second preset category is taken as a target preset category to which the sentence to be processed belongs.
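Continuing the numerical example above, selecting the target preset category is a single argmax over the probability distribution matrix. A minimal sketch follows; the category names are assumptions added for illustration.

```python
probs = [0.45, 0.9, 0.12]  # matching degrees with the 1st, 2nd and 3rd preset categories
categories = ["literature", "philosophy", "physics"]  # assumed category names

target_idx = max(range(len(probs)), key=probs.__getitem__)  # index of the largest value
target_category = categories[target_idx]                    # -> "philosophy"
print(target_idx, target_category)                          # 1 philosophy
```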
According to the embodiment, the probability distribution matrix of the to-be-processed sentence corresponding to all preset categories is obtained through construction according to a plurality of probability parameters, and the matching degree of the to-be-processed sentence and each preset category is clearly known through the probability distribution matrix; and selecting the probability parameter with the largest numerical value from the probability distribution matrix as a target probability parameter, and taking the preset category corresponding to the target probability parameter as the target preset category to which the sentence to be processed belongs, so that the determination process of the target preset category is more convenient and quick.
In another exemplary implementation of the present application, a manner in which a plurality of probability parameters are calculated is illustrated, and referring specifically to fig. 4, fig. 4 is a flow chart of another method for determining a category of sentences based on the embodiment shown in fig. 2. The method in this embodiment further includes at least S410 to S430 in S230 shown in fig. 2, and is described in detail as follows:
S410: The feature vector of each preset category is obtained.
Illustratively, Table 1 lists the feature vector corresponding to each preset category; the feature vector corresponding to each preset category is obtained according to Table 1.
Preset category          Feature vector
First preset category    First feature vector
Second preset category   Second feature vector
Third preset category    Third feature vector
Fourth preset category   Fourth feature vector
……                       ……
TABLE 1
S420: For each preset category, quotient operations are respectively performed between the plurality of feature vectors and the feature vector of the preset category to obtain a plurality of operation results, and the plurality of operation results are taken as the matching probabilities between the plurality of feature vectors and the preset category.
For example, for the first preset category, whose corresponding feature vector is determined from Table 1 to be the first feature vector, the feature vector A, the feature vector B and the feature vector C are each divided by the first feature vector to obtain a first operation result (i.e., the matching probability between the feature vector A and the first preset category), a second operation result (i.e., the matching probability between the feature vector B and the first preset category) and a third operation result (i.e., the matching probability between the feature vector C and the first preset category).
S430: and calculating the matching probability of the sentence to be processed and each preset category based on the matching probability of the plurality of feature vectors and each preset category, so as to obtain a plurality of probability parameters.
Illustratively, the first operation result, the second operation result and the third operation result in the above example are subjected to an averaging operation, and the calculated average is used as the matching probability of the sentence to be processed and the first preset category.
In another exemplary embodiment, the variance or standard deviation of the first, second and third operation results in the above example is calculated; if the variance or standard deviation is large, abnormal data are screened out of the plurality of operation results, the averaging operation is performed after the abnormal data have been removed, and the calculated average is taken as the matching probability between the sentence to be processed and the first preset category.
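The robust-averaging variant can be sketched as follows. This is a minimal sketch under assumptions: the numeric values below stand in for the three operation results, and the cut-off for "a large standard deviation" is an arbitrary illustrative threshold; the patent does not fix either.

```python
import statistics

def matching_probability(results, std_threshold=0.5):
    """Average the operation results; if they spread too widely, first drop
    values more than one standard deviation from the mean as abnormal data."""
    mean = statistics.mean(results)
    std = statistics.pstdev(results)
    if std > std_threshold:
        results = [r for r in results if abs(r - mean) <= std]
    return statistics.mean(results)

# Assumed operation results for feature vectors A, B and C against
# the first preset category's feature vector.
print(matching_probability([0.62, 0.58, 0.95]))
```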
This embodiment provides a way to calculate the matching probability between the sentence to be processed and each preset category: quotient operations are respectively performed between the plurality of feature vectors and the feature vector of each preset category to obtain a plurality of operation results, and the plurality of operation results are taken as the matching probabilities between the plurality of feature vectors and the preset category, so that the plurality of probability parameters are accurately obtained.
Referring to fig. 5, fig. 5 is a schematic flow chart of a method for determining another category of sentences according to the embodiment shown in fig. 2. The method in this embodiment further includes at least S510 to S520 in S210 shown in fig. 2, and is described in detail as follows:
S510: A preset network layer interval selection rule is acquired; the network layer interval selection rule indicates how many network layers are spaced between one selected target network layer and the next among the plurality of network layers.
In some embodiments, a number of target network layers are randomly selected from a plurality of network layers.
S520: selecting a plurality of target network layers from the plurality of network layers based on a preset network layer interval selection rule.
For example, for a total of 12 network layers, the preset network layer interval selection rule is to select one target network layer every 3 network layers, and if the first network layer is the target network layer, the fourth network layer, the seventh network layer and the tenth network layer are the target network layers.
Another exemplary scenario is that for a total of 12 network layers, the preset network layer interval selection rule is to select one target network layer every 4 network layers, and if the fourth network layer is the target network layer, the eighth network layer and the twelfth network layer are the target network layers.
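Both examples reduce to a fixed stride over 1-based layer indices; a one-line helper makes this concrete (the starting layer and stride are the rule's parameters):

```python
def select_target_layers(num_layers, start, stride):
    """1-based indices of the target network layers, one every `stride` layers."""
    return list(range(start, num_layers + 1, stride))

print(select_target_layers(12, 1, 3))  # [1, 4, 7, 10]
print(select_target_layers(12, 4, 4))  # [4, 8, 12]
```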
The embodiment provides a mode of selecting the target network layer according to the network layer interval selection rule, so that a certain rule exists among the selected plurality of target network layers, and the relation among the feature vectors output by the plurality of target network layers is enhanced.
In another exemplary embodiment of the present application, the plurality of network layers are sequentially arranged, which details how to input the sentence to be processed into the plurality of network layers, referring specifically to fig. 6, fig. 6 is a flow chart of another method for determining the category of the sentence shown in the embodiment of fig. 2. The method in this embodiment includes at least S610 in S210 shown in fig. 2, and is described in detail as follows:
S610: The sentence to be processed is input into the first of the plurality of sequentially arranged network layers, and feature extraction is performed through the plurality of network layers in turn, so as to obtain the feature vector output by each network layer.
Illustratively, as shown in fig. 7, fig. 7 is a schematic diagram of the processing flow of a sentence to be processed input into a plurality of sequentially arranged network layers according to an exemplary embodiment of the present application. Twelve network layers are arranged and combined together in sequence. The sentence to be processed, "This novel was originally written by the famous writer A and was eventually brought to the screen by the director B", is input into the first network layer, and the first network layer outputs the corresponding first feature vector; after the first network layer, the first feature vector is input into the second network layer, and the second network layer outputs the corresponding second feature vector; and so on, until the twelfth network layer outputs the corresponding twelfth feature vector. If a network layer is selected as a target network layer, the feature vector output by that network layer is a feature vector used for determining the sentence category in this embodiment.
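A minimal sketch of this sequential pass collects every layer's output so that the target layers' vectors can be picked out afterwards. The encoder layers, sizes and the choice of the first-token vector are stand-in assumptions, not the patent's actual network:

```python
import torch
import torch.nn as nn

hidden = 768
layers = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
    for _ in range(12)
])

x = torch.randn(1, 16, hidden)  # stand-in for the embedded sentence to be processed
outputs = []
for layer in layers:            # each layer feeds the next, as in FIG. 7
    x = layer(x)
    outputs.append(x)

target_layers = [4, 8, 12]      # 1-based indices of the target network layers
features = [outputs[i - 1][:, 0] for i in target_layers]  # one vector per target layer
```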
This embodiment further illustrates that the plurality of network layers are arranged in sequence: each network layer extracts features anew from the feature vector output by the adjacent preceding network layer, which guarantees the continuity of the data processing process.
If the method for determining the category of the sentence in each embodiment is applied to the category classification model, the category classification model needs to be trained in the early stage. For this reason, another exemplary embodiment of the present application is specifically described with reference to fig. 8, and fig. 8 is a flowchart illustrating another method for determining a category of sentences based on the embodiment shown in any one of fig. 2 to 6. The method in this embodiment further includes at least S810 to S840 before S210, and is described in detail as follows:
S810: A sample sentence is input into a model to be trained containing a plurality of network layers, and the sample feature vectors output by a plurality of target network layers among the plurality of network layers are acquired.
The sample sentence is a sentence used for training the model to be trained, which benefits the accuracy of the output of the model. It is passed through the target network layers to obtain a plurality of sample feature vectors for the calculation of the relevant loss function values.
S820: and calculating according to the sample feature vectors to obtain loss function values of all preset categories corresponding to each sample feature vector.
Illustratively, the sample feature vectors include a first sample feature vector, a second sample feature vector and a third sample feature vector. The first loss function value of the first sample feature vector corresponding to all preset categories is calculated, the second loss function value of the second sample feature vector corresponding to all preset categories is calculated, and the third loss function value of the third sample feature vector corresponding to all preset categories is calculated.
S830: and calculating the target loss function value according to the loss function value of each sample feature vector corresponding to all preset categories.
Illustratively, the first, second and third loss functions described in S820 are averaged, and the resulting average is taken as the objective loss function value.
S840: training the model to be trained based on the objective loss function value to obtain the category classification model.
And adjusting training parameters in the model to be trained according to the objective loss function value to complete training of the model to be trained, and obtaining the category classification model.
In this method, sample sentences are input, as the initial training input, into the target network layers to obtain a plurality of sample feature vectors, and the loss function value of each sample feature vector corresponding to all preset categories is calculated, so that the target loss function value used for training the model to be trained is accurately calculated; training and adjustment of the model to be trained are thereby completed, and the resulting category classification model is more accurate.
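Putting S810 through S840 together, one training step might look like the sketch below. Everything here is an assumption made for illustration: a toy stack of linear blocks stands in for the model to be trained, the three block outputs play the role of the target layers' sample feature vectors, and the weights anticipate the λ values discussed later.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
hidden, num_categories = 32, 4
blocks = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(3)])  # toy "network layers"
heads = nn.ModuleList([nn.Linear(hidden, num_categories) for _ in range(3)])
optimizer = torch.optim.Adam(
    list(blocks.parameters()) + list(heads.parameters()), lr=1e-3
)

x = torch.randn(8, hidden)                       # a batch of embedded sample sentences
labels = torch.randint(0, num_categories, (8,))  # target preset categories

losses = []
for block, head in zip(blocks, heads):
    x = torch.relu(block(x))                        # sample feature vector of this target layer
    losses.append(F.cross_entropy(head(x), labels)) # loss over all preset categories

weights = [0.5, 0.3, 0.2]                        # preset weight values (λ1:λ2:λ3 = 5:3:2)
loss_all = sum(w * l for w, l in zip(weights, losses))  # target loss function value
optimizer.zero_grad()
loss_all.backward()                              # train the model on the target loss value
optimizer.step()
```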
In another exemplary embodiment of the present application, a manner of calculating the loss function value of each sample feature vector corresponding to all preset categories is described in detail, and referring specifically to fig. 9, fig. 9 is a flowchart illustrating another method for determining the category of the sentence based on the embodiment shown in fig. 8. The method in this embodiment at least includes S910 to S920 in S820 shown in fig. 8, and is described in detail as follows:
S910: For each preset category, calculation is performed on the plurality of sample feature vectors and the preset category, so as to obtain the matching probability of each sample feature vector for the preset category.
Illustratively, there are three sample feature vectors. For the first preset category, the three sample feature vectors and the preset feature vector corresponding to that preset category are respectively converted into numerical values, and the matching probability of each sample feature vector for the preset category is then calculated.
S920: and calculating the loss function value of each sample feature vector corresponding to all preset categories based on the matching probability of each sample feature vector corresponding to each preset category.
According to the embodiment, for each preset category, the matching probability of each sample feature vector corresponding to the preset category is calculated in sequence, so that the loss function value of each sample feature vector corresponding to all the preset categories is calculated rapidly and accurately.
In another exemplary embodiment of the present application, a manner of calculating the loss function value of each sample feature vector corresponding to all preset categories is described in detail, referring specifically to fig. 10, fig. 10 is a flowchart illustrating another method for determining the category of the sentence based on the embodiment shown in fig. 9. The method in this embodiment at least includes S1010 to S1020 in S920 shown in fig. 9, and is described in detail as follows:
S1010: The loss function value of each sample feature vector corresponding to each preset category is calculated from the matching probability of that sample feature vector for the preset category.
Illustratively, the loss function value corresponding to each preset category is calculated according to the following formula:
B(n) = -(y_i · log a_i);
where B(n) represents the loss function value of the n-th sample feature vector corresponding to the i-th preset category; i represents the preset category index; y_i takes the value 0 or 1; and a_i represents the matching probability of the sample feature vector for the i-th preset category.
S1020: and calculating the loss function value of each sample feature vector corresponding to all preset categories based on the loss function value of each sample feature vector corresponding to each preset category.
The B(n) values are summed; that is, the loss function value of each sample feature vector corresponding to all preset categories is calculated according to the following formula:
loss_cls(n) = -Σ_{i=1…N} (y_i · log a_i);
where loss_cls(n) represents the loss function value of the n-th sample feature vector corresponding to all preset categories; cls(n) denotes the n-th sample feature vector; N represents the number of preset categories; i represents the preset category index; y_i takes the value 0 or 1; and a_i represents the matching probability of the sample feature vector for the i-th preset category.
If the sample feature vectors are cls(1), cls(2) and cls(3), the loss function value of the first sample feature vector corresponding to all preset categories is loss_cls(1) = -Σ_{i=1…N} (y_i · log a_i1), where a_i1 represents the matching probability of the first sample feature vector for the i-th preset category; the loss function value of the second sample feature vector corresponding to all preset categories is loss_cls(2) = -Σ_{i=1…N} (y_i · log a_i2), where a_i2 represents the matching probability of the second sample feature vector for the i-th preset category; and the loss function value of the third sample feature vector corresponding to all preset categories is loss_cls(3) = -Σ_{i=1…N} (y_i · log a_i3), where a_i3 represents the matching probability of the third sample feature vector for the i-th preset category. In this way, the loss function value of each sample feature vector corresponding to all preset categories is obtained.
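In code, loss_cls(n) for one sample feature vector is a plain cross-entropy over the N preset categories; a minimal sketch with assumed numbers:

```python
import math

def loss_cls(a, y):
    """loss_cls(n) = -Σ_i y_i · log(a_i) over the N preset categories."""
    return -sum(yi * math.log(ai) for yi, ai in zip(y, a))

a = [0.1, 0.7, 0.2]    # matching probabilities of one sample feature vector
y = [0, 1, 0]          # y_i is 1 for the true preset category, 0 otherwise
print(loss_cls(a, y))  # -log(0.7) ≈ 0.357
```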
This embodiment describes how to quickly calculate the loss function value of each sample feature vector corresponding to all preset categories: the loss function value of each sample feature vector corresponding to each preset category is calculated from the matching probability of that sample feature vector for that preset category, which links the matching probability to the loss function; the loss function value of each sample feature vector corresponding to all preset categories is then accurately calculated from these per-category loss function values.
In another exemplary embodiment of the present application, a manner of calculating the objective loss function value is described in detail, and referring specifically to fig. 11, fig. 11 is a flowchart illustrating another method for determining the category of sentences based on the embodiment shown in fig. 8. The method in this embodiment at least includes S1110 to S1120 in S830 shown in fig. 8, and is described in detail as follows:
S1110: A preset weight value corresponding to each loss function value is acquired.
The preset weight values are used in calculating the target loss function value; the larger a weight value is, the larger the proportion of the corresponding loss function is, and the greater its influence on the target loss function value.
S1120: and calculating to obtain a target loss function value according to the loss function value of each sample feature vector corresponding to all the preset categories and the preset weight value corresponding to each loss function value.
Illustratively, the sample feature vectors include a first sample feature vector, a second sample feature vector and a third sample feature vector. The loss function value of the first sample feature vector corresponding to all preset categories is loss_cls(1), and the preset weight value corresponding to loss_cls(1) is λ1; the loss function value of the second sample feature vector corresponding to all preset categories is loss_cls(2), and its corresponding preset weight value is λ2; the loss function value of the third sample feature vector corresponding to all preset categories is loss_cls(3), and its corresponding preset weight value is λ3. The target loss function value may be calculated according to the following formula:
loss_all = λ1 · loss_cls(1) + λ2 · loss_cls(2) + λ3 · loss_cls(3);
where loss_all represents the target loss function, and λ1, λ2 and λ3 may take values according to the actual situation.
The embodiment further illustrates how the objective loss function value is calculated, and when the objective loss function value is calculated, a preset weight value corresponding to the relevant sample feature vector is introduced to refine the influence of each loss function on the objective loss function, so that the objective loss function is more accurate.
In another exemplary embodiment of the present application, the above S1120 is described in detail, with specific reference to fig. 12, and fig. 12 is a flowchart illustrating another method for determining the category of sentences according to the embodiment shown in fig. 11. The method in this embodiment at least includes S1210 to S1220 in S1120 shown in fig. 11, and is described in detail as follows:
S1210: and carrying out product operation on the loss function value of each sample feature vector corresponding to all preset categories and the weight value corresponding to each loss function value to obtain a product result corresponding to each sample feature vector.
For example, the loss function value of the first sample feature vector corresponding to all preset categories is loss_cls(1), and the preset weight value corresponding to loss_cls(1) is λ1; λ1 · loss_cls(1) is then the product result corresponding to the first sample feature vector, and so on, so as to obtain the product result corresponding to each sample feature vector.
S1220: and carrying out summation operation on the product result corresponding to each training vector, and taking the obtained sum result as a target loss function value.
The target loss function value is the sum of the product results corresponding to all the training vectors, namely, the product results corresponding to each training vector are added, and the obtained sum is the target loss function value.
This embodiment is exemplarily described as follows: the sample feature vectors include a first sample feature vector, a second sample feature vector and a third sample feature vector. The loss function value of the first sample feature vector corresponding to all preset categories is loss_cls(1), and the preset weight value corresponding to loss_cls(1) is λ1; the loss function value of the second sample feature vector corresponding to all preset categories is loss_cls(2), and its corresponding preset weight value is λ2; the loss function value of the third sample feature vector corresponding to all preset categories is loss_cls(3), and its corresponding preset weight value is λ3. The target loss function is calculated according to the following formula:
loss_all = λ1 · loss_cls(1) + λ2 · loss_cls(2) + λ3 · loss_cls(3)
where loss_all represents the target loss function; the ratio λ1:λ2:λ3 can be adjusted according to the actual situation, but the sum of all weight values is 1. In a preferred embodiment, λ1:λ2:λ3 = 5:3:2; the target loss function calculated with this ratio is the most accurate, so that the category classification model obtained by training is more accurate.
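Numerically, the weighted sum is a one-liner; the per-layer loss values below are assumed figures used only to show the 5:3:2 weighting at work:

```python
loss_cls = {1: 0.82, 2: 0.47, 3: 0.31}  # assumed loss values per sample feature vector
lam = {1: 0.5, 2: 0.3, 3: 0.2}          # λ1:λ2:λ3 = 5:3:2, weights summing to 1

loss_all = sum(lam[n] * loss_cls[n] for n in (1, 2, 3))
print(loss_all)  # 0.5*0.82 + 0.3*0.47 + 0.2*0.31 = 0.613
```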
The embodiment provides a specific mode for calculating the objective loss function value, which is to perform product operation on the loss function value of each sample feature vector corresponding to all preset categories and the weight value corresponding to each loss function value to obtain a product result corresponding to each sample feature vector; and carrying out summation operation on the product result corresponding to each training vector, taking the obtained sum result as a target loss function value, and refining the influence of each loss function on the target loss function, so that the target loss function is more accurate.
The sentence category determination method of the present application may be applied to a multi-layer BERT model, which is specifically described in another exemplary embodiment of the present application with reference to fig. 13; fig. 13 is a schematic diagram illustrating the process of calculating the target loss function value according to an exemplary embodiment of the present application. The BERT model includes 12 consecutive network layers, from which one target network layer is selected every four layers; as shown in fig. 13, the fourth network layer, the eighth network layer and the twelfth network layer are the target network layers.
First, the sample sentence "This novel was originally written by the famous writer A and was eventually brought to the screen by the director B" is input into the BERT model; each layer begins with a token that can be regarded as a fusion of the semantics of the whole sentence. The sentence is processed by the first, second and third network layers, the result is input into the fourth network layer, and the first sample feature vector output by the fourth network layer is obtained: L1-emb = BERT-L4(SENTENCE). By analogy, the second sample feature vector output by the eighth network layer, L2-emb = BERT-L8(SENTENCE), and the third sample feature vector output by the twelfth network layer, L3-emb = BERT-L12(SENTENCE), are obtained respectively.
In this embodiment, the first sample feature vector is a shallow feature vector, the second sample feature vector is a middle-layer feature vector, and the third sample feature vector is a deep feature vector. The shallow network layers tend to extract the basic structure of the text, such as parts of speech and expression structures, while the deep network layers tend to extract abstract features of the text, such as its semantic representation. However, the basic features are also important factors in text representation; particularly when the sentence structure is complex, reinforcing the basic structural information can effectively improve the classification accuracy. Therefore, the feature vectors of the shallow, middle and deep layers are all considered in training the model, so that the trained model can determine the target preset category of the sentence to be processed more accurately.
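For reference, extracting shallow, middle and deep sentence vectors from a stock 12-layer BERT encoder can be sketched with the Hugging Face transformers library. The checkpoint name and the use of the first-token embedding as the whole-sentence representation are assumptions consistent with the description above, not a mandated implementation.

```python
import torch
from transformers import BertModel, BertTokenizer

name = "bert-base-uncased"  # assumed checkpoint; any 12-layer BERT encoder would serve
tokenizer = BertTokenizer.from_pretrained(name)
model = BertModel.from_pretrained(name)

sentence = "This novel was eventually brought to the screen by the director B."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding layer; indices 1..12 are the 12 network layers.
l1_emb = out.hidden_states[4][:, 0]   # shallow:  L1-emb = BERT-L4(SENTENCE)
l2_emb = out.hidden_states[8][:, 0]   # middle:   L2-emb = BERT-L8(SENTENCE)
l3_emb = out.hidden_states[12][:, 0]  # deep:     L3-emb = BERT-L12(SENTENCE)
```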
The three sample feature vectors are then fed into different classifiers for classification fitting, namely Classifier1, Classifier2 and Classifier3 in fig. 13. The specific classification fitting process is as follows:
Logits1 = Classifier1(L1-emb); Classifier1 consists of a fully-connected layer and a softmax layer, and the output dimension of the fully-connected layer is the number of categories.
Logits2 = Classifier2(L2-emb); Classifier2 consists of a fully-connected layer and a softmax layer, and the output dimension of the fully-connected layer is the number of categories.
Logits3 = Classifier3(L3-emb); Classifier3 consists of a fully-connected layer and a softmax layer, and the output dimension of the fully-connected layer is the number of categories.
The Logits are then passed through softmax to obtain the final multi-category probability distribution: A = softmax(Logits), where A = [a_1, a_2, …, a_n] and a_i represents the predicted matching probability of the i-th preset category.
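A minimal PyTorch sketch of one such classifier head; the 768-dimensional hidden size and the category count are assumptions of this illustration, not values fixed by the application:

```python
import torch.nn as nn

class Classifier(nn.Module):
    """Fully-connected layer plus softmax; output dimension equals the number of categories."""
    def __init__(self, hidden_size=768, num_categories=10):
        super().__init__()
        self.fc = nn.Linear(hidden_size, num_categories)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, emb):
        logits = self.fc(emb)        # Logits = Classifier(L-emb)
        return self.softmax(logits)  # A = [a_1, ..., a_n]
```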
Further, the loss function value of each sample feature vector corresponding to all preset categories is calculated according to the following cross-entropy formula:
loss_cls(n) = −Σ_{i=1}^{N} y_i · log(a_i)
Where loss_cls(n) represents the loss function value of the nth sample feature vector corresponding to all preset categories; cls(n) denotes the nth sample feature vector; N represents the number of preset categories; i is the preset category index; y_i is 0 or 1 (1 when the i-th preset category is the true label); and a_i represents the matching probability of the sample feature vector for the i-th preset category.
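As a concrete made-up example: with N = 3 preset categories, true label y = [0, 1, 0] and predicted distribution a = [0.2, 0.7, 0.1], only the true category's term is non-zero, so loss_cls(n) = −log(0.7) ≈ 0.357:

```python
import math

y = [0, 1, 0]        # one-hot true label (illustrative)
a = [0.2, 0.7, 0.1]  # predicted matching probabilities (illustrative)

loss = -sum(yi * math.log(ai) for yi, ai in zip(y, a))
print(round(loss, 4))  # 0.3567
```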
Secondly, the loss function value of the first sample feature vector corresponding to all preset categories is loss_cls(1), and the preset weight value corresponding to loss_cls(1) is λ1; the loss function value of the second sample feature vector corresponding to all preset categories is loss_cls(2), and the preset weight value corresponding to loss_cls(2) is λ2; the loss function value of the third sample feature vector corresponding to all preset categories is loss_cls(3), and the preset weight value corresponding to loss_cls(3) is λ3. The target loss function is calculated according to the following formula:
loss_all = λ1·loss_cls(1) + λ2·loss_cls(2) + λ3·loss_cls(3)
Where loss_all represents the target loss function. The ratio λ1:λ2:λ3 can be adjusted according to the actual situation, provided that the weight values sum to 1.
In this embodiment, a multi-layer loss structure is introduced to fit the labels from different angles, which enhances the training effect.
Finally, the BERT model is adjusted according to the target loss function, so that the sentence category output by the BERT model is more accurate.
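Tying the pieces together, a single training step might look like the following sketch, reusing the layer-extraction and Classifier illustrations above; the optimizer and the manual negative-log-probability loss are assumptions of this illustration, not requirements of the application:

```python
# Sketch of one training step: three per-layer losses combined with weights 5:3:2.
# Assumes `model` was loaded with output_hidden_states=True and `classifiers`
# is a list of three Classifier instances (see the sketches above).
import torch

WEIGHTS = (0.5, 0.3, 0.2)
TARGET_LAYERS = (4, 8, 12)

def train_step(model, classifiers, optimizer, inputs, label):
    hidden_states = model(**inputs).hidden_states
    per_layer_losses = []
    for clf, layer in zip(classifiers, TARGET_LAYERS):
        probs = clf(hidden_states[layer][:, 0, :])            # A = softmax(Logits)
        per_layer_losses.append(-torch.log(probs[0, label]))  # cross-entropy, one-hot label
    loss_all = sum(w * l for w, l in zip(WEIGHTS, per_layer_losses))
    optimizer.zero_grad()
    loss_all.backward()
    optimizer.step()
    return loss_all.item()
```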
The BERT model of this embodiment can capture both the semantic vector of the sentence to be processed and the grammatical information of the words within it; in experiments, a model that integrates grammatical and semantic information represents the core information of a sentence better, thereby improving classification accuracy. The shallow network layers of the model focus more on the structural expression of the sentence, while the deep network layers focus more on its abstract semantics; combining these two kinds of information yields a more complete semantic representation and effectively improves the expression of complex sentence structures. In addition, the multi-layer loss structure of this embodiment fits the true label against the feature vectors extracted by different network layers, thereby enhancing the training effect.
In another aspect, the present application provides a sentence category determining apparatus, as shown in fig. 14, which is a schematic structural diagram of a sentence category determining apparatus according to an exemplary embodiment of the present application. The sentence category determining apparatus comprises:
The extraction module 1410 is configured to input a sentence to be processed into a plurality of network layers and select a plurality of target network layers from the plurality of network layers.
And the obtaining module 1430 is configured to obtain the feature vector of the sentence to be processed output by each target network layer, so as to obtain a plurality of feature vectors.
A calculating module 1450 configured to calculate a matching probability of the sentence to be processed and each preset category based on the plurality of feature vectors, resulting in a plurality of probability parameters.
A determining module 1470 configured to determine a target preset category to which the sentence to be processed belongs based on the plurality of probability parameters.
In another exemplary embodiment, the determination module 1470 includes:
the probability distribution matrix construction unit is configured to construct a probability distribution matrix corresponding to all preset categories of the sentence to be processed according to the probability parameters.
The target preset category determining unit is configured to select a probability parameter with the largest value from the probability distribution matrix as a target probability parameter, and take a preset category corresponding to the target probability parameter as a target preset category to which the statement to be processed belongs.
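An illustrative sketch of this max-selection step (the category names and probability values below are made up for the example):

```python
import numpy as np

prob = np.array([0.05, 0.10, 0.60, 0.25])          # probability distribution matrix (illustrative)
categories = ["sports", "finance", "film", "tech"]  # assumed preset categories

target_idx = int(np.argmax(prob))         # probability parameter with the largest value
target_category = categories[target_idx]  # "film", with probability 0.60
```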
In another exemplary embodiment, the computing module 1450 includes:
and a feature vector acquisition unit configured to acquire a feature vector of each preset category.
The first computing unit is configured to perform quotient operation on the plurality of feature vectors and the feature vectors of the preset categories respectively for each preset category to obtain a plurality of operation results, and the plurality of operation results are used as matching probabilities of the plurality of feature vectors and the preset categories.
The second calculating unit is configured to calculate the matching probability of the sentence to be processed and each preset category based on the matching probability of the feature vectors and each preset category, so as to obtain a plurality of probability parameters.
In another exemplary embodiment, the extraction module 1410 includes:
The selection rule acquisition unit is configured to acquire a preset network layer interval selection rule, where the network layer interval selection rule indicates how many network layers are spaced between successive target network layers selected from the plurality of network layers.
And a selection unit configured to select a plurality of target network layers from the plurality of network layers based on a preset network layer interval selection rule.
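As a small sketch of such an interval selection rule (the function name and the interval value are assumptions of this illustration):

```python
# Sketch: select one target layer every `interval` layers.
def select_target_layers(num_layers, interval):
    """E.g. num_layers=12, interval=4 -> [4, 8, 12], matching the BERT embodiment above."""
    return list(range(interval, num_layers + 1, interval))

print(select_target_layers(12, 4))  # [4, 8, 12]
```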
In another exemplary embodiment, the plurality of network layers are arranged sequentially in turn; the extraction module 1410 includes:
The feature vector output unit is configured to input the sentence to be processed into the first of the sequentially arranged network layers and to perform feature extraction through the plurality of network layers in turn, obtaining the feature vector output by each network layer.
In another exemplary embodiment, the sentence category determining apparatus further includes:
The input module is configured to input sample sentences into a model to be trained containing a plurality of network layers and acquire sample feature vectors output by a plurality of target network layers in the plurality of network layers.
And the loss function value calculation module is configured to calculate loss function values corresponding to all preset categories of each sample feature vector according to the plurality of sample feature vectors.
And the target loss function value calculation module is configured to calculate a target loss function value according to the loss function values of all preset categories corresponding to each sample feature vector.
And the training module is configured to train the model to be trained based on the objective loss function value to obtain the category classification model.
In another exemplary embodiment, the loss function value calculation module includes:
The computing unit is configured to, for each preset category, perform an operation on the plurality of sample feature vectors and the preset category to obtain the matching probability of each sample feature vector corresponding to the preset category.
The loss function value calculation unit is configured to calculate and obtain loss function values of all preset categories corresponding to each sample feature vector based on the matching probability of each preset category corresponding to each sample feature vector.
In another exemplary embodiment, the loss function value calculation unit includes:
The first calculating block is configured to calculate the loss function value of each sample feature vector corresponding to each preset category according to the matching probability of each sample feature vector corresponding to each preset category.
The second calculating block is configured to calculate the loss function value of each sample feature vector corresponding to all preset categories based on the loss function value of each sample feature vector corresponding to each preset category.
In another exemplary embodiment, the target loss function value calculation module includes:
The preset weight value acquisition unit is configured to acquire a preset weight value corresponding to each loss function value.
The target loss function value calculation unit is configured to calculate the target loss function value according to the loss function values of all preset categories corresponding to each sample feature vector and the preset weight value corresponding to each loss function value.
In another exemplary embodiment, the target loss function value calculation unit includes:
The product operation block is configured to perform a product operation on the loss function value of each sample feature vector corresponding to all preset categories and the weight value corresponding to each loss function value, obtaining a product result corresponding to each sample feature vector.
The target loss function value block is configured to perform a summation operation on the product results corresponding to each sample feature vector and to take the obtained sum as the target loss function value.
It should be noted that the sentence category determining apparatus provided in the foregoing embodiments and the sentence category determination method provided in the foregoing embodiments belong to the same concept; the specific manner in which each module and unit performs its operations has been described in detail in the method embodiments and is not repeated here.
Another aspect of the present application also provides an electronic device, including: a controller; and a memory for storing one or more programs which, when executed by the controller, implement the sentence category determination method in the respective embodiments described above.
Another aspect of the application also provides a computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor of a computer, cause the computer to perform the above-described method. Also, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method for determining the category of sentences described above.
Referring to fig. 15, fig. 15 is a schematic diagram of a computer system of an electronic device according to an exemplary embodiment of the present application, which is suitable for implementing the electronic device according to the embodiment of the present application.
It should be noted that, the computer system 1500 of the electronic device shown in fig. 15 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 15, the computer system 1500 includes a Central Processing Unit (CPU) 1501, which can perform various appropriate actions and processes, such as the methods in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) 1502 or a program loaded from a storage portion 1508 into a Random Access Memory (RAM) 1503. In the RAM 1503, various programs and data required for the operation of the system are also stored. The CPU 1501, the ROM 1502, and the RAM 1503 are connected to each other through a bus 1504. An Input/Output (I/O) interface 1505 is also connected to the bus 1504.
The following components are connected to the I/O interface 1505: an input section 1506 including a keyboard, a mouse, and the like; an output section 1507 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1508 including a hard disk and the like; and a communication section 1509 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication section 1509 performs communication processing via a network such as the Internet. A drive 1510 is also connected to the I/O interface 1505 as needed. A removable medium 1511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1510 as needed so that a computer program read therefrom is installed into the storage section 1508 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising a computer program for performing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1509, and/or installed from the removable medium 1511. When executed by a Central Processing Unit (CPU) 1501, performs the various functions defined in the system of the present application.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with a computer readable program embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Where each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
Another aspect of the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of determining a category of statements as before. The computer-readable storage medium may be included in the electronic device described in the above embodiment or may exist alone without being incorporated in the electronic device.
Another aspect of the application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the determination method of the sentence categories provided in the above-described respective embodiments.
According to an aspect of the embodiments of the present application, there is also provided a computer system including a Central Processing Unit (CPU) that can perform various appropriate actions and processes, such as the methods in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) or a program loaded from a storage section into a Random Access Memory (RAM). In the RAM, various programs and data required for system operation are also stored. The CPU, the ROM, and the RAM are connected to each other by a bus. An Input/Output (I/O) interface is also connected to the bus.
The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN (Local Area Network) card or a modem. The communication section performs communication processing via a network such as the Internet. Drives are also connected to the I/O interface as needed. Removable media such as magnetic disks, optical disks, magneto-optical disks, and semiconductor memories are mounted on the drive as needed so that a computer program read therefrom is installed into the storage section as needed.
The foregoing is merely illustrative of the preferred embodiments of the present application and is not intended to limit the embodiments of the present application, and those skilled in the art can easily make corresponding variations or modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be defined by the claims.

Claims (14)

1. A method for determining a category of sentences, comprising:
Inputting a sentence to be processed into a plurality of network layers, and selecting a plurality of target network layers from the plurality of network layers;
Obtaining feature vectors of the sentence to be processed output by each target network layer to obtain a plurality of feature vectors;
calculating the matching probability of the sentence to be processed and each preset category based on the plurality of feature vectors to obtain a plurality of probability parameters;
and determining the target preset category to which the sentence to be processed belongs based on the plurality of probability parameters.
2. The method of claim 1, wherein determining, based on the plurality of probability parameters, the target preset category to which the sentence to be processed belongs comprises:
constructing a probability distribution matrix of the sentence to be processed corresponding to all preset categories according to the plurality of probability parameters;
and selecting the probability parameter with the largest value from the probability distribution matrix as a target probability parameter, and taking the preset category corresponding to the target probability parameter as the target preset category to which the sentence to be processed belongs.
3. The method according to claim 1, wherein calculating the matching probability of the sentence to be processed and each preset category based on the feature vectors to obtain a plurality of probability parameters includes:
acquiring the feature vector of each preset category;
for each preset category, respectively carrying out quotient calculation on the plurality of feature vectors and the feature vectors of the preset category to obtain a plurality of calculation results, and taking the plurality of calculation results as the matching probability of the plurality of feature vectors and the preset category;
and calculating the matching probability of the sentence to be processed and each preset class according to the matching probability of the feature vectors and each preset class, so as to obtain a plurality of probability parameters.
4. The method of claim 1, wherein said selecting a plurality of target network layers from said plurality of network layers comprises:
Acquiring a preset network layer interval selection rule; the network layer interval selection rule is used for representing the number of network layers spaced by selecting one target network layer from the plurality of network layers;
And selecting the target network layers from the network layers based on the preset network layer interval selection rule.
5. The method of claim 1, wherein the plurality of network layers are sequentially arranged in turn;
The inputting the sentence to be processed into a plurality of network layers comprises:
Inputting the sentence to be processed into the first network layer of the plurality of network layers which are sequentially arranged, and sequentially performing feature extraction through the plurality of network layers to obtain the feature vector output by each network layer.
6. The method according to any one of claims 1 to 5, wherein prior to said entering the statement to be processed into the plurality of network layers, the method further comprises:
Inputting sample sentences into a model to be trained containing a plurality of network layers, and acquiring sample feature vectors output by a plurality of target network layers in the plurality of network layers;
calculating according to the sample feature vectors to obtain loss function values of all preset categories corresponding to each sample feature vector;
calculating a target loss function value according to the loss function values of all the preset categories corresponding to each sample feature vector;
and training the model to be trained based on the objective loss function value to obtain a category classification model.
7. The method according to claim 6, wherein the calculating the loss function value of each sample feature vector corresponding to all preset categories according to the plurality of sample feature vectors includes:
For each preset category, performing an operation on the plurality of sample feature vectors and the preset category to obtain the matching probability of each sample feature vector corresponding to the preset category;
And calculating the loss function value of each sample feature vector corresponding to all preset categories based on the matching probability of each sample feature vector corresponding to each preset category.
8. The method of claim 7, wherein calculating the loss function value of each sample feature vector corresponding to all preset categories based on the matching probability of each sample feature vector corresponding to each preset category comprises:
based on the matching probability of each sample feature vector corresponding to each preset category, calculating the loss function value of each sample feature vector corresponding to each preset category;
and calculating the loss function value of each sample feature vector corresponding to all preset categories based on the loss function value of each sample feature vector corresponding to each preset category.
9. The method according to claim 6, wherein calculating the target loss function value according to the loss function values of all the preset categories corresponding to each sample feature vector includes:
acquiring a preset weight value corresponding to each loss function value;
And calculating the target loss function value according to the loss function value of each sample feature vector corresponding to all preset categories and the preset weight value corresponding to each loss function value.
10. The method according to claim 9, wherein calculating the target loss function value according to the loss function values of all preset categories corresponding to each sample feature vector and the preset weight value corresponding to each loss function value comprises:
performing a product operation on the loss function values of all the preset categories corresponding to each sample feature vector and the weight values corresponding to each loss function value to obtain a product result corresponding to each sample feature vector;
and performing a summation operation on the product results corresponding to each sample feature vector, and taking the obtained sum as the target loss function value.
11. A sentence category determining apparatus, comprising:
An extraction module configured to input a sentence to be processed into a plurality of network layers, and select a plurality of target network layers from the plurality of network layers;
the acquisition module is configured to acquire the feature vector of the sentence to be processed output by each target network layer to obtain a plurality of feature vectors;
the calculation module is configured to calculate the matching probability of the sentence to be processed and each preset category based on the plurality of feature vectors to obtain a plurality of probability parameters;
and the determination module is configured to determine the target preset category to which the sentence to be processed belongs based on the plurality of probability parameters.
12. An electronic device, comprising:
A controller;
a memory for storing one or more programs that, when executed by the controller, cause the controller to implement the method of determining the category of sentences of any one of claims 1 to 10.
13. A computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of determining the category of sentences of any one of claims 1 to 10.
14. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of determining the category of sentences in any one of claims 1 to 10.