CN116151272A - Terminal equipment and semantic intention recognition method - Google Patents
- Publication number
- CN116151272A (application CN202310096697.6A)
- Authority
- CN
- China
- Prior art keywords
- query text
- tag
- vector
- slot
- negative
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F40/35 — Handling natural language data; Semantic analysis; Discourse or dialogue representation
- G06F16/3344 — Information retrieval; Query execution using natural language analysis
- G06F40/205 — Handling natural language data; Natural language analysis; Parsing
- G06N3/04 — Neural networks; Architecture, e.g. interconnection topology
- G06N3/08 — Neural networks; Learning methods
Abstract
Some embodiments of the present application provide a terminal device and a method for identifying semantic intent. The method parses a query text input by a user and generates initial tags from the query text, where each initial tag is a forward tag of the query text. The query text is input into a semantic model, and the semantics of each initial tag are detected by a binary-classification neural network linear layer in the semantic model. The initial tags are then marked according to their semantics to obtain tags to be identified, which consist of positive tags with positive semantics and negative tags with negative semantics. The method generates the recognition result of the query text based on the tags to be identified, improving the recognition rate of semantic intent and the user experience.
Description
Technical Field
The application relates to the technical field of semantic understanding, and in particular to a terminal device and a semantic intent recognition method.
Background
A terminal device is an electronic device with a built-in dialogue system, such as a smart television, a mobile phone, a smart speaker, a computer, or a robot. Taking the smart television as an example: built on Internet application technology, it has an open operating system and chip as well as a voice recognition module, realizing a television product with bidirectional human-machine interaction that meets the diversified and personalized needs of users.
The terminal device can execute corresponding operations according to text content input by the user: the semantic understanding module in the dialogue system parses the semantics of the user's query, and the device then performs the corresponding query operation according to the parsed semantic result.
However, the semantic understanding module of the terminal device focuses on recognizing affirmative queries, so the user must express the query intent positively. During interaction, the user's query text may also include a negative intent, that is, a description of something the terminal device should not execute. In this case, the dialogue system of the terminal device cannot accurately identify the intent of the query text, which lowers the recognition rate of semantic intent and degrades the user experience.
Disclosure of Invention
The application provides a terminal device and a semantic intent recognition method to address the low recognition rate of semantic intent in terminal devices.
In a first aspect, some embodiments of the present application provide a terminal device, including: a detector and a controller. The detector is used for acquiring query text input by a user; the controller is configured to perform the following program steps:
Analyzing the query text, and generating an initial tag according to the query text, wherein the initial tag is a forward tag of the query text;
inputting the query text into a semantic model to detect the semantics of the initial tag through a binary neural network linear layer in the semantic model;
marking the initial tag according to the semantic meaning to obtain a tag to be identified, wherein the tag to be identified comprises a positive tag and a negative tag, the positive tag is an initial tag with positive semantic meaning, and the negative tag is an initial tag with negative semantic meaning;
and generating a recognition result of the query text based on the tag to be identified.
In a second aspect, some embodiments of the present application provide a method for identifying semantic intent, including:
analyzing a query text, and generating an initial tag according to the query text, wherein the initial tag is a forward tag of the query text;
inputting the query text into a semantic model to detect the semantics of the initial tag through a binary neural network linear layer in the semantic model;
marking the initial tag according to the semantic meaning to obtain a tag to be identified, wherein the tag to be identified comprises a positive tag and a negative tag, the positive tag is an initial tag with positive semantic meaning, and the negative tag is an initial tag with negative semantic meaning;
and generating a recognition result of the query text based on the tag to be identified.
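The four claimed steps can be pictured as a minimal sketch in which a simple keyword rule stands in for the binary-classification neural network linear layer; the function names, the clause-splitting parse, and the negation-marker list are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch of the claimed flow. A keyword rule stands in for the
# binary-classification neural network linear layer; all names are
# illustrative, not from the patent.
NEGATION_MARKERS = ("not ", "don't ", "no ", "without ", "except ")

def generate_initial_tags(query_text):
    """Parse the query text into initial (forward) tags; here, one tag
    per comma-separated clause."""
    return [clause.strip() for clause in query_text.split(",") if clause.strip()]

def detect_semantics(tag):
    """Stand-in for the two-class linear layer: 'negative' if the clause
    carries a negation marker, otherwise 'positive'."""
    padded = tag.lower() + " "
    return "negative" if any(m in padded for m in NEGATION_MARKERS) else "positive"

def identify_intent(query_text):
    """Mark each initial tag by its detected semantics and build the result."""
    tags = [(t, detect_semantics(t)) for t in generate_initial_tags(query_text)]
    return {
        "positive": [t for t, s in tags if s == "positive"],
        "negative": [t for t, s in tags if s == "negative"],
    }

print(identify_intent("play a movie of actor A, not a war movie"))
# -> {'positive': ['play a movie of actor A'], 'negative': ['not a war movie']}
```

A real implementation would replace `detect_semantics` with the trained two-class linear layer applied to the semantic model's tag representations.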
In the technical solutions above, the terminal device and the semantic intent recognition method provided by some embodiments of the application parse the query text input by the user and generate initial tags from it, where each initial tag is a forward tag of the query text. The query text is input into a semantic model, and the semantics of each initial tag are detected by the binary-classification neural network linear layer in the model. The initial tags are marked according to their semantics to obtain tags to be identified, consisting of positive tags with positive semantics and negative tags with negative semantics. The recognition result of the query text is then generated based on the tags to be identified, improving the recognition rate of semantic intent and the user experience.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an operation scenario between a terminal device and a control device provided in some embodiments of the present application;
fig. 2 is a schematic hardware configuration diagram of a terminal device according to some embodiments of the present application;
fig. 3 is a schematic hardware configuration diagram of a control device according to some embodiments of the present application;
fig. 4 is a schematic software configuration diagram of a terminal device according to some embodiments of the present application;
fig. 5 is an interaction schematic diagram of a control device according to some embodiments of the present application inputting query text to a terminal device;
FIG. 6 is a flowchart of a recognition method fusing positive and negative intents provided in some embodiments of the present application;
FIG. 7 is a flow chart of domain and intent classification provided in some embodiments of the present application;
FIG. 8 is a flowchart of slot classification provided in some embodiments of the present application;
FIG. 9 is a schematic diagram of the negative-classification flow in the negative prediction process provided in some embodiments of the present application;
FIG. 10 is a diagram of an example framework for semantic understanding provided by some embodiments of the present application;
fig. 11 is a flow chart of a method for identifying semantic intent provided in some embodiments of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the exemplary embodiments of the present application clearer, the technical solutions in the exemplary embodiments are described below clearly and completely with reference to the drawings. Obviously, the described exemplary embodiments are only some embodiments of the present application, not all of them.
All other embodiments obtained by a person of ordinary skill in the art from the exemplary embodiments shown herein without inventive effort fall within the scope of the present application. Furthermore, while the disclosure is presented in terms of one or more exemplary embodiments, it should be understood that individual aspects of the disclosure may also be practiced separately as complete technical solutions.
It should be understood that the terms "first," "second," "third," and the like in the description, the claims, and the figures above are used to distinguish between similar objects, not to describe a particular sequential or chronological order. The data so used may be interchanged where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described here.
Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The terminal device provided in the embodiments of the present application may take various forms: a display device, including a television, a smart television, a laser projection device, a monitor, an electronic whiteboard, or an electronic table; an electronic device such as a mobile device or a smart speaker; or software with a built-in dialogue system. Fig. 1 and fig. 2 show specific embodiments of the terminal device of the present application.
Fig. 1 is a schematic diagram of an operation scenario between a terminal device and a control apparatus according to an embodiment. As shown in fig. 1, a user may operate the terminal device 200 through the smart device 300 or the control apparatus 100.
In some embodiments, the control device 100 may be a remote controller that communicates with the terminal device through infrared protocol communication, Bluetooth protocol communication, or other short-range communication modes, and controls the terminal device 200 wirelessly or by wire. The user may control the terminal device 200 by inputting user instructions through keys on the remote controller, voice input, control panel input, and the like.
In some embodiments, the smart device 300 (e.g., mobile terminal, tablet, computer, notebook, etc.) may also be used to control the terminal device 200. For example, the terminal device 200 is controlled using an application running on the smart device.
In some embodiments, the terminal device may not receive the instruction using the above-described smart device or control device, but receive the control of the user through touch or gesture, or the like.
In some embodiments, the terminal device 200 may also perform control in a manner other than the control apparatus 100 and the smart device 300, for example, the voice command control of the user may be directly received through a module configured inside the terminal device 200 for acquiring a voice command, or the voice command control of the user may be received through a voice control apparatus configured outside the terminal device 200.
In some embodiments, the terminal device 200 is also in data communication with the server 400. The terminal device 200 may establish a communication connection through a local area network (LAN), a wireless local area network (WLAN), or other networks. The server 400 may provide various contents and interactions to the terminal device 200. The server 400 may be one cluster or multiple clusters, and may include one or more types of servers.
Fig. 2 exemplarily shows a block diagram of a configuration of the control apparatus 100 in accordance with an exemplary embodiment. As shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an input operation instruction of a user and convert the operation instruction into an instruction recognizable and responsive to the terminal device 200, and may function as an interaction between the user and the terminal device 200.
As shown in fig. 3, the terminal apparatus 200 includes at least one of a modem 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface.
In some embodiments the controller includes a processor, a video processor, an audio processor, a graphics processor, RAM, ROM, a first interface for input/output to an nth interface.
The display 260 includes a display screen component for presenting pictures and a driving component for driving image display; it receives image signals output by the controller and displays video content, image content, menu manipulation interfaces, and user manipulation UI interfaces.
The display 260 may be a liquid crystal display, an OLED display, a projection device, or a projection screen.
The communicator 220 is a component for communicating with external devices or servers according to various communication protocol types. For example: the communicator may include at least one of a Wifi module, a bluetooth module, a wired ethernet module, or other network communication protocol chip or a near field communication protocol chip, and an infrared receiver. The terminal device 200 may establish transmission and reception of control signals and data signals with the control apparatus 100 or the server 400 through the communicator 220.
A user interface, which may be used to receive control signals from the control device 100 (e.g., an infrared remote control, etc.).
The detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for capturing the intensity of ambient light; alternatively, the detector 230 includes an image collector such as a camera, which may be used to collect external environmental scenes, user attributes, or user interaction gestures, or alternatively, the detector 230 includes a sound collector such as a microphone, or the like, which is used to receive external sounds.
The external device interface 240 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, etc. The input/output interface may be a composite input/output interface formed by a plurality of interfaces.
The modem 210 receives broadcast television signals in a wired or wireless manner and demodulates audio/video signals and EPG data signals from the multiple wireless or wired broadcast television signals.
In some embodiments, the controller 250 and the modem 210 may be located in separate devices, i.e., the modem 210 may also be located in an external device to the main device in which the controller 250 is located, such as an external set-top box or the like.
The controller 250 controls the operation of the terminal device 200 and responds to the user's operation by various software control programs stored in the memory. The controller 250 controls the overall operation of the terminal device 200. For example: in response to receiving a user command to select a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the controller includes at least one of a central processing unit (CPU), a video processor, an audio processor, a graphics processing unit (GPU), random access memory (RAM), read-only memory (ROM), first to nth input/output interfaces, a communication bus, and the like.
The user may input a user command through a Graphical User Interface (GUI) displayed on the display 260, and the user input interface receives the user input command through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through the sensor to receive the user input command.
A "user interface" is a media interface for interaction and exchange of information between an application or operating system and a user, which enables conversion between an internal form of information and a user-acceptable form. A commonly used presentation form of the user interface is a graphical user interface (Graphic User Interface, GUI), which refers to a user interface related to computer operations that is displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in a display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
In some embodiments, as shown in fig. 4, the system is divided into four layers, from top to bottom: an application layer, an application framework layer, an Android runtime and system library layer (system runtime layer), and a kernel layer.
In some embodiments, at least one application program is running in the application program layer, and these application programs may be a Window (Window) program of an operating system, a system setting program, a clock program, or the like; or may be an application developed by a third party developer. In particular implementations, the application packages in the application layer are not limited to the above examples.
The framework layer provides an application programming interface (API) and a programming framework for applications. The application framework layer includes a number of predefined functions and acts as a processing center that decides how the applications in the application layer act. Through the API, an application can access system resources and obtain system services during execution.
As shown in fig. 4, the application framework layer in the embodiment of the present application includes a Manager and a Content Provider, among others, where the Manager includes at least one of the following modules: an Activity Manager, used to interact with all activities running in the system; a Location Manager, used to provide system services or applications with access to the system location service; a Package Manager, used to retrieve various information about the application packages currently installed on the device; a Notification Manager, used to control the display and clearing of notification messages; and a Window Manager, used to manage icons, windows, toolbars, wallpapers, and desktop widgets on the user interface.
In some embodiments, the activity manager is used to manage the lifecycle of the individual applications as well as the usual navigation rollback functions, such as controlling the exit, opening, fallback, etc. of the applications. The window manager is used for managing all window programs, such as obtaining the size of the display screen, judging whether a status bar exists or not, locking the screen, intercepting the screen, controlling the change of the display window (for example, reducing the display window to display, dithering display, distorting display, etc.), etc.
In some embodiments, the system runtime layer provides support for the upper layer, the framework layer, and when the framework layer is in use, the android operating system runs the C/C++ libraries contained in the system runtime layer to implement the functions to be implemented by the framework layer.
In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 4, the kernel layer contains at least one of the following drivers: audio drive, display drive, bluetooth drive, camera drive, WIFI drive, USB drive, HDMI drive, sensor drive (e.g., fingerprint sensor, temperature sensor, pressure sensor, etc.), and power supply drive, etc.
In some embodiments, the terminal device 200 has a dialogue system built in. The dialogue system comprises a semantic understanding module, and the semantic understanding module is configured to analyze query text input by a user to acquire information to be queried by the user, so that the terminal equipment 200 responds to corresponding operations.
It should be noted that, for ease of understanding by those skilled in the art, some of the terms and related techniques referred to in the present application are explained below.
Query text: the request sentence input by the user, containing the user's query intent. For example: "find the XXX movie", "check today's weather", "play an audiobook", etc.
In some embodiments, the input manner of the query text may be a typing manner, that is, the user may input corresponding text through an input device such as a keyboard of the terminal device 200 as the query text.
For example: taking the terminal device 200 as a display device as an example, as shown in fig. 5, a user may set up a remote control device through the display device. In the search page of the display device, the text of "I want to watch the movie of A actor" is typed, and the display device can analyze and process the "I want to watch the movie of A actor" through the built-in dialogue system.
Alternatively, in some embodiments, the query text may be input by voice: the terminal device 200 collects voice data input by the user and converts it into text through automatic speech recognition (ASR) to generate the corresponding query text. For example, after the user speaks "I want to watch a movie of actor A" near the terminal device 200, the device converts that voice data into the text "I want to watch a movie of actor A" as the query text.
After receiving the query text, the dialogue system performs natural language processing on it. Some concepts used in this process are explained below in combination with the working principle of the dialogue system:
intent (intent): the abstract description corresponding to the operation procedure to be performed by the terminal device 200 may represent the user's needs. That is, the intention refers to the actual or potential needs of the user identified by the terminal device 200. For example, the text data input by the user is "how today is weather", and the corresponding intention is "query weather".
Slot (slot): in dialog systems, slots under specific intent are used to express important information in a user's query text request. For example, when a user creates a musical skill intent to identify a text query request for "i want to listen to the X song of the a singer," we design a slot for "singer=a singer, song=x song. That is, the slot corresponds to the parameter that is intended to be carried. An intention may correspond to a number of slots, for example, the user may give the necessary parameters of location, time, etc. in the input query text when asking for weather. The parameters are slots corresponding to the intention of inquiring weather conditions.
For example, the main goal of the semantic slot filling task is to extract the values of predefined semantic slots from an input sentence, given the semantic frame of a particular domain or intent. The task can be converted into a sequence labeling task that marks each word as the beginning (begin), continuation (inside), or non-semantic-slot part (outside) of a semantic slot. Clearly, for a dialogue system to work properly, the intents and slots must be designed first: they enable the dialogue system of the terminal device 200 to identify which functional task should be performed and to output the types of parameters needed to perform it.
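The begin/inside/outside scheme described above is commonly written as BIO labels. The following hand-labelled example and `decode_slots` helper are an illustrative sketch; the tokenization and label names are assumptions, not taken from the patent.

```python
# BIO labelling of "listen to the X song of the A singer": each token is
# marked Beginning of a slot, Inside one, or Outside any slot.
tokens = ["listen", "to", "the", "X", "song", "of", "the", "A", "singer"]
labels = ["O", "O", "O", "B-song", "I-song", "O", "O", "B-singer", "I-singer"]

def decode_slots(tokens, labels):
    """Collect slot values from a BIO-labelled token sequence."""
    slots, name, toks = {}, None, []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if name:                       # close the previous slot span
                slots[name] = " ".join(toks)
            name, toks = lab[2:], [tok]
        elif lab.startswith("I-") and name == lab[2:]:
            toks.append(tok)               # continue the current slot span
        else:
            if name:
                slots[name] = " ".join(toks)
            name, toks = None, []
    if name:                               # flush a slot ending the sentence
        slots[name] = " ".join(toks)
    return slots

print(decode_slots(tokens, labels))
# -> {'song': 'X song', 'singer': 'A singer'}
```

In the patent's setting, a neural sequence-labeling model would predict the `labels` row; decoding the spans back into slot values proceeds as above.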
Accordingly, in some embodiments, the terminal device 200 may, in response to a control instruction input by the user, perform the operation corresponding to that instruction. The control instruction may correspond to program steps preset in the terminal device 200. After the dialogue system recognizes the intent of the query text, it generates the corresponding control instruction according to the recognition result, so that the terminal device 200 performs the corresponding operation.
Intent recognition and slot filling: after the intents and slots are defined, the user's intent and the slot values of the corresponding slots can be identified from the query text.
The goal of intent recognition is to identify the user's intent from the input. A single task can simply be modeled as a classification problem; for example, the "listen to songs" intent can be modeled as a binary classification of "listen to songs" versus "not listen to songs". When the terminal device 200 must handle multiple tasks, it needs to discriminate among multiple intents, and the binary classification problem becomes a multi-class classification problem. The slot filling task is to extract information from the data and fill the predefined slots.
For example: the user inputs a query sentence "find movie of actor a" to the terminal device 200. The terminal device 200 outputs the domain, intention and slot information of the query sentence. The semantic understanding data are shown in the following table, wherein Query is user Query, and the field, the intention and the slot are results to be identified.
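The table itself is not reproduced here; as a rough sketch, the parsed result might be packaged as the following structure. The dictionary keys and label values are illustrative assumptions, not taken verbatim from the patent's table:

```python
# Hypothetical sketch of the semantic-understanding output for the query
# "find movie of actor A". Keys and label strings are assumptions.
def understand(query):
    # Stub standing in for the dialogue system's semantic-understanding
    # module: it maps a known query to its domain, intent, and slots.
    known = {
        "find movie of actor A": {
            "query": "find movie of actor A",
            "domain": "video",
            "intent": "search_Movie",
            "slots": {"actor": "A actor"},
        }
    }
    return known.get(query)

result = understand("find movie of actor A")
```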
Based on the terminal device 200 in the above embodiment, after the terminal device 200 obtains the query text input by the user, the semantic understanding module of the dialogue system analyzes the domain, the intention and the slot in the query text, so as to identify the semantic intention of the query text. That is, the terminal device 200 may output the domain, intention, and slot information corresponding to the query text through the dialogue system.
In some embodiments, after obtaining the query text, the terminal device 200 adds a special symbol before the query text to form the input text, and obtains the sentence vector and word vectors (the sequential output) of the input text from a pre-trained model. The sentence vector is the vector corresponding to the special symbol, and the word vectors are all the vectors other than that of the special symbol. The sentence vector and all the word vectors are then added to obtain a first text sequence vector. Similarity calculation is performed between the first text sequence vector and all preset intention labels to obtain an intention regression vector. The intention regression vector is normalized to obtain an intention probability vector, and the intention labels whose corresponding probability is greater than a threshold are output.
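A minimal numpy sketch of this intent-recognition flow, with toy random vectors standing in for the pre-trained model's outputs. The dimensions, the pooling over words before the similarity step, and the 1/n threshold are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_words, n_intents = 8, 5, 3

# Stand-ins for the pre-trained model's outputs: the special-symbol
# (sentence) vector plus one vector per word of the query text.
sentence_vec = rng.normal(size=d)
word_vecs = rng.normal(size=(n_words, d))

# First text sequence vector: sentence vector added to the word vectors.
first_seq = word_vecs + sentence_vec

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Similarity of the sequence (pooled over words here) against each preset
# intent-label embedding yields the intent regression vector.
intent_label_embs = rng.normal(size=(n_intents, d))
intent_regression = first_seq.mean(axis=0) @ intent_label_embs.T

# Normalize to the intent probability vector; output labels over threshold.
intent_probs = softmax(intent_regression)
predicted_intents = [i for i, p in enumerate(intent_probs) if p > 1.0 / n_intents]
```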
Similarly, for slot recognition, in some embodiments, the terminal device 200 performs a multi-head attention calculation between the obtained word vectors and all intention vectors to obtain the similarity of each word vector to each intention tag as a second text sequence vector. Vector addition is performed on the second text sequence vector, the word vectors, and the intention probability vector to obtain a third text sequence vector. Similarity calculation is performed between the third text sequence vector and all preset slot labels to obtain a slot regression vector. The slot regression vector is normalized to obtain a slot probability vector, and the slot labels whose corresponding probability is greater than a threshold are output.
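As a rough sketch of the attention step (single-head here, whereas the text uses multi-head attention; dimensions are assumptions, and the intent-probability term of the addition is omitted):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_words, n_intents = 8, 5, 3
word_vecs = rng.normal(size=(n_words, d))      # from the pre-trained model
intent_vecs = rng.normal(size=(n_intents, d))  # preset intent embeddings

def attention(q, k, v):
    # Scaled dot-product attention; multi-head attention runs several of
    # these in parallel on projected inputs and concatenates the results.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return weights @ v

# Second text sequence vector: each word attends over the intent vectors,
# i.e. its rows encode word-to-intent-label similarity.
second_seq = attention(word_vecs, intent_vecs, intent_vecs)

# Third text sequence vector: vector addition with the word vectors (the
# intent probability vector from the text would also be added here).
third_seq = second_seq + word_vecs
```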
However, the semantic understanding module of the dialog system described above is primarily directed at recognizing affirmative queries, such as "find a movie of actor A" in the above embodiment, where the description in the query text is a forward description. If the user enters a query text of a negative type, such as "find a movie without actor A", the semantic understanding module very easily fails to accurately recognize the user's intention when performing query recognition, so that the terminal device 200 still recommends movies including actor A to the user. That is, when the terminal device 200 recognizes negative query text, the semantic intention recognition of the query text is inaccurate, which degrades the user experience.
Based on the above application scenario, in order to improve the experience of the user and improve the problem of low semantic intention recognition rate in the terminal device 200, some embodiments of the present application provide a semantic intention recognition method. As shown in fig. 6, the method specifically includes the following:
S100: and analyzing the query text, and generating an initial label according to the query text.
Wherein the initial tag is a forward tag of the query text; that is, the content included in the initial tag is forward semantics. The terminal device 200 analyzes the corresponding domain, intention, and slot information in the query text, and generates the corresponding forward labels according to the analyzed domain, intention, and slot information.
To facilitate parsing the query text, in some embodiments, the terminal device 200 generates a query text sequence from the query text and extracts word vectors of the query text sequence. An average pooling is then performed on the word vectors to generate sentence vectors for the query text. The sentence vector obtained after the averaging pooling operation is a new vector, and the sentence vector comprises all word information in the query text.
Illustratively, based on the BERT model, as shown in fig. 7, in some embodiments the query text is input into the BERT model, which then outputs the word vectors corresponding to the query text sequence. Formally, denote the word vectors as H = {h_1, h_2, …, h_n}. An average pooling operation is performed over all word vectors to obtain a new vector. This new vector integrates the word information of the sentence and is then taken as the sentence vector representing the sentence. The specific formula is as follows:

C = (1/n) · (h_1 + h_2 + … + h_n)

wherein C is the sentence vector, h_i is the i-th word vector, and n is the number of words contained in the sentence.
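The average-pooling step can be written directly; a one-line numpy sketch with toy values:

```python
import numpy as np

def sentence_vector(word_vectors):
    """Average pooling: C = (h_1 + h_2 + ... + h_n) / n."""
    return np.asarray(word_vectors).mean(axis=0)

# Three word vectors of dimension 2 (toy values, not model outputs).
h = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = sentence_vector(h)  # → array([3., 4.])
```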
In the process of analyzing the query text, domain recognition and intention recognition corresponding to the query text are also needed. Thus, in some embodiments, the first distribution vector is parsed from the sentence vector. The first distribution vector comprises a domain distribution vector and an intention distribution vector. The first distribution vector is then normalized by an activation function to obtain a domain prediction vector and an intent prediction vector. The domain prediction vector is used to predict the domain in the query text and the intent prediction vector is used to predict the intent in the query text.
Illustratively, based on the BERT model, in some embodiments, the resulting sentence vectors are passed through a neural network linear layer of domain identification and a neural network linear layer of intent identification, respectively, to obtain a domain distribution vector and an intent distribution vector. The neural network linear layer is a learnable network parameter. The formula is as follows:
D = W_D · C + b_D

I = W_I · C + b_I

wherein D and I are the domain distribution vector and the intention distribution vector, respectively; W_D and b_D are the learnable parameters of the neural network linear layer for domain identification; W_I and b_I are the learnable parameters of the neural network linear layer for intent identification.
Then, for the domain distribution vector and the intention distribution vector, a domain prediction vector and an intention prediction vector are obtained through a softmax activation function respectively. The formula is as follows:
D'=softmax(D)
I'=softmax(I)
wherein D' and I' are the domain prediction vector and the intent prediction vector, respectively.
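A numpy sketch of the two linear layers followed by softmax; the dimensions and random weights are stand-ins for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_domains, n_intents = 8, 4, 6
C = rng.normal(size=d)  # sentence vector from the pooling step

# Learnable parameters of the domain- and intent-identification layers
# (randomly initialized here; in practice they are trained).
W_D, b_D = rng.normal(size=(n_domains, d)), np.zeros(n_domains)
W_I, b_I = rng.normal(size=(n_intents, d)), np.zeros(n_intents)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

D = W_D @ C + b_D     # domain distribution vector
I = W_I @ C + b_I     # intent distribution vector
D_prime = softmax(D)  # domain prediction vector D'
I_prime = softmax(I)  # intent prediction vector I'

# The final classification picks the entry with the highest prediction value.
best_domain = int(D_prime.argmax())
```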
In the above embodiment, the BERT (Bidirectional Encoder Representations from Transformers) model is a bidirectional Transformer encoder. The Transformer is an architecture that relies entirely on self-attention to compute representations of its input and output. BERT uses a masked language model to achieve bidirectionality and demonstrates the importance of bidirectionality to language-representation pre-training. The BERT model is a truly bidirectional language model in which each word can simultaneously use its left and right context, so BERT aims to pre-train deep bidirectional representations by jointly conditioning on the context in all layers. The pre-trained BERT representation can thus be fine-tuned with an additional output layer, making it suitable for building models for a broad range of tasks.
The usage is as follows: a fully connected layer is added to the BERT model for training, and after training the fully connected layer is removed. Various natural language processing tasks, including sequence labeling tasks, classification tasks, sentence-relation judgment, generation tasks, and the like, can then be performed through the BERT model.
After obtaining the domain prediction vector and the intention prediction vector, the terminal device 200 can predict the final domain and intention classification result according to the domain prediction vector and the intention prediction vector.
To facilitate predicting the final result, in some embodiments, the terminal device 200 also generates a prediction result from the domain prediction vector and the intent prediction vector. The prediction result comprises domain information, intent classification information, a prediction value corresponding to the domain information, and a prediction value corresponding to the intent classification information. Obviously, the higher the prediction value, the closer the candidate is to the final result. The terminal device 200 therefore screens out the domain information with the highest prediction value and the intent classification information with the highest prediction value, and then packages the screened domain information and intent classification information into the initial tag as the prediction result.
Illustratively, in some embodiments, the data is trained with a cross-entropy loss function when generating the prediction result. That is, during training, a cross-entropy loss function may be used on the domain prediction vector, the intent prediction vector, and the classification result; during prediction, the result with the highest prediction value is selected as the final domain and intent classification result.
In addition, in the process of analyzing the query text, slot recognition corresponding to the query text is also needed. Thus, in some embodiments, the terminal device 200 also inputs the query text into a slot identification model to generate a slot distribution vector through the neural network linear layer of the slot identification model, and annotates the slot distribution vector based on a sequence labeling method. The slot distribution vector is then normalized by an activation function to obtain a slot prediction vector, which is used to predict the slot classification result in the query text.
In some embodiments, the slot identification model may be, but is not limited to, an "Embedding + bidirectional LSTM + CRF" neural network model. Each domain class corresponds to one slot identification model. The input of the slot recognition model is the domain and intent of each word in the query text, and its output is the slot label of the query text, i.e., one of the initial labels in the embodiment of the present application.
Illustratively, based on the BERT model, in some embodiments the slot distribution vector is labeled by way of BIO labeling. BIO labeling is a form of joint labeling, wherein B, I, and O represent Begin, Inside, and Other (outside), respectively. Further, B-X indicates that the element is of type X and is located at the start of the segment, I-X indicates that the element is of type X and is located in the middle of the segment, and O indicates that the element is not of type X. For example, for the word "play" (two characters in the original Chinese query), the model built into the terminal device 200 outputs "B-actionPlay" for the first character and "I-actionPlay" for the second character. For other positions that carry no slot meaning, "O" is output.
For example, the query text is "find a movie of actor A", and the terminal device 200 may recognize two kinds of intentions, namely "find actor (search_actor)" and "find movie (search_Movie)". When the terminal device 200 recognizes that the intention of the text sequence is "search_actor", the terminal device 200 takes "actor A" as an actor-character slot: labeling per character of the original Chinese query, the characters of "actor A" are marked "B-actor, I-actor, I-actor" and all remaining characters (including those of "movie") are marked "O". When the terminal device 200 recognizes that the intention of the text sequence is "search_Movie", the terminal device 200 takes "actor A" as an actor-character slot and "movie" as a movie slot: the characters of "actor A" are again marked "B-actor, I-actor, I-actor", and the characters of "movie" are marked "B-Movie, I-Movie".
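A small sketch of decoding such BIO tags back into slot spans; the tokens and tag names mirror the example above (in the original query the labeling is per Chinese character, so the word-level tokens here are a simplification):

```python
def bio_spans(tokens, tags):
    """Collect (slot_type, text) spans from a BIO-tagged token sequence."""
    spans, cur_type, cur_toks = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if cur_type is not None:
                spans.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = tag[2:], [tok]
        elif tag.startswith("I-") and cur_type == tag[2:]:
            cur_toks.append(tok)
        else:  # "O", or an I- tag that does not continue the current span
            if cur_type is not None:
                spans.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = None, []
    if cur_type is not None:
        spans.append((cur_type, " ".join(cur_toks)))
    return spans

tokens = ["find", "A", "actor", "movie"]
tags_actor = ["O", "B-actor", "I-actor", "O"]        # search_actor intent
tags_movie = ["O", "B-actor", "I-actor", "B-Movie"]  # search_Movie intent
spans_actor = bio_spans(tokens, tags_actor)  # → [("actor", "A actor")]
spans_movie = bio_spans(tokens, tags_movie)
```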
Based on this labeling manner, in some embodiments, as shown in fig. 8, the terminal device 200 may obtain the slot distribution vector of each word by passing the parsed word vectors through a neural network linear layer. The slot distribution vector of each word is then normalized using the softmax function to obtain the slot prediction vector of each word.
Obviously, when the slot classification result is predicted from the slot prediction vector, the slot classification information with the highest prediction value is the final slot classification result. Thus, after obtaining the slot prediction vector, in some embodiments, the terminal device 200 also trains a model corresponding to the slot prediction vector to generate a prediction result. The prediction result comprises slot classification information and the prediction value corresponding to the slot classification information. The slot classification information with the highest prediction value is screened out and packaged into the initial label.
Illustratively, in some embodiments, the terminal device 200 trains the model corresponding to the slot prediction vector and the slot classification information using a loss function of cross entropy. In the prediction process, directly selecting the slot classification information with the highest predicted value as a slot classification result corresponding to the current word.
S200: the query text is entered into a semantic model.
After generating the initial tag, the terminal device 200 inputs the query text into the semantic model to detect the semantics of the initial tag through the bi-classified neural network linear layer in the semantic model.
It should be noted that, the semantics detected in step S200 in the embodiment of the present application refers to determining whether the initial label corresponds to positive semantics or negative semantics. For example, the query text is "find movie without a actor", and the parsed label "a actor" corresponds to negative semantics.
S300: and marking the initial label according to the semantics to obtain the label to be identified.
After detecting the semantics of the initial tag, the terminal device 200 marks the initial tag according to the semantics corresponding to the initial tag to generate the tag to be identified. The label to be identified comprises a positive label and a negative label, wherein the positive label is an initial label with positive semantics, and the negative label is an initial label with negative semantics.
For example: the query text is "find a movie without actor A", and after analyzing the slots and the intention, the terminal device 200 may obtain the initial tags: "find", "without", "actor A", "movie". Combining the overall semantics of the query text, the tag "actor A" corresponds to negative semantics and may be marked as a negative tag.
For example, in some embodiments, the terminal device 200 may mark an initial tag representing negative semantics with a logical NOT symbol to form a negative tag, and directly output an initial tag representing positive semantics as a positive tag.
For example: the user inputs the query sentence "find a movie without actor A" to the terminal device 200, and the terminal device 200 outputs the domain, negative intention, and negative slot of the query sentence. The semantic understanding data are shown in the following table, wherein query is the user query, and the domain, intention, and slot are the results to be identified. Negation is represented by a logical NOT symbol ("!") marked on the negative intention or slot to identify the negated object. For negation words such as "don't want", "not", "none", and the like, the terminal device 200 denotes them with a specific label "negative".
To facilitate recognition of negative intents, in some embodiments, the terminal device 200 also trains a binary neural network linear layer for negation discrimination. The sentence vector is passed through this negation-discrimination neural network linear layer to obtain an intent negation judgment vector. The terminal device 200 may then normalize the intent negation judgment vector to generate an intent negation prediction vector, which predicts the semantics corresponding to the initial label of the intent. A model corresponding to the intent negation prediction vector can be trained with a cross-entropy loss function so as to judge the initial label as a positive label or a negative label.
Similarly, to facilitate identifying negative slots, as shown in fig. 9, in some embodiments the terminal device 200 also detects continuous segments of the target text, the target text being the query text corresponding to the screened slot classification information. Average pooling is then performed on the continuous segments to generate segment representation vectors for the continuous segments, and a slot negation prediction vector is generated from the segment representation vectors. The slot negation prediction vector is used to predict the semantics of the initial label, and a model corresponding to it can be trained with a cross-entropy loss function so as to judge whether the initial label is a positive label or a negative label.
Further, the terminal apparatus 200 may generate a slot negative prediction vector from the slot negative discrimination vector. That is, in some embodiments, the terminal apparatus 200 inputs the segment representative vector into the negative discrimination model to generate the slot negative discrimination vector by the negative discrimination model. And normalizing the negative judgment vector of the slot position by an activation function to generate a negative prediction vector of the slot position.
Illustratively, in some embodiments, the terminal device 200 trains the models corresponding to the intent negation prediction vector and the slot negation prediction vector with a cross-entropy loss function during training. During prediction, whether the current initial label is negative is determined from the prediction values. If the negation discriminator judges that the intention or slot at that position is negative, the negation symbol "!" is added to the initial tag to form a negative tag; if the negation discriminator considers that the intention or slot at that position is affirmative, the initial tag is directly output as a positive tag.
That is, in some embodiments, the terminal device 200 may derive the intended negative discrimination vector from the sentence vector by a negative discrimination neural network linear layer. And, for each continuous segment of the word vector of the predicted slot, the terminal device 200 may average and pool the word vectors of the continuous segment to obtain the segment representation vector corresponding to the segment. And then the segment expression vector passes through a negative judgment neural network linear layer to obtain a slot negative judgment vector of the slot. After obtaining the intended negative determination vector and the slot negative determination vector, the terminal device 200 obtains the intended negative prediction vector and the negative prediction vector of each slot through the softmax activation function. And predicting whether the initial label is a negative label according to the intended negative prediction vector and the negative prediction vector of each slot position, and marking the initial label predicted to be the negative label.
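A minimal sketch of the slot negation discriminator described above: average-pool the word vectors of a slot fragment, apply a binary linear layer plus softmax, and prefix "!" when the negative class wins. The weights here are random stand-ins for trained parameters, and the class ordering is an assumption:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8

# Binary negation-discrimination linear layer (trained in practice).
W_neg, b_neg = rng.normal(size=(2, d)), np.zeros(2)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mark_slot_tag(tag, segment_word_vecs):
    """Prefix the slot tag with '!' if the fragment is judged negative."""
    seg_repr = np.asarray(segment_word_vecs).mean(axis=0)  # segment vector
    probs = softmax(W_neg @ seg_repr + b_neg)  # [P(positive), P(negative)]
    return "!" + tag if probs[1] > probs[0] else tag

segment = rng.normal(size=(3, d))  # word vectors of an "A actor" fragment
marked = mark_slot_tag("actor: A actor", segment)
```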
S400: and generating a recognition result of the query text based on the label to be recognized.
After the initial tag is marked to generate the positive tag and the negative tag, the terminal device 200 may generate a corresponding recognition result according to whether the initial tag is marked or not.
For example: the query text input by the user is "find songs that are not by singer A", and the terminal device 200 parses this text to generate the semantic understanding data shown in the following table.
The terminal device 200 may generate a song control instruction according to the semantic understanding data to query songs and filter out singer A, and the terminal device 200 may then directionally recommend songs that do not involve singer A to the user.
That is, according to the negative tags among the tags to be identified, the terminal device 200 can more accurately recognize the negative intentions contained in the query text input by the user. In addition, the identification method provided by the embodiments of the present application also has high accuracy when identifying query text that contains no negative label. Therefore, the identification method of the embodiments of the present application can effectively improve the semantic intention recognition rate of the terminal device 200, better match the user's requirements, and improve the user experience.
Illustratively, based on the BERT model, as shown in fig. 10, after acquiring the query text the terminal device 200 first identifies the domain, intention, and slots of the query text in the forward direction; at this point the identification results generated by the terminal device 200 are all forward tag results. Then, the terminal device 200 performs a binary classification judgment on the label of each slot and the intention through the negation discriminator to judge whether it is a negative label. After the judgment is finished, the semantic recognition result of the query text is generated according to the negative labels, and the corresponding operation is executed according to the semantic recognition result.
In addition, in order to improve the recognition rate of the semantic intent, in some embodiments, when the terminal device 200 predicts the initial label of the negative slot, multiple negative word prediction models may be fused, so as to enhance the accuracy and recall rate of understanding the negative slot by the models.
Because the negative expression data and the positive expression data are imbalanced, in some embodiments the terminal device 200 further adopts optimized data sampling and adjusts the model loss to Focal Loss, so as to further improve the effect of semantic-understanding data processing.
It can be appreciated that the semantic intent recognition method of the present application can also model positives and negatives jointly, learning both positive and negative expressions of intents and slots, thereby reducing the dependence of negative-intent recognition model training on negative data and on positive intent understanding. The present application is not limited in this regard.
Based on the above semantic intent recognition method, some embodiments of the present application further provide a terminal device 200, as shown in fig. 11, including: detector 230 and controller 250. The detector 230 is configured to obtain query text input by a user; as shown in fig. 6, the controller 250 is configured to perform the following program steps:
S100: analyzing the query text, and generating an initial label according to the query text, wherein the initial label is a forward label of the query text;
S200: inputting the query text into a semantic model to detect the semantics of the initial tag through a binary neural network linear layer in the semantic model;
S300: marking the initial tag according to the semantic meaning to obtain a tag to be identified, wherein the tag to be identified comprises a positive tag and a negative tag, the positive tag is an initial tag with positive semantic meaning, and the negative tag is an initial tag with negative semantic meaning;
S400: and generating a recognition result of the query text based on the label to be recognized.
According to the technical scheme, the terminal equipment and the semantic intention recognition method provided by some embodiments of the application can analyze the query text input by the user, and then generate the initial tag according to the query text. Wherein the initial tag is a forward tag of the query text. The query text is entered into a semantic model to detect the semantics of the initial tag through a bi-classified neural network linear layer in the semantic model. And marking the initial label according to the semantic meaning to obtain a label to be identified, wherein the label to be identified consists of a positive label with positive semantic meaning and a negative label with negative semantic meaning. According to the identification method, the negative limit is defined through the negative intention and the negative slot, so that the identification result of the query text can be generated based on the label to be identified, the identification rate of the semantic intention is improved, and the user experience is improved.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.
Claims (10)
1. A terminal device, comprising:
the detector is used for acquiring query text input by a user;
A controller configured to:
analyzing the query text, and generating an initial label according to the query text, wherein the initial label is a forward label of the query text;
inputting the query text into a semantic model to detect the semantics of the initial tag through a binary neural network linear layer in the semantic model;
marking the initial tag according to the semantic meaning to obtain a tag to be identified, wherein the tag to be identified comprises a positive tag and a negative tag, the positive tag is an initial tag with positive semantic meaning, and the negative tag is an initial tag with negative semantic meaning;
and generating a recognition result of the query text based on the label to be recognized.
2. The terminal device of claim 1, wherein the controller executing the parsing the query text is configured to:
generating a query text sequence according to the query text;
extracting word vectors of the query text sequence;
performing average pooling on the word vectors to generate sentence vectors of the query text, the sentence vectors including term information in the query text.
3. The terminal device of claim 2, wherein the controller is configured to:
Analyzing a first distribution vector from the sentence vector, wherein the first distribution vector comprises a domain distribution vector and an intention distribution vector;
normalizing the first distribution vector by an activation function to obtain a domain prediction vector for predicting a domain in the query text and an intent prediction vector for predicting an intent in the query text.
4. A terminal device according to claim 3, wherein the controller, executing the generation of an initial tag from the query text, is configured to:
generating a prediction result according to the domain prediction vector and the intention prediction vector, wherein the prediction result comprises domain information, intention classification information, a prediction value corresponding to the domain information and a prediction value corresponding to the intention classification information;
screening the domain information with the highest predicted value and the intention classification information with the highest predicted value;
and packaging the screened domain information and the intention classification information to the initial label.
5. The terminal device of claim 4, wherein the controller executing the generation of the prediction result from the domain prediction vector and the intent prediction vector is configured to:
And training data through a cross entropy loss function in the process of generating the prediction result.
6. The terminal device of claim 1, wherein the controller is configured to:
inputting the query text into a slot identification model to generate a slot distribution vector according to a neural network linear layer of the slot identification model;
labeling the slot position distribution vector based on a sequence labeling method;
normalizing the slot distribution vector by an activation function to obtain a slot prediction vector, wherein the slot prediction vector is used for predicting a slot classification result in the query text.
7. The terminal device of claim 6, wherein the controller is configured to:
generating a prediction result according to the slot prediction vector; the prediction result comprises the slot classification information and a prediction value corresponding to the slot classification information;
screening the slot classification information with the highest predicted value, and packaging the screened slot classification information to the initial tag.
8. The terminal device of claim 7, wherein the controller is configured to:
detecting continuous fragments of a target text, wherein the target text is the query text corresponding to the screened slot classification information;
Performing pooling on the continuous segments to generate segment representation vectors for the continuous segments;
generating a slot negative prediction vector according to the fragment representation vector, wherein the slot negative prediction vector is used for predicting the semantics of the initial tag.
9. The terminal device of claim 8, wherein the controller executing the generating of the slot negative prediction vector from the segment representation vector is configured to:
inputting the segment representative vector into a negative judgment model to generate a slot negative judgment vector through the negative judgment model;
normalizing the negative discrimination vector of the slot by an activation function.
10. A method for identifying semantic intent, comprising:
analyzing a query text, and generating an initial label according to the query text, wherein the initial label is a forward label of the query text;
inputting the query text into a semantic model to detect the semantics of the initial tag through a binary neural network linear layer in the semantic model;
labeling the initial tag according to the semantics to obtain a tag to be recognized, wherein the tag to be recognized comprises a positive tag and a negative tag, the positive tag being an initial tag with positive semantics and the negative tag being an initial tag with negative semantics;
and generating a recognition result of the query text based on the tag to be recognized.
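The method of claim 10 as a whole can be sketched as follows: each initial tag is labeled positive or negative according to a detected semantic score, and the recognition result is generated from the labeled tags. The 0.5 threshold, the score inputs, and the choice to keep only positive tags in the result are illustrative assumptions, not specified by the patent.

```python
def recognize(initial_tags: list[dict], neg_scores: list[float]) -> list[dict]:
    """Label each initial tag by its negative-semantics score, then generate
    the recognition result from the tags to be recognized."""
    labeled = []
    for tag, score in zip(initial_tags, neg_scores):
        tag = dict(tag)
        tag["polarity"] = "negative" if score >= 0.5 else "positive"
        labeled.append(tag)
    # Recognition result generated from the labeled tags (positive tags kept).
    return [t for t in labeled if t["polarity"] == "positive"]

result = recognize([{"slot": "city"}, {"slot": "date"}], [0.1, 0.9])
```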
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310096697.6A CN116151272A (en) | 2023-02-06 | 2023-02-06 | Terminal equipment and semantic intention recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116151272A true CN116151272A (en) | 2023-05-23 |
Family
ID=86359682
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310096697.6A Pending CN116151272A (en) | 2023-02-06 | 2023-02-06 | Terminal equipment and semantic intention recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116151272A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |