CN115392264A

CN115392264A - RASA-based task-type intelligent multi-turn dialogue method and related equipment

Info

Publication number: CN115392264A
Application number: CN202211342781.3A
Authority: CN
Inventors: 梁兴伟; 王冰冰; 严海强; 杨波
Original assignee: Konka Group Co Ltd
Current assignee: Konka Group Co Ltd
Priority date: 2022-10-31
Filing date: 2022-10-31
Publication date: 2022-11-25

Abstract

The invention discloses a RASA-based task-type intelligent multi-turn dialogue method and related equipment. The method comprises the following steps: constructing a natural language understanding module and a multi-turn dialogue management module based on RASA, and acquiring text information input by a user; controlling the natural language understanding module to perform intention detection and semantic slot filling on the text information to obtain a user intention and entity information, respectively; and controlling the multi-turn dialogue management module to match a response result for the text information based on the user intention and the entity information, feeding the response result back to the user, and displaying a visual interface of the user dialogue. The dialogue system is built on the open-source RASA framework using the pipeline method, so the task of each module is clear and independent, the complex and tedious configuration process is presented graphically, and construction efficiency is improved. By adopting the open-source Botfront framework and front-end/back-end interaction, a model can be trained with one click once the relevant parameters are configured, which enables personalized service for the user.

Description

RASA-based task-type intelligent multi-turn dialogue method and related equipment

Technical Field

The invention relates to the technical field of man-machine interaction, in particular to a RASA-based task-type intelligent multi-turn dialogue method, system and terminal.

Background

Dialogue systems are an important topic in the field of human-computer interaction: humans use natural language to exchange information with the system, and the machine can in turn provide personalized services to the user.

A multi-turn dialogue system aims to understand a user's complex intention in the fewest possible turns and to provide targeted, personalized services. Research on multi-turn dialogue has made some progress, but a gap remains between that research and practical application, and the following problems are faced. First, the traditional task-type multi-turn dialogue workflow is complicated and highly repetitive, its stages are scattered, and no engineering process has been formed. Second, most task-based multi-turn dialogue systems do not support visual analysis, so users and managers cannot evaluate the dialogue effect intuitively. Third, the dialogue management corpus is not the raw language input by the user; it is a structured dialogue story flow containing intentions, word slots, and historical dialogue content, annotated manually from the raw input data. Annotating the dialogue management corpus is therefore difficult, and personalized service for the user cannot be realized. In particular, when the dialogue scenario and flow are complex and there are many intentions and word slots, manual annotation is hard and the annotation quality is hard to guarantee. In summary, the prior art does not support visual analysis, so users and managers cannot evaluate the dialogue effect intuitively; the dialogue management corpus is difficult to annotate; and user personalization cannot be realized.

Accordingly, there is a need for improvements and developments in the art.

Disclosure of Invention

The invention mainly aims to provide a RASA-based task-based intelligent multi-turn dialogue method and related equipment, and aims to solve the problems that visual analysis is not supported and user personalization cannot be realized in the prior art.

In order to achieve the above object, the present invention provides a RASA-based task-type intelligent multi-turn dialog method, which includes the following steps:

constructing a natural language understanding module and a multi-turn dialogue management module based on RASA, and acquiring text information input by a user;

controlling the natural language understanding module to carry out intention detection and semantic slot filling on the text information to respectively obtain user intention and entity information;

and controlling the multi-turn dialog management module to match a response result of the text information based on the user intention and the entity information, feeding the response result back to the user, and displaying a visual interface of user dialog.

Optionally, in the RASA-based task-based intelligent multi-turn dialog method, the step of constructing a natural language understanding module and a multi-turn dialog management module based on RASA and acquiring text information input by a user further includes:

and constructing scene corpora based on semantic data and spoken habits, and classifying the scene corpora according to different intention categories to construct the data type of the text information.

Optionally, in the RASA-based task-based intelligent multi-turn dialog method, before the controlling of the natural language understanding module to perform intention detection and semantic slot filling on the text information to obtain user intention and entity information, respectively, the method further includes:

extracting features of the text information, based on a language model pre-trained on a large-scale Chinese corpus, to obtain text features, and segmenting the text information to obtain target text information;

embedding the target text information into a vector space based on the text features, so that the natural language understanding module can process the text information.

Optionally, in the RASA-based task-based intelligent multi-turn dialog method, the controlling of the natural language understanding module to perform intention detection and semantic slot filling on the text information to obtain a user intention and entity information, respectively, specifically includes:

using a DIETClassifier in advance as the classifier for intention recognition, and inputting the text information into the classifier for intention classification;

performing intention detection on the spoken texts in the text information after intention classification to obtain user intentions of the text information;

and labeling words in the text information based on the semantic information, and controlling an extractor to perform semantic slot filling based on the labels to obtain entity information of the text information.

Optionally, in the RASA-based task-based intelligent multi-turn dialog method, the extractor includes a DIET extractor, a regular expression extractor, and a conditional random field extractor.

Optionally, in the RASA-based task-based intelligent multi-turn dialog method, after the constructing of a natural language understanding module and a multi-turn dialog management module based on RASA and the acquiring of text information input by a user, the method further includes:

when the RASA provides insufficient components, acquiring a component interface of the RASA, and accessing different components based on the component interface.

Optionally, in the RASA-based task-based intelligent multi-turn dialog method, the controlling of the multi-turn dialog management module to match a response result of the text information based on the user intention and the entity information, feeding the response result back to the user, and displaying a visual interface of the user dialog specifically includes:

inputting the user intention and the entity information into a tracker by means of an interpreter to obtain a conversation state of the user, and sending the conversation state to a policy component;

controlling the policy component to respond with an action based on the conversation state, and outputting a text reply based on the chosen action;

and displaying a visual interface of the user conversation based on the Botfront framework, and constructing a multi-language conversation agent.

Optionally, in the RASA-based task-based intelligent multi-turn dialogue method, the data types of the text information include weather queries, schedule queries, and movie and television queries.

In addition, to achieve the above object, the present invention further provides a RASA-based task-based intelligent multi-turn dialog system, wherein the system includes:

the data acquisition module is used for constructing a natural language understanding module and a multi-turn dialogue management module based on RASA, and acquiring text information input by a user;

the natural language understanding module is used for understanding the user intention of the text information, classifying the user intention into the correct intention category, and extracting the semantic slot values of the text information;

the multi-round dialogue management module is used for training a dialogue management model and outputting an answer text of the text information;

the data analysis module is used for controlling the natural language understanding module to carry out intention detection and semantic slot filling on the text information so as to respectively obtain user intention and entity information;

and the result display module is used for controlling the multi-turn dialogue management module to match a response result of the text information based on the user intention and the entity information, feeding the response result back to the user and displaying a visual interface of the user dialogue.

In addition, to achieve the above object, the present invention further provides a terminal, wherein the terminal includes a processor configured to execute a RASA-based task-based intelligent multi-turn dialog program, and the program, when executed by the processor, implements the steps of the RASA-based task-based intelligent multi-turn dialog method described above.

The method of the invention constructs a natural language understanding module and a multi-turn dialogue management module based on RASA and acquires text information input by a user; controls the natural language understanding module to perform intention detection and semantic slot filling on the text information to obtain a user intention and entity information, respectively; and controls the multi-turn dialogue management module to match a response result for the text information based on the user intention and the entity information, feeds the response result back to the user, and displays a visual interface of the user dialogue. The dialogue system is built on the open-source RASA framework using the pipeline method, so the task of each module is clear and independent, the complex and tedious configuration process is presented graphically, and construction efficiency is improved. By adopting the open-source Botfront framework and front-end/back-end interaction, a model can be trained with one click once the relevant parameters are configured, which enables personalized service for the user.

Drawings

FIG. 1 is a flow chart of a preferred embodiment of the RASA-based task-based intelligent multi-turn dialog method of the present invention;

FIG. 2 is a schematic diagram of the overall framework of the RASA-based task-based intelligent multi-turn dialog system of the present invention;

FIG. 3 is a schematic diagram of Rasa natural language understanding and Rasa core of the RASA-based task-based intelligent multi-turn dialog method of the present invention;

FIG. 4 is a diagram of a pre-training language model based on large-scale Chinese corpus training according to the present invention;

FIG. 5 is a diagram of a multitasking architecture for intent classification and entity identification in accordance with the present invention;

FIG. 6 is a block diagram of a multi-turn dialog management module framework according to the present invention;

FIG. 7 is a schematic diagram of creating and training a story in an embodiment of the invention;

FIG. 8 is a schematic diagram of creating, training and evaluating NLU models in an embodiment of the present invention;

FIG. 9 is a diagram illustrating the creation and editing of corresponding responses in an embodiment of the present invention;

FIG. 10 is a schematic illustration of a monitoring session in an embodiment of the invention;

FIG. 11 is a schematic illustration of an NLU utterance for review and annotation input in an embodiment of the present invention;

FIG. 12 is a schematic diagram of a preferred embodiment of the RASA-based task-based intelligent multi-turn dialog system of the present invention;

FIG. 13 is a schematic diagram of the operating environment of a terminal according to a preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.

It should be noted that, if directional indications (such as up, down, left, right, front, back, etc.) are involved in the embodiment of the present invention, the directional indications are only used to explain the relative position relationship between the components, the motion situation, and the like in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indications are changed accordingly.

In addition, if there is a description of "first", "second", etc. in an embodiment of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.

A task-based multi-turn dialogue system is task-oriented: it gradually collects the information relevant to a target task through multiple turns of natural language dialogue with the user, so as to help the user obtain a service. Prior-art research on task-based multi-turn dialogue systems generally follows either a pipeline architecture or an end-to-end architecture. The most typical early task-based dialogue systems are DARPA's Air Travel Information System (ATIS) and the travel planning system Communicator, which provide dialogue services for airline reservation and travel planning, respectively. Both are mainly based on a pipeline structure divided into three modules connected in sequence: natural language understanding, dialogue management (dialogue state tracking and dialogue policy selection), and natural language generation. In recent years a number of strong task-based multi-turn dialogue systems have emerged in China and abroad. The AIUI system developed by iFLYTEK provides only a natural language understanding (NLU) service. Baidu developed the intelligent dialogue customization and service platform UNIT, which mainly uses an intention recognition and word slot filling model, assisted by dialogue templates, and implements dialogue management by triggering rule sets. UNIT defines multiple trigger rule groups for each intention, and once the conditions of a rule group are met, the platform triggers the action under that intention; however, the platform does not let a developer define multiple actions and does not provide a visualization module.

There are also many dialogue systems and platforms outside China. Microsoft developed the LUIS (Language Understanding Intelligent Service) platform based on an NLU service; the platform adopts a pipeline structure and uses machine learning to train separate models for the intentions and word slots defined by the developer, but it does not provide a dialogue management function. Api.ai was acquired by Google in 2016 and renamed Dialogflow; its natural language processing module works similarly to Microsoft's LUIS platform, and it performs dialogue management in the form of contexts, which reflect the current state of the user's request and allow the dialogue system to control the dialogue path. In addition, unlike Microsoft's LUIS and Google's Dialogflow, Facebook's Wit.ai platform uses an end-to-end structure to recognize intentions and word slots jointly, without requiring developers to configure them. However, task-based multi-turn dialogue is strongly domain-dependent, and explicit definitions of intentions and word slots also help natural language understanding. End-to-end architectures are still at an early research stage and do not yet perform well in task-based multi-turn dialogue scenarios.

As shown in fig. 1, the RASA-based task-based intelligent multi-turn dialog method according to a preferred embodiment of the present invention includes the following steps:

Step S10, constructing a natural language understanding module and a multi-turn dialogue management module based on RASA, and acquiring text information input by a user.

Specifically, the RASA-based task-based intelligent multi-turn dialog system implements its algorithms on the RASA framework and its visual interface on Botfront (a front end for chatbots). The RASA framework is an open-source chatbot framework that implements multi-turn dialogue based on machine learning. The pipeline method is used to split the key technical problems of multi-turn dialogue into modules, cascade the modules, and define an interface for each module so that the input and output of each module are fixed. In the general pipeline method, a multi-turn dialogue system mainly includes five parts, as shown in FIG. 2: ASR (automatic speech recognition), NLU (natural language understanding), DM (dialogue management), NLG (natural language generation), and TTS (text-to-speech synthesis). The user sends a speech signal to the ASR. The ASR recognizes the text in the speech signal (e.g., "look up the legal representative of Alibaba") and sends the text to the NLU. The NLU identifies the intention and the semantic slots in the text (e.g., intention: look up legal representative; semantic slots: company name: Alibaba, attribute: legal representative) and sends them to the DM. The DM performs DST (dialogue state tracking) and DPO (dialogue policy optimization) based on the intention and the semantic slots, retrieves the legal representative from a knowledge base and APIs, and sends the result to the NLG. The NLG sends a text reply (e.g., "The legal representative of Alibaba is xxx") to the TTS. The TTS synthesizes the text reply into speech and plays it to the user, completing the dialogue turn.
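The cascade of modules with fixed inputs and outputs can be illustrated with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the function names, the weather intention, the keyword rule in nlu, and the reply templates are assumptions, not components of RASA or of the invention.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NLUResult:
    """Interface between NLU and DM: one intention plus its slots."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def asr(audio_transcript: str) -> str:
    # Stand-in for speech recognition: the "audio" is already a transcript.
    return audio_transcript.strip()


def nlu(text: str) -> NLUResult:
    # Toy keyword rule; a real NLU module uses a trained classifier and extractor.
    if "weather" in text.lower():
        city = text.rstrip("?").split(" in ")[-1] if " in " in text else ""
        return NLUResult("query_weather", {"city": city} if city else {})
    return NLUResult("unknown")


def dm(result: NLUResult) -> str:
    # Dialogue management: pick the next action from intention and slots.
    if result.intent == "query_weather":
        return "action_report_weather" if "city" in result.slots else "action_ask_city"
    return "action_fallback"


def nlg(action: str, result: NLUResult) -> str:
    # Natural language generation from hypothetical reply templates.
    templates = {
        "action_report_weather": "Here is the weather for {city}.",
        "action_ask_city": "Which city do you mean?",
        "action_fallback": "Sorry, I did not understand that.",
    }
    return templates[action].format(**result.slots)


def tts(text: str) -> bytes:
    # Stand-in for speech synthesis: encode the reply instead of producing audio.
    return text.encode("utf-8")


def run_turn(audio_transcript: str) -> bytes:
    """Cascade ASR -> NLU -> DM -> NLG -> TTS for a single dialogue turn."""
    text = asr(audio_transcript)
    result = nlu(text)
    action = dm(result)
    return tts(nlg(action, result))
```

Because each stage only depends on the type it receives, any one stage can be swapped for a trained model without touching the others, which is the point of the pipeline method described above.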

Natural language understanding and dialogue management are the two most closely connected modules; they form the core of task-based dialogue and are problems that every dialogue system must address. As shown in FIG. 3, RASA provides Rasa NLU (Rasa natural language understanding) and Rasa Core (also called the multi-turn dialogue management module). Rasa NLU performs intention recognition and entity recognition and converts the user's input into structured data; its main files are nlu.md and nlu_config.yml. Rasa Core performs dialogue management and decides what content to return to the user next, mainly by analyzing stories and defining a domain. The stories, stored in stories.md, contain the scenario flows of the dialogues, including story creation, titles, intentions, and actions. The domain, stored in domain.yml, is the knowledge base of the bot and includes intentions, actions, answer templates, entities, and word slots. Rasa NLU and Rasa Core thus handle user message understanding and multi-turn dialogue management, respectively, solving the two core problems and providing the main functions of a dialogue system. The main files used and their functions are shown in the following table.

The primary purpose of the natural language understanding module is to understand the user's intention. It generally comprises two tasks, intention detection and semantic slot filling. By analyzing the semantics of the text input by the user and extracting the key information relevant to the task, it turns the user's intention into a formatted representation and supports the subsequent modules of the multi-turn dialogue. Intention detection is generally treated as a sentence classification problem: an algorithm predicts the category of the user's purpose from a predefined set of categories, each of which corresponds to an intention. Unlike other classification tasks, the data for intention detection is spoken text, so the sentence and its context must be combined to capture the real semantics. Semantic slot filling understands a piece of text by marking the meaningful words or symbols in a sentence: each word (character) in the text is labeled according to its semantic information. It is essentially a sequence labeling task, and the labels are used to extract clearly defined attributes (namely slots) from the text, so that the user's intention is converted into an explicit instruction. In contrast, intention detection focuses on the overall meaning of the user input, while semantic slot filling focuses on fine-grained understanding of the text, as shown in the following table.
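To make the sequence labeling view of slot filling concrete, the following minimal Python sketch turns per-token labels into slot values. It assumes the common BIO tagging scheme (B- begins a slot, I- continues it, O is outside any slot), which the text above does not name; the function name and the example sentence are likewise hypothetical.

```python
from typing import Dict, List


def slots_from_bio(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Collect slot values from tokens labeled with BIO tags (assumed scheme)."""
    slots: Dict[str, str] = {}
    name = None          # slot currently being built, if any
    parts: List[str] = []

    def flush() -> None:
        # Store the slot collected so far.
        if name is not None:
            slots[name] = " ".join(parts)

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            name, parts = tag[2:], [token]
        elif tag.startswith("I-") and name == tag[2:]:
            parts.append(token)
        else:
            # "O" or an inconsistent I- tag ends the current slot.
            flush()
            name, parts = None, []
    flush()
    return slots


tokens = ["weather", "in", "New", "York", "tomorrow"]
tags = ["O", "O", "B-city", "I-city", "B-date"]
print(slots_from_bio(tokens, tags))  # {'city': 'New York', 'date': 'tomorrow'}
```

The intention (for example, a weather query) would be predicted once for the whole sentence, while the tags above are predicted once per token; this is the coarse versus fine-grained distinction the paragraph draws.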

Furthermore, data is the foundation of any artificial intelligence system. Whether the approach is rule-based, a traditional machine learning method, or a currently common neural network method, tens of thousands of high-quality samples are often needed for training to obtain accurate parameters that fit the problem. Task-based dialogue applications require a large amount of training data that targets a specific domain and matches everyday dialogue logic and spoken-language habits; the quality of the data largely determines the performance of the dialogue system. Given the current state of research on task-based dialogue systems and the particular application scenarios the invention targets, no sufficient existing data set is available for the task, so the invention builds its own data set to support the model in realizing the functions of a task-based dialogue system. First, common-scenario corpora (such as greetings, billings, and chatting) are built with reference to other multi-turn dialogue tasks and everyday spoken habits; then data for the practical application scenarios (such as weather queries, schedule queries, and movie and television queries) is built according to the different intention categories, and semantic slots are labeled so that the key information in the user input can be extracted. Next, to improve the generalization ability of the model, the base data is augmented. On the one hand, slot values are replaced with synonyms, near-synonyms, and associated words so that the data set covers more scenarios; on the other hand, sentence patterns are transformed so that the data set adapts to more kinds of spoken expressions. The constructed and augmented data set proved effective in subsequent use and supports the system in completing multi-turn dialogues in most contexts, but some special cases cannot be completed smoothly. The data from those cases is recorded, fed back to the multi-turn dialogue system, and added to the data set for retraining, so that the accuracy and generalization ability of the system are continuously improved in a self-supervised manner.
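The two augmentation directions described above, slot value replacement and sentence pattern transformation, can be sketched in a few lines of Python. The sketch assumes that sentence patterns are written as templates with named placeholders and that replacement values are listed per slot; the function name, templates, and values are hypothetical.

```python
from itertools import product
from typing import Dict, List


def augment(templates: List[str], slot_values: Dict[str, List[str]]) -> List[dict]:
    """Fill every sentence template with every combination of slot values."""
    names = sorted(slot_values)
    samples = []
    for template in templates:
        for combo in product(*(slot_values[n] for n in names)):
            filling = dict(zip(names, combo))
            # Each sample keeps its slot annotation alongside the generated text.
            samples.append({"text": template.format(**filling), "slots": filling})
    return samples


templates = [
    "What is the weather in {city} {date}?",       # base sentence pattern
    "Tell me if it will rain in {city} {date}.",   # transformed sentence pattern
]
slot_values = {
    "city": ["Shenzhen", "Beijing"],
    "date": ["today", "tomorrow"],
}
data = augment(templates, slot_values)
print(len(data))        # 8
print(data[0]["text"])  # What is the weather in Shenzhen today?
```

Generated samples inherit their slot labels from the template, so augmentation adds no manual annotation cost; dialogues that later fail can be appended to the same list before retraining.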

Step S20, controlling the natural language understanding module to perform intention detection and semantic slot filling on the text information to respectively obtain user intention and entity information.

The step S20 includes:

Step S21, taking DIETClassifier as a classifier for intention identification in advance, and inputting the text information into the classifier for intention classification;

Step S22, carrying out intention detection on the spoken language texts in the text information after intention classification to obtain the user intention of the text information;

Step S23, labeling words in the text information based on the semantic information, and controlling an extractor to perform semantic slot filling based on the labels to obtain entity information of the text information.

Specifically, following the general processing flow of a natural language understanding module, Rasa NLU provides a classifier module for the intention detection task, and a tokenizer, a featurizer, extractors, and similar components for the semantic slot filling task, which together analyze the input text and extract key information. The RASA framework is rich in content, powerful, and integrates many components. The classifier module includes a keyword-based method, a method based on the MITIE language model, and others. For semantic slot analysis, efficient language models such as MITIE and spaCy are provided, and tools such as jieba word segmentation are provided for Chinese, so the whole open-source framework is compatible with Chinese. When the components provided by the RASA framework cannot meet the requirements of the dialogue system, developers can build custom components through the interfaces the framework provides, so that hand-written rules, traditional machine learning and statistical learning methods, and recent deep learning results can all be integrated into the multi-turn dialogue system as needed. The framework can therefore be applied to a variety of real scenarios and is highly flexible and extensible. The natural language understanding module is one of the core modules of a task-based dialogue system and the first module to receive user input; its main task is to understand the user intention in the text information, classify it into the correct intention category, and extract the appropriate semantic slot values. In this part, the invention is mainly based on the open-source Rasa framework and on rules set up for the task.

Further, before the model can process text information, the text features must first be extracted, and the text must be segmented and then embedded into a vector space. The Rasa framework provides the JiebaTokenizer word segmentation component, which supports Chinese, and the MitieNLP Chinese word vector tool to handle Chinese tasks, but both have limitations. First, with processing based on Chinese word segmentation, a segmentation error may cascade and degrade the subsequent intention classification and semantic slot value extraction. Second, the MITIE natural language processing toolkit is mainly based on machine learning algorithms such as SVM; it extracts keywords well but has gradually been surpassed by large-scale pre-trained language models, and it trains slowly. Therefore, the invention uses BERT-base-Chinese, a language model pre-trained on a large-scale Chinese corpus, to extract features from Chinese text. BERT-base-Chinese builds on the rich pre-trained knowledge of the BERT model (whose main input is the original vector of each character or word in the text), so it can be applied widely to various downstream tasks and different contexts, and its cloze-style pre-training task can also use context information to improve semantic slot extraction, as shown in FIG. 4.

The Rasa framework proposes the DIET (Dual Intent and Entity Transformer) architecture for intention classification and entity extraction. DIET is a multi-task architecture for intention classification and entity recognition, as shown in FIG. 5. It can incorporate pre-trained embeddings from language models in a plug-and-play manner and combine them with sparse word-level and character-level n-gram features. Experiments show that even without pre-trained embeddings, using only sparse word-level and character-level n-gram features, DIET achieves better results than other models on complex natural language understanding datasets and trains much faster than a fine-tuned BERT model. The DIET model inherits from the TransformerRasaModel class. Based on the Transformer architecture, it encodes the whole sentence with a 12-layer Transformer and a relative position attention mechanism. The CLS token output by the Transformer represents the user input for intention classification; it is compared for similarity with the intention labels, and a loss function is computed to measure the quality of the model's prediction. The invention uses DIETClassifier as the classifier for intention recognition, which classifies intentions more accurately.
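The similarity comparison between the CLS representation and the intention labels can be sketched as follows. This is a simplified illustration and not the DIET implementation: it assumes dot-product similarity against one embedding per intention label and uses softmax cross-entropy as a stand-in loss, since the text does not specify the exact loss; the vectors and label names are made up.

```python
import math
from typing import Dict, List


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def intent_scores(cls_vec: List[float], label_vecs: Dict[str, List[float]]) -> Dict[str, float]:
    """Similarity of the sentence (CLS) vector to each intention label vector."""
    return {name: dot(cls_vec, vec) for name, vec in label_vecs.items()}


def cross_entropy(scores: Dict[str, float], target: str) -> float:
    """Softmax cross-entropy over similarities (stand-in loss, assumed)."""
    m = max(scores.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return log_z - scores[target]


cls_vec = [1.0, 0.0]  # hypothetical CLS output for one user utterance
label_vecs = {"query_weather": [2.0, 0.0], "query_movie": [0.0, 2.0]}

scores = intent_scores(cls_vec, label_vecs)
predicted = max(scores, key=scores.get)
print(predicted)  # query_weather
print(cross_entropy(scores, "query_weather") < cross_entropy(scores, "query_movie"))  # True
```

A lower loss for the correct label than for the wrong one is what training pushes toward: the CLS vector is moved closer to the embedding of the true intention and away from the others.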

Another major task of the natural language understanding module is semantic slot filling, which is essentially an entity recognition and entity extraction problem. Because entities are diverse and linguistic expressions are complex, the problem is solved by combining several extractors, mainly a DIET extractor, a regular expression extractor, and a conditional random field extractor. For the named entity recognition task, the DIET extractor takes the Transformer output and uses a conditional random field layer to capture the relation between context labels and the labels of the input sequence, and thereby obtains the entity prediction. The regular expression extractor extracts entities using lookup tables and/or regular expressions defined in the training data: the component checks whether the user message contains an entry of one of the lookup tables or matches one of the regular expressions, and if a match is found, the value is extracted as an entity. The regular expression extractor can therefore set rules for special words or expressions and filter out errors caused by special cases. The conditional random field extractor implements a conditional random field for named entity recognition. A conditional random field can be viewed as an undirected Markov chain in which the time steps are words and the states are entity classes; the features of a word (e.g., capitalization, POS tags, etc.) give probabilities for certain entity classes, as do the transitions between adjacent entity labels, and the most likely label sequence is then computed and returned. The conditional random field extractor can better learn the relationships between contexts in the text.
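The lookup-table and regular-expression matching described for the regular expression extractor can be sketched with the Python standard library. This is an illustrative stand-in, not the Rasa component itself; the function name, the entity names, the lookup entries, and the date pattern are all assumptions.

```python
import re
from typing import Dict, List


def extract_entities(text: str,
                     lookup_tables: Dict[str, List[str]],
                     patterns: Dict[str, str]) -> List[dict]:
    """Return entities found via lookup-table entries or regular expressions."""
    found = []
    # Lookup tables: report the first occurrence of each listed entry.
    for entity, entries in lookup_tables.items():
        for entry in entries:
            start = text.find(entry)
            if start != -1:
                found.append({"entity": entity, "value": entry,
                              "start": start, "end": start + len(entry)})
    # Regular expressions: report every non-overlapping match.
    for entity, pattern in patterns.items():
        for match in re.finditer(pattern, text):
            found.append({"entity": entity, "value": match.group(0),
                          "start": match.start(), "end": match.end()})
    return sorted(found, key=lambda e: e["start"])


lookup_tables = {"city": ["Beijing", "Shenzhen"]}   # hypothetical lookup table
patterns = {"date": r"\d{4}-\d{2}-\d{2}"}           # hypothetical date pattern

for item in extract_entities("Book a movie in Shenzhen on 2022-10-31",
                             lookup_tables, patterns):
    print(item["entity"], item["value"])
# city Shenzhen
# date 2022-10-31
```

Rule-based matches like these are deterministic, which is why they are useful for special words and fixed formats where a statistical extractor might make errors.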

And S30, controlling the multi-turn dialogue management module to match a response result of the text information based on the user intention and the entity information, feeding the response result back to the user, and displaying a visual interface of user dialogue.

The step S30 includes:

S31, inputting the user intention and the entity information into a tracker based on an interpreter to obtain a conversation state of the user, and sending the conversation state to a policy module;

S32, controlling the policy module to perform an action response based on the conversation state, and outputting a text dialogue based on the responded action;

S33, displaying a visual interface of the user conversation based on the Botfront framework, and constructing a multi-language conversation agent.

Specifically, Rasa_core (the multi-turn dialogue management module) is responsible for managing multi-turn dialogues; because the Rasa natural language understanding module (Rasa_NLU) only supports English and German while the system works on Chinese, the jieba word segmenter is used as the Chinese tokenizer in the pipeline; when Rasa_NLU is used for the intent recognition task, a MITIE model trained by an unsupervised method is required, which is similar to the word embeddings of word2vec; the MITIE model is trained on a dataset created by the user, and the whole corpus is segmented with jieba; specifically, under the Ubuntu operating system, the jieba word segmentation library is first installed with the command sudo pip install jieba, and an executable wordrep tool is then built by compiling with cmake; finally, training generates the binary file total_word_feature_extractor_chi.dat, which holds the 300-dimensional word vectors used throughout the dialogue system; once the word vectors are available, training of the NLU (natural language understanding) model can begin, but the pipeline must be configured first; it is specifically ["nlp_mitie", "tokenizer_jieba", "ner_mitie", "ner_synonyms", "intent_featurizer_mitie", "intent_classifier_sklearn"], wherein "nlp_mitie" initializes MITIE, "tokenizer_jieba" performs word segmentation with jieba, "ner_mitie" and "ner_synonyms" perform entity recognition, "intent_featurizer_mitie" extracts the features for intent recognition, and "intent_classifier_sklearn" is the intent classifier; when sklearn is used for intent recognition, the core algorithm is a Support Vector Machine (SVM), whose input feature is obtained by adding the 300-dimensional word vectors of all the words in a sentence and then taking their average; to improve the performance of the NLU model, the invention improves Rasa NLU by using the pre-trained language model bert-base-chinese, which is trained on a large-scale Chinese corpus, to extract features from the Chinese text.
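
The pipeline configuration and the averaged sentence feature fed to the SVM can be sketched as follows. The component names are the legacy Rasa NLU names that the pipeline list is assumed to refer to; the tokens are assumed to be already segmented (for example by jieba), and the three-dimensional toy vectors stand in for the 300-dimensional MITIE vectors.

```python
# Assumed legacy Rasa NLU component names for the pipeline described above.
PIPELINE = [
    "nlp_mitie",                  # load the MITIE word-vector model
    "tokenizer_jieba",            # Chinese word segmentation
    "ner_mitie",                  # entity recognition
    "ner_synonyms",               # entity synonym mapping
    "intent_featurizer_mitie",    # sentence features for intent recognition
    "intent_classifier_sklearn",  # SVM-based intent classifier
]

def sentence_vector(tokens, word_vectors, dim):
    """Add the vectors of all tokens and divide by the token count.

    Tokens missing from word_vectors contribute a zero vector.
    """
    total = [0.0] * dim
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is not None:
            for i, x in enumerate(vec):
                total[i] += x
    n = len(tokens)
    return [x / n for x in total] if n else total

# Hypothetical 3-dimensional vectors for two segmented words.
TOY_VECTORS = {"今天": [1.0, 0.0, 2.0], "天气": [3.0, 2.0, 0.0]}
feature = sentence_vector(["今天", "天气"], TOY_VECTORS, dim=3)
print(feature)
```

The resulting fixed-length vector is what a classifier such as an SVM would receive as its input, regardless of how many words the sentence contains.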

Rasa_core does not implement complex dialogue logic through if/else conditional judgments; instead it trains a dialogue management model by a machine learning method, which offers good portability and maintainability; as shown in fig. 6, the system first receives a user message and sends it to the Interpreter module, which recognizes it and generates a dictionary containing the message text and the intent; the intent recognition of the Interpreter module is implemented by a PaddleNLP deep learning model; the conversation state is then tracked by a Tracker, which mainly receives and records the new messages recognized by the Interpreter module; the current dialog state is then sent to the Policy, which selects the Action with which to respond; the responding Action is recorded in the Tracker, and the result of the Action is returned to the user.
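
The message flow of fig. 6 (Interpreter, Tracker, Policy, Action) can be sketched as below. This is an illustrative sketch only: the keyword-based interpreter and the hand-written policy are stand-ins for the trained NLU and dialogue management models, and all class, function, intent, and action names are hypothetical.

```python
from dataclasses import dataclass, field

CITIES = ["Shenzhen", "Beijing"]  # hypothetical known slot values

@dataclass
class Tracker:
    """Records user intents, filled slots, and the actions taken."""
    events: list = field(default_factory=list)
    slots: dict = field(default_factory=dict)

    def update_user(self, parsed):
        self.events.append(("user", parsed["intent"]))
        self.slots.update(parsed["entities"])

    def update_action(self, action):
        self.events.append(("action", action))

def interpret(text):
    """Stand-in interpreter: keyword rules instead of a trained NLU model."""
    lowered = text.lower()
    entities = {"city": c for c in CITIES if c.lower() in lowered}
    if "weather" in lowered:
        intent = "query_weather"
    elif entities:
        intent = "inform"
    else:
        intent = "greet"
    return {"text": text, "intent": intent, "entities": entities}

def policy(tracker):
    """Stand-in policy: picks the next action from the tracked state."""
    asked = any(k == "user" and v == "query_weather" for k, v in tracker.events)
    last_intent = next(v for k, v in reversed(tracker.events) if k == "user")
    if asked and last_intent in ("query_weather", "inform"):
        return "action_report_weather" if "city" in tracker.slots else "utter_ask_city"
    return "utter_greet"

RESPONSES = {
    "utter_greet": "Hello, how can I help?",
    "utter_ask_city": "Which city do you mean?",
    "action_report_weather": "Looking up the weather for {city}.",
}

def handle_message(text, tracker):
    """One dialogue turn: interpret, track, choose an action, respond."""
    tracker.update_user(interpret(text))
    action = policy(tracker)
    tracker.update_action(action)
    return RESPONSES[action].format(**tracker.slots)

demo = Tracker()
print(handle_message("What's the weather like?", demo))
print(handle_message("Shenzhen", demo))
```

Because the tracker keeps the earlier intent, the second message can be answered even though it contains only the missing slot value, which is the essence of a multi-turn exchange.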

Further, interactive and visual interfaces are presented by combining RASA with the Botfront framework, which can be used to build advanced multilingual conversational agents, for example to create and train stories (the specific interface is shown in fig. 7), create, train, and evaluate NLU models (the specific interface is shown in fig. 8), create and edit the corresponding responses (the specific interface is shown in fig. 9), monitor dialogues (the specific interface is shown in fig. 10), and review and annotate incoming NLU utterances (the specific interface is shown in fig. 11); the invention quickly constructs a task-type multi-turn dialog system based on the RASA open-source framework and a pipeline structure; the system has a complete engineering process, clearly separated modules, and strong interpretability, supports multiple languages, and is easy to maintain and extend; graphical configuration is realized based on the Botfront open-source framework: the system presents the complicated and tedious configuration process graphically, so that a user or an administrator can configure a dialogue quickly through simple operations such as selecting, filling in, and dragging, which reduces the learning cost and improves construction efficiency; and by using the Botfront open-source framework and front-end/back-end interaction technology, the user can train the model with one click simply by configuring the relevant parameters; in addition, the user can update, add, or modify the dialogue corpus at any time and retrain the model on the new corpus to realize personalized service for the user.

Further, as shown in fig. 12, based on the above RASA-based task-based intelligent multi-turn dialog method, the present invention also provides a RASA-based task-based intelligent multi-turn dialog system, where the RASA-based task-based intelligent multi-turn dialog system includes:

a data acquisition module 51, configured to construct a natural language understanding module and a multi-turn dialogue management module based on the RASA, and acquire text information input by a user;

the natural language understanding module 52 is configured to understand the user intention of the text information, classify the user intention into the correct intention category, and extract the semantic slot values of the text information;

the multi-round dialogue management module 53 is configured to train a dialogue management model and output an answer text of the text information;

a data analysis module 54, configured to control the natural language understanding module to perform intent detection and semantic slot filling on the text information, so as to obtain a user intent and entity information respectively;

and the result display module 55 is configured to control the multi-turn dialog management module to match a response result of the text information based on the user intention and the entity information, feed the response result back to the user, and display a visual interface of a user dialog.

Further, as shown in fig. 13, based on the above RASA-based task-based intelligent multi-turn dialog method, the present invention also provides a terminal, where the terminal includes a processor 10, a memory 20, and a display 30; fig. 13 shows only some of the components of the terminal, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.

The memory 20 may in some embodiments be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. The memory 20 may also be an external storage device of the terminal in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal. Further, the memory 20 may also include both an internal storage unit and an external storage device of the terminal. The memory 20 is used for storing application software installed in the terminal and various types of data, such as program code installed on the terminal. The memory 20 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores a program 40 of RASA-based task-based intelligent multi-turn dialog, and the program 40 is executable by the processor 10 to implement the RASA-based task-based intelligent multi-turn dialog method of the present application.

The processor 10 may be, in some embodiments, a Central Processing Unit (CPU), microprocessor or other data Processing chip, which is used to run program codes stored in the memory 20 or process data, such as executing the RASA-based task-based intelligent multi-turn dialog method.

The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the terminal and for displaying a visual user interface. The components 10-30 of the terminal communicate with each other via a system bus.

In one embodiment, when the processor 10 executes the program 40 of RASA-based task-based intelligent multi-turn dialog in the memory 20, the following steps are implemented:

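constructing a natural language understanding module and a multi-turn dialogue management module based on RASA, and acquiring text information input by a user;

controlling the natural language understanding module to perform intention detection and semantic slot filling on the text information to respectively obtain user intention and entity information;

and controlling the multi-turn dialogue management module to match a response result of the text information based on the user intention and the entity information, feeding the response result back to the user, and displaying a visual interface of the user dialogue.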

Wherein, the establishing of the natural language understanding module and the multi-turn dialogue management module based on the RASA and the obtaining of the text information input by the user also comprise:

Wherein, the controlling the natural language understanding module to perform intention detection and semantic slot filling on the text information to respectively obtain user intention and entity information, and the method also comprises the following steps:

The controlling the natural language understanding module to perform intention detection and semantic slot filling on the text information to respectively obtain user intention and entity information specifically comprises:

Wherein the extractors include a DIET extractor, a regular expression extractor, and a conditional random field extractor.

Wherein, the controlling the natural language understanding module to perform intention detection and semantic slot filling on the text information to respectively obtain user intention and entity information, and then further comprises:

Wherein, the controlling the multi-turn dialog management module to match out a response result of the text information based on the user intention and the entity information, feeding the response result back to the user, and displaying a visual interface of the user dialog specifically comprises:

controlling the policy module to perform an action response based on the conversation state, and outputting a text dialogue based on the responded action;

Wherein the data types include weather queries, schedule queries, and movie queries.

In summary, the present invention provides a RASA-based task-based intelligent multi-turn dialog method and related devices, the method including: constructing a natural language understanding module and a multi-turn dialogue management module based on RASA, and acquiring text information input by a user; controlling the natural language understanding module to carry out intention detection and semantic slot filling on the text information to respectively obtain user intention and entity information; and controlling the multi-turn dialog management module to match a response result of the text information based on the user intention and the entity information, feeding the response result back to the user, and displaying a visual interface of user dialog. According to the invention, the dialogue system is constructed based on the RASA open-source framework and the pipeline method, the tasks of the modules are clear and independent, the complex and tedious configuration process is presented graphically, and the construction efficiency is improved; and by adopting the Botfront open-source framework and front-end/back-end interaction technology, the model can be trained with one click simply by configuring the relevant parameters, so that personalized service for the user is realized.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing relevant hardware (such as a processor, a controller, etc.) through a computer program, and the program can be stored in a computer readable storage medium, and when executed, the program can include the processes of the embodiments of the methods described above. The computer readable storage medium may be a memory, a magnetic disk, an optical disk, etc.

It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims

1. A RASA-based task-based intelligent multi-turn dialog method is characterized by comprising the following steps:

controlling the natural language understanding module to perform intention detection and semantic slot filling on the text information to respectively obtain user intention and entity information;

2. The RASA-based task-based intelligent multi-turn dialog method according to claim 1, wherein the RASA-based natural language understanding module and multi-turn dialog management module are constructed and obtain text information input by a user, and the method further comprises:

3. The RASA-based task-based intelligent multi-turn dialog method of claim 1, wherein the controlling the natural language understanding module to perform intent detection and semantic slot filling on the text information to obtain user intent and entity information respectively further comprises:

4. The RASA-based task-based intelligent multi-turn dialog method of claim 1, wherein the controlling the natural language understanding module to perform intent detection and semantic slot filling on the text information to obtain user intent and entity information, respectively, specifically comprises:

intention detection is carried out on the spoken language texts in the text information after intention classification, and user intentions of the text information are obtained;

5. The RASA-based task-based intelligent multi-turn dialog method of claim 4, wherein the extractors comprise a DIET extractor, a regular expression extractor, and a conditional random field extractor.

6. The RASA-based task-based intelligent multi-turn dialog method of claim 1, wherein the controlling the natural language understanding module performs intent detection and semantic slot filling on the text information to obtain user intent and entity information, respectively, and then further comprising:

7. The RASA-based task-based intelligent multi-turn dialog method of claim 1, wherein the controlling the multi-turn dialog management module to match out a response result of the text information based on the user intent and the entity information, feeding the response result back to the user, and displaying a visual interface of a user dialog comprises:

8. The RASA-based task-based intelligent multi-turn dialog method of claim 2, wherein the data types include weather queries, scheduling queries, and movie queries.

9. A RASA-based task-based intelligent multi-turn dialog system, comprising:

and the result display module is used for controlling the multi-turn conversation management module to match a response result of the text information based on the user intention and the entity information, feeding the response result back to the user, and displaying a visual interface of user conversation.

10. A terminal, characterized in that the terminal comprises: memory, a processor and a program stored on the memory and executable on the processor, the program when executed by the processor implementing the steps of the RASA-based task-based intelligent multi-turn dialog method according to any of the claims 1-8.