
CN110209774B - Method and device for processing session information and terminal equipment - Google Patents


Info

Publication number: CN110209774B
Application number: CN201810142498.3A
Authority: CN (China)
Prior art keywords: information, user, session, target, source
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN110209774A
Inventors: 涂眉, 张帆, 张培华
Assignee (current and original): Samsung Electronics Co Ltd
Events: application filed by Samsung Electronics Co Ltd; priority to CN201810142498.3A; publication of CN110209774A; application granted; publication of CN110209774B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present invention provide a method, an apparatus, and a terminal device for processing session information. The method for processing session information comprises the following steps: acquiring auxiliary information and source session information input by a first user; and generating and outputting target session information based on the auxiliary information and the source session information. With the technical solution provided by the embodiments of the present invention, generation of the target session information relies not only on the source session information input by the first user but also fully considers the acquired auxiliary information, so that the generated target session information meets users' diversified demands and ensures that the user can understand the nouns or things in the generated sentences, improving the user experience.

Description

Method and device for processing session information and terminal equipment
Technical Field
The present invention relates to the field of session information processing technologies, and in particular, to a method, an apparatus, and a terminal device for processing session information.
Background
Natural language generation is a branch of artificial intelligence and computational linguistics. It is a computer model based on linguistic information processing: starting from an abstract conceptual level, it generates text by selecting and executing semantic and grammatical rules.
Existing natural language generation technology is mainly applied in question answering systems, where the task is to automatically generate an answer to a given question. For example, when a user asks "What is a fun place to visit in Beijing?", the question answering system automatically returns places to visit in Beijing. Existing natural language generation technology can be divided into the following two types according to the generation method: (1) Template-based generation: a logical expression of the question is obtained through semantic analysis, matched against predefined templates, and the corresponding answer is retrieved from a database. (2) Deep-learning-based generation: in the training stage, a sequence-to-sequence generation model is trained on an existing question-answer corpus; in the prediction stage, given a question sentence, an answer sentence is generated by the trained generation model.
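A minimal sketch of the template-based branch, for illustration only (not part of the original disclosure): the regex patterns and the tiny knowledge base below are invented stand-ins for the semantic analysis and the database lookup described above.

```python
import re

# Invented toy knowledge base: (entity, relation) -> answers.
KNOWLEDGE_BASE = {
    ("Beijing", "attractions"): ["the Forbidden City", "the Summer Palace"],
}

TEMPLATES = [
    # (question pattern, relation in the knowledge base)
    (re.compile(r"where .* fun .* in (\w+)", re.IGNORECASE), "attractions"),
    (re.compile(r"what .* visit .* in (\w+)", re.IGNORECASE), "attractions"),
]

def answer(question: str) -> str:
    # Semantic analysis is reduced to pattern matching: a matched template
    # yields a logical form (entity, relation) used to look up the answer.
    for pattern, relation in TEMPLATES:
        match = pattern.search(question)
        if match:
            places = KNOWLEDGE_BASE.get((match.group(1), relation))
            if places:
                return f"In {match.group(1)} you could visit {' and '.join(places)}."
    return "Sorry, no template matched this question."

print(answer("Where is it fun to go in Beijing?"))
```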
Although natural language generation in the prior art can meet people's basic needs to a certain extent, its interaction object is single, generally simple questions and answers between a machine and a user, so it cannot meet users' diversified demands. Moreover, the generated sentences depend entirely on the currently given context, without considering whether the user understands the nouns or things in them, so the user experience is poor.
Disclosure of Invention
The present invention aims to solve at least one of the above technical drawbacks, in particular the inability to meet users' diversified needs.
According to one aspect, an embodiment of the present invention provides a method for processing session information, including:
Acquiring auxiliary information and source session information input by a first user;
And generating and outputting target session information based on the auxiliary information and the source session information.
According to another aspect, an embodiment of the present invention further provides an apparatus for processing session information, including:
an information acquisition module, configured to acquire auxiliary information and source session information input by a first user;
and a session generation module, configured to generate and output target session information based on the auxiliary information and the source session information.
According to another aspect, an embodiment of the present invention further provides a terminal device, including a memory and a processor, where the memory stores computer executable instructions that, when executed by the processor, perform the above-mentioned method of processing session information.
Compared with the prior art, the method for processing session information provided by the embodiments of the present invention acquires the auxiliary information and the source session information input by the first user, and then generates and outputs the target session information based on both. In this technical solution, generation of the target session information relies not only on the source session information input by the first user but also fully considers the acquired auxiliary information, so that the generated target session information meets users' diversified demands, ensures that the user can understand the nouns or things in the generated sentences, and improves the user experience.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a method for processing session information according to an embodiment of the present invention;
FIG. 2 is a flow chart of generating session information according to an embodiment of the present invention;
FIG. 3 is a diagram of the overall operation of the sentence automatic generation model according to the embodiment of the present invention;
FIG. 4 is a workflow diagram of a sentence automatic generation system in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of a system for automatically generating a sentence in a single language according to an embodiment of the present invention;
FIG. 6 is a diagram of an automatic multilingual sentence generation system according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an automatic sentence generation system for loading user information according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an abbreviation based sentence auto-generation system in accordance with an embodiment of the present invention;
FIG. 9 is a diagram of the analogy translation system in an entity analogy application scenario according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the analogy translation system in another entity analogy application scenario according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of the analogy translation system in an event analogy application scenario according to an embodiment of the present invention;
FIG. 12 is a diagram of the analogy translation system in an entity-and-event analogy application scenario according to an embodiment of the present invention;
FIG. 13 is a signaling flow diagram of a centralized unit triggered user context modification in accordance with an embodiment of the present invention;
FIG. 14 is a complete flow diagram of a cross-language cross-domain translation system in accordance with an embodiment of the present invention;
FIG. 15 is a schematic diagram of a cross-domain translation system for a specific specialized domain representation to a generic representation in accordance with an embodiment of the present invention;
FIG. 16 is a schematic diagram of a cross-domain translation system from a generic expression to a specific domain expression in accordance with an embodiment of the present invention;
FIG. 17 is a complete flow diagram of a cross-language cross-cultural background translation system in accordance with an embodiment of the present invention;
FIG. 18 is a schematic diagram of a cross-language cross-cultural background translation system from a special cultural background domain representation to a generic representation in accordance with an embodiment of the invention;
FIG. 19 is a schematic diagram of a cross-language cross-cultural background translation system from a special cultural background domain representation to a general representation with picture output according to an embodiment of the invention;
FIG. 20 is a flowchart illustrating a picture recommendation system according to an embodiment of the present invention;
FIG. 21 is a schematic diagram illustrating an application of a picture recommendation system according to an embodiment of the present invention;
FIG. 22 is a structural diagram of an apparatus for processing session information according to an embodiment of the present invention;
FIG. 23 is a block diagram of a computing system that may be used to implement the apparatus for processing session information disclosed in embodiments of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, a "terminal" or "terminal device" includes both a device of a wireless signal receiver having no transmitting capability and a device of receiving and transmitting hardware having receiving and transmitting hardware capable of bi-directional communication over a bi-directional communication link, as will be appreciated by those skilled in the art. Such a device may include: a cellular or other communication device having a single-line display or a multi-line display or a cellular or other communication device without a multi-line display; PCS (Personal Communications Service, personal communications System) that may combine voice, data processing, facsimile and/or data communications capabilities; PDA (Personal DIGITAL ASSISTANT ) that may include a radio frequency receiver, pager, internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System ) receiver; a conventional laptop and/or palmtop computer or other appliance that has and/or includes a radio frequency receiver. As used herein, "terminal," "terminal device" may be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or adapted and/or configured to operate locally and/or in a distributed fashion, to operate at any other location(s) on earth and/or in space. The "terminal" and "terminal device" used herein may also be a communication terminal, a network access terminal, and a music/video playing terminal, for example, may be a PDA, a MID (Mobile INTERNET DEVICE ) and/or a Mobile phone with a music/video playing function, and may also be a smart tv, a set top box, and other devices.
The existing natural language generation technology has the following defects:
1) The interaction object is single, generally limited to a machine and a user: how the machine understands natural language (i.e., human language), and how machine language is encoded into natural language that a person can understand. Reality is much more complex for language generation and understanding, especially communication between people directly or indirectly through devices; a person may be unable to understand the other party, or to convey their own ideas, because of their own situation or environment. For example, foreign language beginners, language-impaired patients, or users in very busy and urgent situations need intelligent and humanized language generation technology to achieve quick and convenient assisted communication. Because the prior art does not consider the multi-party language environment among multiple users and devices, it cannot handle such complex situations.
For example: in the prior art, when a questioner asks an open question such as "What shall we do on Saturday?", the existing question answering system cannot generate an answer from keywords given by the respondent.
2) The generated sentence is entirely dependent on the current given context, regardless of whether the user understands the nouns or things in the generated sentence.
For example: assuming that the user C is completely unfamiliar with the graphics card, when the user a and the user B discuss the graphics card topic, the prior art cannot generate a sentence enabling the user C to understand the dialogue according to the dialogue between the user a and the user B.
3) The case where the user a is different from the user B in the sense of being good at the domain is not considered in generating the sentence, and therefore the generated sentence should also be changed according to the requirement of the output domain.
For example: assuming that user a is a game player, it is very preferable to use the game term, and user B only occasionally touches the game with little knowledge of the game term. When user a says to user B "do you eat chicken tonight? With your fly-! "(meaning" play together tonight "game) is you with? win +|"), the user is informed of the winning, user B cannot understand what user a speaks, nor does the prior art assist user B in understanding that utterance.
4) The differences between the cultural backgrounds of different users are not considered.
For example: suppose user A knows American slang inside out, while user B does not know much about American culture. When user A says "In the electronics industry, Samsung can be considered the 800-pound gorilla" to user B, meaning that "Samsung is very strong in the electronics industry", user B will not understand this sentence well, nor does the prior art help user B understand it.
5) The prior art is limited to text information and does not consider multi-modal information such as images.
To address the above shortcomings of existing natural language generation technology, embodiments of the present invention provide a method for processing session information. In this solution, generation of the target session information relies not only on the source session information input by the first user but also fully considers the acquired auxiliary information. On the one hand, this extends the interaction objects of language generation beyond machine and user, for example to a user-terminal device-user interaction. On the other hand, natural language is no longer generated from the currently given context alone; the auxiliary information is fully considered so that the user can understand the generated natural language sentence. This enhances the practicality of language generation, builds a bridge between people for whom normal communication is inconvenient, converts hard-to-understand words into plain, easy-to-understand natural language, and removes understanding barriers in the communication process.
The method for processing session information according to an embodiment of the present invention, as shown in fig. 1, includes: step 110, acquiring auxiliary information and source session information input by a first user; and step 120, generating and outputting target session information based on the auxiliary information and the source session information.
In the method for processing session information provided by the embodiment of the present invention, the auxiliary information and the source session information input by the first user are acquired, and the target session information is then generated and output based on both. Generation of the target session information thus relies not only on the source session information input by the first user but also fully considers the acquired auxiliary information, so that the generated target session information meets users' diversified demands, ensures that the user can understand the nouns or things in the generated sentences, and improves the user experience.
Specifically, the auxiliary information in step 110 and step 120 includes at least one of history session information, user information of the first user, user information of the second user, and a domain to which the information belongs, where the domain to which the information belongs includes a domain to which the source information belongs and a domain to which the target information belongs.
Further, the user information includes at least one of: user attribute information; user preference information; user schedule information; user location information; user behavior information; user equipment information.
Further, the information field includes at least one of the following: language type information; professional field information; cultural background information.
Further, the domain to which the source information belongs is obtained by detecting the source session information or is set by the user, and the domain to which the target information belongs is obtained by detecting the history session information or is set by the user.
Specifically, the source session information in step 110 and step 120 includes at least one of the following: abbreviations, incomplete words, natural language sentences, and picture selection information.
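To fix ideas, the auxiliary information enumerated above can be pictured as a structured record. The following is a minimal sketch under assumed names (the patent prescribes no concrete data layout, and the one-line "generator" is a placeholder for the pre-trained model described later):

```python
from dataclasses import dataclass, field

@dataclass
class Domain:
    language_type: str = ""        # language type information
    professional_field: str = ""   # professional field information
    cultural_background: str = ""  # cultural background information

@dataclass
class UserInfo:
    attributes: dict = field(default_factory=dict)   # user attribute information
    preferences: dict = field(default_factory=dict)  # user preference information
    schedule: list = field(default_factory=list)     # user schedule information
    location: str = ""                               # user location information

@dataclass
class AuxiliaryInfo:
    history: list = field(default_factory=list)      # history session information
    first_user: UserInfo = field(default_factory=UserInfo)
    second_user: UserInfo = field(default_factory=UserInfo)
    source_domain: Domain = field(default_factory=Domain)  # detected or user-set
    target_domain: Domain = field(default_factory=Domain)  # detected or user-set

def process_session(source_session: str, aux: AuxiliaryInfo) -> str:
    """Steps 110/120: acquire source session + auxiliary info, then generate."""
    keywords = [w for w in source_session.split() if len(w) > 1]
    return " ".join(keywords)  # placeholder generation

aux = AuxiliaryInfo(history=["What shall we do on Saturday?"],
                    target_domain=Domain(language_type="zh"))
print(process_session("basketball movie Kung Fu Panda", aux))
```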
Specifically, the generating the target session information in step 120 based on the auxiliary information and the source session information specifically includes at least one of steps 1201 (not shown) to 1204 (not shown):
step 1201, extracting session information keywords of source session information and history session information between the first user and the second user, and generating target session information according to the session information keywords.
Step 1202, extracting session information keywords of source session information and history session information between the first user and the second user, and user information keywords of user information of the first user and/or the second user, and generating target session information according to the session information keywords and the user information keywords.
Step 1203, translating the source session information in the domain of the source information into the session information in the domain of the target information according to the domain of the information, extracting the session information in the domain of the target information and the session information keywords of the history session information between the first user and the second user, and generating the target session information according to the session information keywords.
Step 1204, translating the source session information in the domain of the source information into the session information in the domain of the target information according to the domain of the information, extracting session information keywords of the session information in the domain of the target information and history session information between the first user and the second user, and user information keywords of user information of the first user and/or the second user, and generating the target session information according to the session information keywords and the user information keywords.
Further, in steps 1201 and 1203, generating the target session information according to the session information keywords includes: generating the target session information based on a pre-trained sentence generation model according to the session information keywords. In steps 1202 and 1204, generating the target session information according to the session information keywords and the user information keywords includes: generating the target session information based on the pre-trained sentence generation model according to the session information keywords and the user information keywords.
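A toy sketch of the keyword path of steps 1201-1204; the stopword list and the one-line "generator" are assumptions standing in for the pre-trained sentence generation model (whose training is sketched further below):

```python
def extract_keywords(texts: list[str]) -> list[str]:
    # Naive keyword extraction: drop stopwords, keep first occurrence order.
    stopwords = {"the", "a", "an", "to", "we", "of", "and", "what",
                 "shall", "do", "on"}
    seen: list[str] = []
    for text in texts:
        for token in text.lower().replace("?", " ").split():
            if token not in stopwords and token not in seen:
                seen.append(token)
    return seen

def generate_sentence(keywords: list[str]) -> str:
    # Stand-in for the pre-trained sentence generation model.
    return "How about " + ", then ".join(keywords) + "?"

# Step 1201: keywords from the source + history session information only.
session_kw = extract_keywords(["What shall we do on Saturday?",
                               "basketball movie Kung Fu Panda"])
# Step 1202: additionally mix in user information keywords (e.g. a schedule).
user_kw = extract_keywords(["free after 3 pm"])
print(generate_sentence(list(dict.fromkeys(session_kw + user_kw))))
```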
Specifically, the generating the target session information in step 120 based on the auxiliary information and the source session information specifically includes at least one of steps 1205 (not shown) to 1207 (not shown).
Step 1205, obtaining source object information according to the source session information, and classifying the source object information to obtain source category information; obtaining candidate target category information according to the user information of the first user; obtaining target class information according to the similarity between the source class information and the candidate target class information; obtaining candidate target object information according to the target category information; obtaining target object information according to the similarity of the source object information and the candidate target object information; and generating target session information according to the target object information.
Step 1206, obtaining source object information according to the source session information and historical session information between the first user and more than one second user; classifying the source object information to obtain source category information; obtaining candidate target category information according to the user information of the first user; obtaining target class information according to the similarity between the source class information and the candidate target class information; obtaining candidate target object information according to the target category information; obtaining target object information according to the similarity of the source object information and the candidate target object information; and generating target session information according to the target object information.
Step 1207, translating the source session information in the domain of the source information into the session information in the domain of the target information according to the domain of the information, and obtaining the source object information according to the session information in the domain of the target information and the history session information between the first user and more than one second user; classifying the source object information to obtain source category information; obtaining candidate target category information according to the user information of the first user; obtaining target class information according to the similarity between the source class information and the candidate target class information; obtaining candidate target object information according to the target category information; obtaining target object information according to the similarity of the source object information and the candidate target object information; and generating target session information according to the target object information.
Wherein the object comprises an entity and/or an event.
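The analogy chain of steps 1205-1207 (source object, source category, candidate target categories from the user information, then similarity in both directions) can be sketched as follows. All embeddings, categories and objects below are invented toy data, not the patent's models:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

EMBED = {  # toy embedding table
    "graphics card": [0.9, 0.1, 0.2], "hardware": [0.8, 0.2, 0.1],
    "cooking":       [0.1, 0.9, 0.2], "wok":      [0.2, 0.8, 0.1],
}
CATEGORY_OF = {"graphics card": "hardware"}   # source object -> source category
OBJECTS_IN  = {"cooking": ["wok"]}            # category -> candidate objects

def analogy(source_obj: str, known_categories: list[str]) -> str:
    src_cat = CATEGORY_OF[source_obj]                       # source category info
    tgt_cat = max(known_categories,                         # target category by similarity
                  key=lambda c: cosine(EMBED[src_cat], EMBED[c]))
    tgt_obj = max(OBJECTS_IN[tgt_cat],                      # target object by similarity
                  key=lambda o: cosine(EMBED[source_obj], EMBED[o]))
    return f"A {source_obj} is to a computer what a {tgt_obj} is in {tgt_cat}."

# The listener's user information says they are familiar with cooking.
print(analogy("graphics card", ["cooking"]))
```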
Specifically, the generating the target session information in step 120 based on the auxiliary information and the source session information specifically includes step 1208 (not shown in the figure) of translating the source session information in the domain of the source information into the target session information in the domain of the target information according to the domain of the information.
Further, step 1208 specifically includes: according to the domains to which the information belongs, translating the source session information sequentially over at least one of language type, professional field and cultural background, each based on a corresponding pre-trained translation model, to obtain the target session information.
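A minimal sketch of this sequential translation; the three translator stubs are assumptions, where a real system would load a pre-trained model per dimension and apply only the dimensions where source and target domains differ:

```python
from typing import Callable

def translate_language(text: str) -> str:
    return text + " [en->zh]"        # stand-in for a language translation model

def translate_profession(text: str) -> str:
    return text + " [gamer->plain]"  # stand-in for a professional-field model

def translate_culture(text: str) -> str:
    return text + " [US->CN]"        # stand-in for a cultural-background model

def cross_domain_translate(text: str, steps: list[Callable[[str], str]]) -> str:
    for step in steps:               # apply the loaded models in sequence
        text = step(text)
    return text

print(cross_domain_translate("Shall we eat chicken tonight?",
                             [translate_profession, translate_language]))
```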
Further, the method further includes step 130 (not shown in the figure): according to the semantic similarity between the source session information and/or the target session information and candidate pictures, obtaining a target picture corresponding to the source session information and/or the target session information, and outputting the target picture; the similarity-based retrieval is sketched after step 1210 below.
Specifically, the generating the target session information in step 120 based on the auxiliary information and the source session information specifically includes at least one of step 1209 (not shown) and step 1210 (not shown):
Step 1209, obtaining session speculation information according to the picture selection information input by the first user and the history session information between the first user and the second user; and acquiring a target picture from the candidate pictures according to the semantic similarity between the session speculation information and the candidate pictures, and taking the target picture as the target session information.
Step 1210, obtaining session speculation information according to the picture selection information input by the first user, the history session information between the first user and the second user, and the user information of the first user; and acquiring a target picture from the candidate pictures according to the semantic similarity between the session speculation information and the candidate pictures, and taking the target picture as the target session information.
Further, in step 1209, the session speculation information is obtained based on a pre-trained session understanding model according to the picture selection information input by the first user and the history session information between the first user and the second user.
Further, in step 1210, the session speculation information is obtained based on the pre-trained session understanding model according to the picture selection information input by the first user, the history session information between the first user and the second user, and the user information of the first user.
Wherein the session speculation information includes: the first user wants to express the conversation content and/or the conversation emotion the first user wants to express.
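A toy sketch of steps 1209/1210, using the same similarity retrieval as step 130. The "session understanding model" and the text embedding below are invented stand-ins, not the patent's models:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def embed(text: str) -> list[float]:
    # Toy text embedding: counts of a few hand-picked cue words.
    cues = ("happy", "birthday", "cake", "sad")
    words = text.lower().replace("!", " ").split()
    return [float(words.count(c)) + 0.01 for c in cues]

def speculate(history: list[str], picture_choice: str) -> str:
    # Stand-in for the pre-trained session understanding model: guess the
    # content/emotion the first user wants to express.
    return " ".join(history[-1:]) + " " + picture_choice

CANDIDATE_PICTURES = {
    "birthday_cake.png": "happy birthday cake with candles",
    "rainy_day.png": "sad rainy window",
}

def pick_target_picture(history: list[str], picture_choice: str) -> str:
    guess = embed(speculate(history, picture_choice))
    return max(CANDIDATE_PICTURES,
               key=lambda p: cosine(guess, embed(CANDIDATE_PICTURES[p])))

print(pick_target_picture(["happy birthday!"], "cake"))
```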
As can be seen from the above description, the method for processing session information provided by the present invention includes, but is not limited to, the following five aspects: A. a system for automatically generating sentences in natural language form from keywords; B. an analogy translation system; C. a cross-language cross-domain translation system; D. a cross-language cross-cultural-background translation system; E. a picture recommendation system. A flow chart for generating session information, including but not limited to the above five aspects, is shown in fig. 2.
The above five aspects are described in detail below with reference to specific embodiments. The sentence in natural language form is the target session information; in the following description, the source-end user is the first user, the target user is the second user, the source-end user information is the user information of the first user, and the target user information is the user information of the second user.
A. The system for automatically generating sentences in natural language form from keywords (hereinafter referred to as the sentence automatic generation system) addresses the problem that existing natural language generation technology only generates dialogue information (i.e., natural language) between a user and a device. It generates sentences in natural language form based on externally provided context information (i.e., user dialogue information, including the history session information between users and the source session information input by the first user) and can act in a user-device-user setting. The system extracts session information keywords from the context information, generates a sentence in natural language form (i.e., the target session information) from those keywords, and feeds it back to the target user. In addition, source-end user information and target user information can be acquired, and the sentence in natural language form can be generated from the context information together with the target user information and the source-end user information, and fed back to the target user. Furthermore, the session information keywords input by the source-end user can first be translated into keywords in the target user's language type, from which the sentence in natural language form is then generated and fed back to the target user.
The sentence automatic generation system mainly comprises two parts: the online sentence automatic generation system (i.e., generation of sentences in natural language form) and offline training of the sentence automatic generation model. Offline training learns, from an existing corpus, a model that automatically generates sentences in natural language form from session information keywords; this model is denoted the pre-trained sentence generation model. For example, the dialogue "What shall we do on Saturday?" / "We can play basketball and then go see the movie Kung Fu Panda." is one corpus item: the former sentence is labeled as the question, and "basketball" and "Kung Fu Panda" in the latter sentence are labeled as session information keywords. In the specific training process, the former sentence and the session information keywords of the latter sentence are used as input, and the whole latter sentence is used as output. The online sentence automatic generation system then uses the offline pre-trained model to generate, from the context information, a sentence in natural language form suitable for the current context. In addition, to cover complex language communication environments as much as possible, auxiliary information is considered both when training the model and in the online system: a sentence in natural language form suitable for the current context is generated from the context information and the auxiliary information by the offline pre-trained model, and incomplete words and/or abbreviations can be automatically recognized and completed.
During training of the sentence automatic generation model, a deep learning algorithm is used to reduce sentences to session information keywords, and training proceeds by dual learning of generating sentences from those keywords, so that the model is learned and used without relying on large amounts of manual intervention such as templates. The sentence automatic generation model is trained offline and applied online directly; the deep learning algorithm is realized by modifying the input parameters, output parameters and network parameters (such as network depth, number of nodes, etc.) of an existing framework.
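One training example in the corpus format just described might look as follows; the field names are assumptions, as the patent does not prescribe a serialization:

```python
# Input: the question sentence plus the labeled keywords of the answer;
# training target: the whole answer sentence.
training_example = {
    "question": "What shall we do on Saturday?",
    "answer_keywords": ["basketball", "Kung Fu Panda"],
    "answer": "We can play basketball and then go see the movie Kung Fu Panda.",
}
```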
Specifically, since the online sentence automatic generation system generates sentences in natural language form suitable for the current context using the offline pre-trained model together with the context information and the auxiliary information, it can be summarized as the following two parts. The information extraction system converts the context information and the auxiliary information into vector representations and inputs them to the pre-trained sentence generation model for preprocessing. The pre-trained sentence generation model is trained offline from an existing corpus; its input is the context information and auxiliary information in vector form, and its output is the sentence in natural language form, i.e., the target session information. The context information comprises the history session information and the source session information input by the first user; the auxiliary information comprises at least one of the user information of the first user, the user information of the second user, and the domain to which information belongs. The domain to which information belongs further includes the domain of the source information, i.e., the domain of the source session information, and the domain of the target information, i.e., the domain of the target session information. The user information further includes at least one of: user attribute information, user preference information, user schedule information, user location information, user behavior information, and user equipment information.
Further, the offline training process of the sentence automatic generation model specifically includes: Step one, extract the keyword information from the training corpus. Step two, convert the keyword information from step one into vector representations, input them into the neural network model for sentence generation to generate a sentence, then compute the difference between the sentence generated by the model and the original sentence, and propagate the difference back into the network parameters. Here the difference refers to the difference between the vector of the generated sentence and the vector of the original sentence; it can be computed, for example, by directly subtracting and taking absolute values, by squaring, etc. The network parameters are the weight parameters of the edges connecting neurons in the neural network and are adjustable. Step three: repeat steps one and two until the model converges. The overall operation structure of the sentence automatic generation model is shown in fig. 3.
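A minimal sketch of this offline loop, written against PyTorch as an assumed framework; the tiny linear "generator" and the one-item corpus stand in for the real sentence generation network and training data:

```python
import torch

VOCAB, DIM = 1000, 32
embed = torch.nn.Embedding(VOCAB, DIM)       # turns keyword ids into vectors
generator = torch.nn.Linear(DIM, VOCAB)      # stand-in generation network
params = list(embed.parameters()) + list(generator.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

corpus = [([3, 17], [3, 17, 42])]            # (keyword ids, full-sentence ids)

for epoch in range(100):                     # step three: repeat to convergence
    for keywords, sentence in corpus:
        # Steps one/two: keyword ids -> vectors -> generated word scores.
        kw_vec = embed(torch.tensor(keywords)).mean(dim=0)
        logits = generator(kw_vec).repeat(len(sentence), 1)
        # Difference between the generated and the original sentence
        # (cross-entropy here; absolute or squared error also works).
        loss = torch.nn.functional.cross_entropy(logits, torch.tensor(sentence))
        optimizer.zero_grad()
        loss.backward()                      # propagate difference into weights
        optimizer.step()
```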
Further, as shown in fig. 4, the workflow of the sentence automatic generation system specifically includes: Step one, the information extraction system receives the user information and the context information. For the context information, it first extracts the session information keywords, then performs abbreviation detection/recovery and target language type detection/translation on them, and finally converts them into the corresponding vector form using parameters such as the word vectors trained by the sentence generation model, yielding the context information vector in fig. 4. For the user information, it first extracts the user information keywords and then converts them into the corresponding vector form using the same trained parameters, yielding the user information vector in fig. 4. Step two: merge the user information vector and the context information vector and input them into the sentence generation model to obtain a sentence in natural language form, i.e., the final target session information. The user information here comprises the user information of the first user and the user information of the second user.
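The online side of fig. 4, under the same toy setup as the training sketch above (assumed, not the patent's architecture): vectorize the two keyword sets, merge the vectors, and decode.

```python
import torch

def vectorize(keyword_ids: list[int], embed: torch.nn.Embedding) -> torch.Tensor:
    return embed(torch.tensor(keyword_ids)).mean(dim=0)

def generate(context_ids, user_ids, embed, generator, length=5):
    context_vec = vectorize(context_ids, embed)   # context information vector
    user_vec = vectorize(user_ids, embed)         # user information vector
    merged = (context_vec + user_vec) / 2         # merge the two vectors
    logits = generator(merged)
    # Greedy stand-in for real decoding: top scoring word ids as the sentence.
    return torch.topk(logits, length).indices.tolist()

embed = torch.nn.Embedding(1000, 32)
generator = torch.nn.Linear(32, 1000)
print(generate([3, 17], [99], embed, generator))
```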
The sentence automatic generation described above is explained in detail below through several embodiments:
Embodiment one: simple thing input
In the simple input scenario of embodiment one, a user who is not proficient in Chinese uses a terminal device with the sentence automatic generation system to communicate. When a user encounters a language expression obstacle, for example a language beginner or someone communicating in an unfamiliar language, they can usually only recall a few keywords of the intended content and cannot organize complete sentences; in that case the sentence automatic generation system can provide great help. As shown in fig. 5, the user completes communication with others using the sentence automatic generation system despite not being familiar with the language.
In fig. 5, the second user ("Li") on the right is proficient in Chinese, and the first user on the left has limited Chinese. During the communication shown in fig. 5, the sentence automatic generation system organizes a complete sentence (i.e., the target session information) and feeds it back to the second user ("Li") according to the history session information between the first user and the second user (e.g., the session content input by "Li") and the session information keywords of the source session information provided by the first user. That is, the sentence automatic generation system extracts the session information keywords of the source session information and of the history session information between the first user and the second user, generates the target session information from those keywords, and displays it. The first user is the user of the current device, and the second user is the counterpart user, i.e., the user in session with the user of the current device.
Specifically, the sentence automatic generation system in this embodiment includes two parts, namely an information extraction system and a sentence automatic generation model, and the sentence automatic generation process of the sentence automatic generation system can be divided into the following two steps:
Step one: identifying context information
This embodiment performs sentence generation according to the context information provided by the user to assist the user in completing language organization and expression. The context information is the history session information exchanged between the first user and the second user and cached in the terminal device or social software, together with the source session information input by the first user (e.g., keywords such as "basketball", "movie", "Kung Fu Panda"). In the automatic generation process of this embodiment, the session information keywords of the context information are extracted first; methods for obtaining them include but are not limited to the following two: 1) direct acquisition: if text content input in a specific query format is detected, the system automatically performs word segmentation and keyword extraction on the text; 2) context information that enriches sentence generation is obtained from the user's history session information, and session information keywords are extracted from it. The session information keywords are then vectorized using the word vector parameters of the offline pre-trained sentence generation model, yielding the vectors of the session information keywords of the context information.
Step two: generating natural language sentences using contextual information
In this step, the vectors of the session information keywords extracted in step one are input into the offline-trained sentence generation model to obtain a sentence in natural language form as the final target session information, which is then output. The target session information can be displayed to the second user directly as text, or played to the second user when the second user clicks a voice playback button.
In an implementation scenario of this embodiment on a social platform, the sentence automatic generation system can be embedded into an input method or into the social platform itself, so that the user only needs to input source session information such as session information keywords, while the social platform provides history session information such as the context exchanged by the users, making the sentence generation process more convenient.
Embodiment two: statement automatic generation system in field of multiple information
The application scenario of this embodiment is that a tourist from home and a beginner of language want to express an ideas of mind, but can only recall a few keywords of own native language, and thus encounter an obstacle when communicating. At this time, the sentence automatic generation system can translate the source language keywords provided by the user into the keywords of the target language by detecting the source language types and the target language types to be expressed, and organize proper sentences to be fed back to the user in combination with the context information. Taking the communication scenario of fig. 6 as an example, the following description will be given:
In fig. 6, the second user "to The right" is a chinese skilled user, the first user "to The left" is a user whose native language is english and is quite unfamiliar with chinese usage, at this time, "to" completes a dialogue exchange with "to" using a sentence automatic generation system, "to" ask "to a plan of Tom's wednesday," to "can only think about source dialogue information such as" banketball "and" The Mummy "in english, and then inputs these source dialogue information (e.g., keywords" banketball "," The Mummy "and The like) to a sentence automatic generation system, and The sentence automatic generation system uses The source dialogue information (e.g., keywords" basketball "," The Mummy "and The like) provided by The context information (i.e., history dialogue information) and" to generate a sentence in a proper natural language expression form, i.e., target dialogue information, and applies to The dialogue content of both to assist in achieving a normal exchange with "to" The chinese. The first user "Tom" is a current equipment user end user, and the second user "plum" is a counterpart user, i.e. a user who performs a session with the current equipment user end user.
In this embodiment, a sentence in natural language form needs to be generated according to the source session information provided by the first user, the domain of the source information, the domain of the target information, and the history session information between the first user and the second user. The specific processing is as follows:
Step one: identifying the context information and unifying the domains to which information belongs
The domains to which information belongs include the domain of the source information and the domain of the target information, and each includes at least one of language type information, professional field information, and cultural background information.
For example, when the language type of the source session information provided by the first user is inconsistent with the language type used by the interlocutor (i.e., the second user), the sentence automatic generation system detects the language type of the source session information from the source session information input by the first user, detects the language type of the target session information from the history session information between the first user and the second user, and thereby confirms both language types. In a multi-language scenario, the system also allows the user to freely set the language types of the source and target session information. The system then loads the corresponding translation model and translates all source session information into session information in the language type of the target session information.
For another example, when the professional field, cultural background, etc. of the source session information provided by the first user differ from those of the second user, the sentence automatic generation system can detect the professional field, cultural background, etc. of the source session information from the source session information input by the first user, detect those of the target session information from the history session information between the first user and the second user, and confirm both. In a multi-professional-field, multi-cultural-background scenario, the system also allows the user to freely set these domains. The system then loads the corresponding translation models, such as a professional field translation model and a cultural translation model, and translates all source session information from the professional field and cultural background of the source into session information in the professional field and cultural background of the target.
As can be seen from the above, the sentence automatic generation system first determines the domains of the source session information, such as language type, professional field and cultural background, from the source session information input by the first user, and determines the domains of the target session information from the history session information (e.g., the context information) between the first user and the second user. It then translates the source session information from the source domains into session information in the target domains, extracts session information keywords from the translated session information and the history session information between the first user and the second user, and vectorizes the keywords using the word vector parameters of the offline pre-trained sentence generation model to obtain the corresponding keyword vectors.
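A toy sketch of the language-unification part of this step; the character-range detector and the two-entry translation table are assumptions standing in for real language detection and a loaded translation model:

```python
def detect_language(text: str) -> str:
    # Toy detector: Chinese if any CJK character is present, else English.
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in text) else "en"

TOY_DICT = {("en", "zh"): {"basketball": "篮球", "The Mummy": "木乃伊"}}

def unify_language(source_keywords: list[str], history: list[str]) -> list[str]:
    src = detect_language(" ".join(source_keywords))  # from source session info
    tgt = detect_language(" ".join(history))          # from history session info
    if src == tgt:
        return source_keywords
    table = TOY_DICT[(src, tgt)]
    return [table.get(kw, kw) for kw in source_keywords]

print(unify_language(["basketball", "The Mummy"], ["汤姆周三有什么安排?"]))
```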
Step two: generating sentences
The process by which the sentence automatic generation system recognizes the context information and generates the sentence is substantially the same as steps one and two of embodiment one and is not repeated here.
The implementation scenario on a social platform is the same as in embodiment one: the sentence automatic generation system can be embedded into an input method or the social platform, so that the user only needs to input source session information such as keywords while the social platform provides history session information such as the exchanged context, making sentence generation more convenient. The sentence automatic generation system can be executed on the terminal side or on the server side; considering that its computation may be heavy and may occupy a large amount of physical memory, it is generally executed on the server side.
Embodiment III: sentence automatic generation system for loading user information
The application scenario of this embodiment is similar to that of embodiment one, but richer user information, such as user schedule information and user location information, is loaded into the sentence automatic generation system. The system can then comprehensively consider the source session information input by the user (e.g., the provided keywords) and other attribute keywords related to the user information collected by the system, to generate sentences better suited to the current context and assist the user's language expression.
Taking the communication scenario of fig. 7 as an example: the dialogue scenario of fig. 7 is consistent with embodiment one and is not repeated here. Unlike embodiment one, the sentence automatic generation system now not only utilizes the source session information input by the first user "Zhang" (e.g., the provided keywords), but also automatically collects the user information of "Zhang", such as the user's schedule information and geographical location, through the terminal device or network device, and then generates sentence content better suited to what "Zhang" wants to express according to both. The operation steps of the sentence automatic generation system are as follows:
Step one: identifying context information and user information
The context information is the inter-user communication information cached in the terminal device or social software, including the history session information between the first user and the second user and the source session information (e.g., keyword information) input by the first user. The user information describes the environment the user is in and some personalized attributes of the user, including but not limited to the user's schedule, geographic location, etc. In the automatic generation process of this embodiment, the session information keywords of the context information and the user information keywords of the first user's user information are extracted. Methods for obtaining the session information keywords include but are not limited to the following two: 1) direct acquisition: if text content input in a specific query format is detected, the system automatically performs word segmentation and keyword extraction on the text; 2) context information that enriches sentence generation is obtained from the user's history session information, and session information keywords are extracted from it. Meanwhile, user information keywords are acquired from the collected user information of the first user. The session information keywords and user information keywords are then vectorized using the word vector parameters of the offline pre-trained sentence generation model, yielding the vectors of the session information keywords of the context information and the vectors of the user information keywords.
Step two: the system generates sentences
This step combines the session information keyword vectors of the context information provided in step one with the user information keyword vectors (here, the keywords of the first user's user information), and then inputs the combined vectors into the offline-trained automatic sentence generation model to obtain a sentence in natural-language form, i.e., the final target session information, which is then output. The target session information can be displayed directly to the second user ("Li") as text, or played to "Li" when "Li" clicks the voice playback button.
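A minimal sketch of step two follows, assuming the offline-trained model is a PyTorch encoder-decoder that consumes the combined keyword vectors and greedily decodes a sentence; the architecture, dimensions, and names are illustrative assumptions, not the patent's actual model.

```python
import torch
import torch.nn as nn

class KeywordToSentence(nn.Module):
    def __init__(self, emb_dim: int, hidden: int, vocab_size: int):
        super().__init__()
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.out = nn.Linear(hidden, vocab_size)

    def generate(self, keyword_vecs, bos_id=1, eos_id=2, max_len=30):
        # Encode the combined session + user keyword vectors.
        _, state = self.encoder(keyword_vecs)
        token = torch.full((keyword_vecs.size(0), 1), bos_id, dtype=torch.long)
        tokens = []
        for _ in range(max_len):  # greedy decoding, one token at a time
            out, state = self.decoder(self.embed(token), state)
            token = self.out(out).argmax(-1)
            tokens.append(token)
            if (token == eos_id).all():
                break
        return torch.cat(tokens, dim=1)  # token ids to map back to words

model = KeywordToSentence(emb_dim=100, hidden=256, vocab_size=30000)
combined = torch.randn(1, 8, 100)  # stand-in for session + user keyword vectors
sentence_ids = model.generate(combined)
```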
It should be noted that, in the application scenario of embodiment three, if the second user is willing to share his own user information, for example user attribute information, user preference information, user schedule information, and user location information, for information sharing among friends, then when identifying context information and user information in step one, the user information keywords of the second user's user information may be extracted in addition to those of the first user. That is, step one extracts the session information keywords of the source session information and of the historical session information between the two users, together with the user information keywords of both users; step two then combines the session information keyword vectors of the context information with the user information keyword vectors of both users and inputs them into the offline-trained automatic sentence generation model to obtain the sentence in natural-language form, i.e., the final target session information. In addition, if the second user is willing to share his user information but the first user is not, so that the first user's user information cannot be acquired, then step one extracts only the user information keywords of the second user's user information. That is, step one extracts the session information keywords of the source session information and of the historical session information between the two users, together with the user information keywords of the second user; step two then combines the session information keyword vectors of the context information with the user information keyword vectors of the second user and inputs them into the offline-trained automatic sentence generation model to obtain the final target session information.
Embodiment four: automatic sentence generation system loading user information across multiple information domains
The application scenario of this embodiment combines embodiments two and three: the domain of the source session information provided by the first user ("Zhang") is inconsistent with the domain of the session information of the second user ("Li"), and the automatic sentence generation system also loads richer user information, such as user schedule information and user location information. The operation steps of the system are then as follows:
Step one: identify context information and user information, and unify the domains to which the information belongs
The automatic sentence generation system first determines, based on the source session information input by the first user, the domain to which that information belongs, such as its language type, professional field, and cultural background, and determines, based on the historical session information between the first user and the second user (such as the context information), the domain to which the target session information belongs, so as to translate the source session information from its own domain into session information in the domain of the target information. Then, session information keywords are extracted from the translated session information and from the historical session information between the two users, and user information keywords are acquired from the collected user information of the first user. Finally, using the word vector parameters of the offline pre-trained automatic sentence generation model, the translated session information, the historical session information, and the user information keywords of the first user are vectorized in turn to generate the corresponding session information keyword vectors and user information keyword vectors.
Step two: the system generates sentences
This step combines the session information keyword vectors provided in step one with the user information keyword vectors (here, the keywords of the first user's user information), inputs the combined vectors into the offline-trained automatic sentence generation model to obtain a sentence in natural-language form, i.e., the final target session information, and outputs the target session information.
It should be noted that, in the application scenario of embodiment four, if the second user is willing to share his own user information, for example user attribute information, user preference information, user schedule information, and user location information, for information sharing among friends, then step one acquires user information keywords not only from the first user's user information but also from the second user's. That is, step one translates the source session information from its own domain into the domain of the target information according to the domains involved, and extracts the session information keywords of the translated session information and of the historical session information between the two users, together with the user information keywords of both users; step two then combines the session information keyword vectors with the user information keyword vectors of both users and inputs them into the offline-trained automatic sentence generation model to obtain the sentence in natural-language form, i.e., the final target session information.
In addition, if the second user is willing to share his user information but the first user is not, so that the first user's user information cannot be acquired, step one can still acquire the second user's user information and its user information keywords. That is, step one translates the source session information from its own domain into the domain of the target information according to the domains involved, and extracts the session information keywords of the translated session information and of the historical session information between the two users, together with the user information keywords of the second user; step two then combines the session information keyword vectors with the user information keyword vectors of the second user and inputs them into the offline-trained automatic sentence generation model to obtain the sentence in natural-language form, i.e., the final target session information.
Embodiment five: automatic sentence generation from abbreviations and/or incomplete words
The application scenario of this embodiment is that when the user is busy or not concentrating on the chat, he often wants to give only a brief reply. However, such a reply may come across poorly, so the automatic sentence generation system can generate a complete natural-language sentence from the source session information (e.g., the provided keywords) input by the user, helping the user express himself properly, as shown in fig. 8.
In fig. 8, the first user "Zhang" is busy and has no time to compose a proper reply to the query of the second user "Li". "Zhang" wants to convey that he is eating at "Pizza Hut" and asks the other party to wait a moment, so he provides this idea to the automatic sentence generation system, which generates a complete sentence as a reply according to the current historical session information (such as the dialogue context), i.e., answers on "Zhang's" behalf. The operation steps of the system are the same as in embodiment one and are not repeated here. In this embodiment the first user "Zhang" is the end user of the current device, and the second user "Li" is the opposite user, i.e., the user conversing with the end user of the current device.
B. Analogy translation system
Because existing automatic sentence generation technology does not consider whether the user can really understand the terms or proper nouns in the generated natural-language sentences, the invention provides, on top of the keyword-based automatic sentence generation system, an analogy translation system that explains proper nouns and/or entities and/or events in the generated sentences through an analogy algorithm. Its application scenario is as follows: when the user is confused about something, the analogy translation system can explain it by analogy with something familiar to the current user, based on the question posed by the user and the user information.
The overall operation flow of the analogy translation system is shown in fig. 9 and includes:
Step one: according to the source session information input by the first user (the "expression content" in fig. 9, which may be a question), identify source object information such as key entities and/or events in the source session information ("source-end entity/event detection" in fig. 9), and classify the collected source object information according to predefined class labels using a classification model, so as to obtain the source category information of the source object information. The identification method performs syntactic and semantic role labeling on the source session information and extracts its syntactic and semantic features to identify the key entities and/or events it describes.
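As a rough stand-in for the entity detection in step one, the sketch below uses spaCy's pre-trained named-entity recognizer; the patent does not name its syntactic/semantic-role labeling component, so this choice is purely an assumption for illustration.

```python
import spacy

# Pre-trained pipeline standing in for the patent's labeling component.
nlp = spacy.load("en_core_web_sm")

def detect_source_objects(source_session_info: str) -> list[tuple[str, str]]:
    # Return (entity text, coarse label) pairs found in the question.
    doc = nlp(source_session_info)
    return [(ent.text, ent.label_) for ent in doc.ents]

print(detect_source_objects("I want to buy a Samsung TV, but which brand?"))
# e.g. [('Samsung', 'ORG')]
```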
Step two: represent each piece of source category information and source object information by a feature vector, i.e., the source-end category feature representation and the source-end entity/event feature representation in fig. 9.
Step three: collect the user information of the first user, i.e., the user logs in fig. 9, including user attribute information such as the user's personal profile, user equipment information such as recently used devices, user behavior information such as operation logs and social logs, and user preference information such as interests and hobbies. Then extract the features of the user information and express them as vectors (the feature extraction in fig. 9); according to the extracted features, predict the N candidate target categories most familiar to the user (the target-end categories in fig. 9), and express the candidate target category information as vectors (the category feature representations in fig. 9). The user information of the first user is collected on the terminal device after the user grants permission.
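A minimal sketch of the top-N familiar-category prediction in step three follows, under the assumption that it reduces to scoring candidate categories against a user-feature vector; the features, weights, and category names are invented for illustration.

```python
import numpy as np

def top_n_familiar_categories(user_feature: np.ndarray,
                              category_matrix: np.ndarray,
                              category_names: list[str],
                              n: int = 3) -> list[str]:
    # One familiarity score per candidate category (rows of the matrix).
    scores = category_matrix @ user_feature
    best = np.argsort(scores)[::-1][:n]  # indices of the N highest scores
    return [category_names[i] for i in best]

user_feature = np.array([0.9, 0.1, 0.4])      # from logs/profile/preferences
category_matrix = np.array([[1.0, 0.0, 0.2],  # "kitchen products"
                            [0.1, 0.9, 0.3],  # "automobiles"
                            [0.2, 0.3, 0.9]]) # "gardening"
print(top_n_familiar_categories(
    user_feature, category_matrix,
    ["kitchen products", "automobiles", "gardening"], n=2))
```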
Step four: by calculating the similarity between the source category information (the source-end category feature representation in fig. 9) and the candidate target category information (the category feature representations in fig. 9), obtain the candidate target category most similar to the source category (the most similar target-end category in fig. 9), recorded as the target category information. In other words, the similarity between categories is computed from the feature vectors of the source-end category (the source category information) and the target-end categories (the candidate target category information) to find the target category most similar to each source category. Two specific schemes are available. Scheme one computes similarity with a DNN (Deep Neural Networks, deep neural network) model: the source-end category and each target-end category are fed into the DNN as input, producing one probability value representing the similarity of the pair; once all probability values are computed they are sorted, and the target-end category with the largest probability value is the most similar one, i.e., the target category information. Scheme two uses a similarity calculation model to compute a distance metric between the feature vector of the source-end category and that of each target-end category: in the space represented by the feature vectors, the smaller the distance between two categories, the greater their similarity, so the target-end category at the smallest distance from the source-end category in feature space is the most similar one, i.e., the target category information.
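A sketch of the two similarity schemes in step four follows. The DNN scorer is a placeholder two-layer network and the distance scheme uses cosine similarity; all names and dimensions are assumptions for illustration, not the patent's trained models.

```python
import numpy as np
import torch
import torch.nn as nn

class PairSimilarityDNN(nn.Module):
    """Scheme one: takes a (source, target) category-feature pair and
    outputs a probability that the two categories are similar."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([src, tgt], dim=-1)).squeeze(-1)

def most_similar_by_distance(src: np.ndarray, candidates: np.ndarray) -> int:
    """Scheme two: smaller distance in feature space means higher
    similarity; here cosine, returning the closest candidate's index."""
    src_n = src / np.linalg.norm(src)
    cand_n = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return int(np.argmax(cand_n @ src_n))  # max cosine = min cosine distance
```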
Step five: obtain candidate target object information from the target category information by first listing the candidate entities in the target category (the candidate target-end entity/event feature representations in fig. 9), and then obtain the target object information from the similarity between the source object information and the candidate target object information. That is, the target-end entity most similar to the source-end entity and/or event is obtained by computing the similarity between the source-end entity/event feature representation and the candidate target-end entities/events in fig. 9. The method for computing similarity between entities is the same as the method for computing similarity between categories in step four.
Step six: generate target session information according to the target object information; specifically, using a predefined sentence-pattern template, replace the source-end entity (the source object information) with the most similar target-end entity (the target object information) computed in step five, thereby producing an explanation of the source-end entity, i.e., the entity/event analogy in fig. 9.
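A minimal sketch of the template replacement in step six follows; the template wording is an assumption, as the patent does not specify its sentence patterns.

```python
# Hypothetical predefined sentence-pattern template.
TEMPLATE = "{source} is to {src_cat} what {target} is to {tgt_cat}."

def analogize(source: str, src_cat: str, target: str, tgt_cat: str) -> str:
    # Replace the source-end entity with the most similar target-end entity.
    return TEMPLATE.format(source=source, src_cat=src_cat,
                           target=target, tgt_cat=tgt_cat)

print(analogize("the GPU", "a computer", "the food processor", "a kitchen"))
# -> "the GPU is to a computer what the food processor is to a kitchen."
```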
In summary, the core idea of the analogy translation system is to collect source object information such as key entities and/or events from the first user's source session information, and to classify it with a classification model according to predefined class labels to obtain source category information. The target category information most similar to the source category information is then computed via the similarity between the feature vectors of the source category information and the candidate target category information. Finally, using a predefined sentence-pattern template, the source object information is replaced with the computed target object information to produce an entity and/or event the user can understand, explaining the source object information in the source session information, i.e., generating the target session information from the target object information.
Further, when a user chats with more than one other user, for example a first user D chatting with more than one second user (such as user A, user B, and user C), the analogy translation system in step one identifies the source object information, such as key entities and/or events, according to both the source session information input by the first user and the historical session information between the first user and the second users. The remaining processing is the same as steps two to six: classify the source object information to obtain source category information; obtain candidate target category information from the first user's user information; obtain target category information from the similarity between the source category information and the candidate target category information; obtain candidate target object information from the target category information; obtain target object information from the similarity between the source object information and the candidate target object information; and finally generate the target session information from the target object information.
Further, when a user chats with more than one other user and the language type, professional field, cultural background, etc. of one user's session information differ from those of the other users — for example, first user D chats with second users A, B, and C, and the domain of D's source session information differs from that of the session information of users A, B, and C — the analogy translation system in step one translates the source session information from the domain of the source information into the domain of the target information according to the domains involved, and obtains the source object information from the translated session information and the historical session information between the first user and the second users. The remaining processing of the analogy translation system is the same as steps two to six described above.
The analogy translation system is described in detail below through several embodiments:
Embodiment six: analogy translation of entities
In this embodiment, when the user wants to learn about a commodity in an unfamiliar area, explanations limited to the commodity's attributes, parameters, and similar information are hard to understand clearly. Embodiment six therefore explains such information by analogy to a commodity scenario the user is familiar with, meeting the user's needs; the specific application scenarios are shown in figs. 10 and 11.
In fig. 10, a user wants to purchase a television and learn about the current television market, and user information collection reveals that the user is a housewife. A description confined to display resolution, screen quality, sound-quality parameters and the like would leave a housewife unfamiliar with electronic equipment completely lost, with no idea how to judge the quality of a television. By collecting the housewife's user information, the analogy translation system can compare the brands and cost performance of televisions with the kitchen products she is familiar with, giving her a more intuitive understanding of the television products of various brands, including price, quality, and features. The housewife in fig. 10 is the first user, i.e., the end user of the current device; in fig. 10, italics indicate the television products the user wants to learn about, and bold indicates the analogy explanations recommended by the analogy translation system.
Fig. 11 shows the analogy translation system addressing a similar problem: when a user who wants to purchase a television is found through user information collection to be a car enthusiast, the system analogizes television products to automobile brands familiar to the user, so that the user can recognize more clearly the price, quality, features, etc. of each television product. In fig. 11, italics indicate the television products the user wants to learn about, and bold indicates the analogy explanations recommended by the analogy translation system.
In the application scenarios shown in figs. 10 and 11, given the source session information input by the first user (the housewife in fig. 10, the car enthusiast in fig. 11) — "I want to buy a television, but I don't know which brand to buy" — the analogy translation system first collects the key entities at the source end, i.e., entity information able to answer the source session information. For example, if the source session information asks which television to buy, the key entities may include Samsung, Hisense, and Xiaomi televisions; the entity information includes not only entity names but also feature information such as prices and parameters, which can be collected from user questions and answers in open communities on the one hand and from the entities' official websites on the other. The system then classifies the collected entities according to predefined class labels with a classification model and represents the classified entities with feature vectors; next, it predicts the N target categories most familiar to the user and expresses their features as vectors; then it computes the similarity between categories from the feature vectors of the source and target categories, yielding the most similar target category for each source category; finally, it lists the candidate entities within the target-end category and obtains the target-end entity most similar to the source-end entity by computing the similarity between the source-end entity and the candidate entities at the target end.
It should be noted that, in the embodiments shown in figs. 10 and 11, the execution flow of the analogy translation system is the same as the overall operation flow shown in fig. 9 and is not repeated here.
Embodiment seven: analogy translation of events
When a user faces chat content in an unfamiliar area, he often feels confused. The analogy translation system can then explain the chat content by analogy to an area the user is familiar with, helping the user understand it more easily, as in the scenario shown in fig. 12.
In fig. 12, user A, user B, and user C are discussing the GPU (Graphics Processing Unit, graphics processor) of a computer, and user D, who does not know this device, finds it difficult to understand their chat. The analogy translation system can then use entities familiar to user D to explain the chat content among users A, B, and C, based on that content and user D's user information, helping user D join the conversation. User D is the first user, i.e., the end user of the current device; users A, B, and C are second users, i.e., the opposite users conversing with the end user of the current device.
In this case, the analogy translation flow for events can be summarized in the following steps:
Step one: the analogy translation system collects chat content through the chat platform, extracts the key events of the chat content, and constructs source-end event descriptions related to the chat content.
Step two: obtain the user information, including the personal profile (i.e., user attribute information), user behavior information such as social logs and operation logs, and user equipment information about the devices the user uses, and construct corresponding target-end event descriptions.
Step three: obtain a target-end event, familiar to the user, that can represent the chat content, by computing the similarity between the source-end event descriptions and the target-end event descriptions.
Step four: according to a predefined sentence pattern, explain the source-end event using the target-end event so as to help the user understand the chat content.
Embodiment eight: enhanced analogy translation system
The application scenario of this embodiment considers entities and events together, analogizing source-end entities and events into target-end entities and events familiar to the user. The enhanced analogy translation system combines the analogy information of entities and events to explain more fully the entities and events the user wants to understand, as in the application scenario of fig. 13.
In fig. 13, the enhanced analogy translation system collects not only the logical content of the chat but also the entity content within it, and explains the chat content by finding a reasonable analogy between combinations of event logic and entity content in a domain familiar to the user. Entity content refers to the entities the user describes (such as NVIDIA, the GPU, etc.), and logical content refers to the line of reasoning in the chat (such as "xx released the latest xx, which are upgraded versions of xxx and xxx, respectively").
The execution flow of this embodiment is similar to that of embodiment seven, except that this embodiment adds the collection of entity information in steps one and two, and in step three considers both the similarity of entity content and the similarity of event logic, so as to select a suitable entity-event combination to explain the current chat content by analogy.
C. Cross-language cross-domain translation system
Because different users are familiar with a wide variety of language types, professional fields, and so on, existing natural-language generation technology does not really take into account the communication barriers caused by differences between users' areas of expertise. The cross-language cross-domain translation system provided by the embodiment of the invention targets the expression forms of different professional fields, using conversion models from the general domain to a special domain and from a special domain to the general domain to help users in different fields communicate.
The basic idea of the cross-language cross-domain translation system of the embodiment of the invention is: according to the domains to which the information belongs, translate the source session information, in the domain of the source information input by the first user, into target session information in the domain of the target information. That is, the system receives the source session information input at the source end, acquires or detects the domains to which it belongs, such as its professional field and the language type used, loads the corresponding domain translation model to translate it, and presents the translation result to the target user, i.e., translates it into target session information in the domain of the target information. Equivalently: the system automatically detects the professional field, language type, and other domains of the multi-modal information input by the source-end user (the source session information input by the first user) and, combining the special expression forms of that professional field, translates the input accordingly, so as to translate from the special expression form of a professional field to the general expression form; conversely, information input by the source-end user in the general expression form can serve as source session information and, given a specified target domain, be translated into the special expression form of that target domain. In other words, according to the domains to which the information belongs, the system applies language-type and professional-field translation to the source session information in sequence, based on the corresponding pre-trained translation models, to obtain the target session information. The complete flow of the cross-language cross-domain translation system is shown in fig. 14.
Specifically, the core idea of the cross-language cross-domain translation system of the embodiment of the invention is to detect the professional field to which the multi-modal information input at the source end belongs, and, combining the special expression forms of that field, translate the input information accordingly through the domain translation model.
The multi-modal information includes, but is not limited to, at least one of: user text input, user speech input, user image input, etc. The domains to which the information belongs include, but are not limited to, at least one of: politics, military, programming, games, physics, chemistry, mathematics, animation, construction, music, etc.
The cross-language cross-domain translation system is described in detail below through several embodiments:
Embodiment nine: from a special professional-field expression form to the general expression form
As shown in fig. 15, the source session information input by the first user at the source end is expressed in a specific professional field (e.g., the game field), while the output end wants the general expression form. The cross-language cross-domain translation system then proceeds as follows:
Step one: receive the source session information input by the first user at the source end, and detect the domains to which it belongs, such as its professional field and language type. The source session information input by the first user at the source end is the source-end input information described below, and the first user is the source-end user described below.
Specifically, the first user can input source session information at the source end in several ways, including but not limited to: text, speech, and pictures. When the first user clicks the text box, text can be entered directly; when the photographing button at the upper left corner is clicked, a picture can be selected from the terminal device's album or taken directly with the device, and the cross-language cross-domain translation system obtains the text in the picture through optical character recognition and displays it in the text box; when the voice button at the upper right corner is clicked, a voice clip can be selected from the device's recordings or spoken directly into the device, and the system converts the voice input into the corresponding text through speech recognition and displays it in the text box.
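For illustration, a minimal sketch of the picture and voice input paths follows, assuming pytesseract for the optical character recognition step and the SpeechRecognition package for the voice step; both libraries are stand-ins for the patent's unnamed components.

```python
import pytesseract
from PIL import Image
import speech_recognition as sr

def text_from_picture(image_path: str) -> str:
    # Optical character recognition on a photo or album picture.
    return pytesseract.image_to_string(Image.open(image_path), lang="chi_sim+eng")

def text_from_voice(audio_path: str) -> str:
    # Speech recognition on a recorded or freshly captured voice clip.
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio, language="zh-CN")
```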
Further, the cross-language cross-domain translation system first classifies the input text through a text domain classifier, finds the special professional field to which it most probably belongs, and displays this to the source-end user in real time; the "belonging domain" option in fig. 15 is the special professional field the input text probably belongs to. The system also lets the source-end user manually select the professional field of the input information, to correct the small number of domain classification errors the system makes: if the source-end user considers the system's judgment inaccurate, he can click the "reselect" button beside "belonging domain", a domain list pops up, and the user reselects the professional field of the input text according to his own judgment. The system likewise detects the language type of the source-end input information and displays it to the source-end user in real time; like the domain information, the language type can also be reselected by the source-end user.
Further, the text domain classifier is a pre-trained classifier model, including but not limited to: a logistic regression model, a Support Vector Machine (SVM) model, a neural network model, etc. It is generated as follows: taking the vector converted from the text as input and the professional field to which the text belongs as output, the classifier parameters are adjusted by continuously correcting output errors, yielding offline text domain classifier models for different language types.
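A minimal offline-training sketch follows, assuming the logistic-regression variant built with scikit-learn; the tiny training set is invented for illustration and a real classifier would of course be trained on a large corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["Teaming up tonight? I'll take you to eat chicken!",  # game field
         "The new fiscal policy was announced this morning.",  # politics field
         "Take you to eat chicken, rank up fast!",             # game field
         "Parliament will vote on the trade bill."]            # politics field
labels = ["game", "politics", "game", "politics"]

# Text is vectorized (TF-IDF) and the classifier's parameters are fitted
# by "continuously correcting output errors" on the labeled examples.
domain_classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
domain_classifier.fit(texts, labels)
print(domain_classifier.predict(["Eat chicken tonight?"]))  # expected: ['game']
```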
In fig. 15, the source-end user's input is the game-slang sentence "Teaming up tonight? I'll take you to eat chicken!"; this input may be obtained by the source-end user scanning text on a picture, by voice input, or through other feasible input methods. The cross-language cross-domain translation system first detects that the language type of the source-end input is Chinese, then detects through the Chinese text domain classifier that its professional field is the game field, and displays the detected language type, professional field, and other domains on the terminal device in real time, as the "source language" and "belonging domain" in fig. 15.
Step two: translate the source-end input information according to the domains to which the information belongs
The cross-language cross-domain translation system loads the corresponding domain translation model to translate the source-end input information according to the domains it belongs to, such as its professional field and language type.
Specifically, the domain translation model is a trained string-to-string (Sequence to Sequence) translation model constructed as follows: corresponding sentences or phrases serve as input and output in units of words or characters. Because most sentences in a special professional field consist of common words or characters, the domain label must also be used as an input or output token, trained together in word-vector form, in order to preserve the professional-field information. The model parameters are adjusted by continuously correcting output errors until they converge. When the expression form of a special professional field is the training input and the general-domain expression form is the training output, the model is a "special -> general" domain translation model; when the general-domain expression form is the input and the special professional-field form is the output, it is a "general -> special" domain translation model.
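The sketch below illustrates one way the domain label can be carried through such a model: the label is prepended as an ordinary token so that it is embedded and trained together with the words. The tokenization and tag format are assumptions for illustration, not the patent's specification.

```python
# Hypothetical training pair for a "special -> general" model; the domain
# label token travels with the sentence and gets its own word vector.
SPECIAL_TO_GENERAL = [
    ("<game> Teaming up tonight ? Take you to eat chicken !",
     "Shall we play games together tonight ? Take you to win !"),
]

def add_domain_tag(sentence: str, domain: str) -> list[str]:
    # Whitespace tokenization plus the domain label as the first token.
    return [f"<{domain}>"] + sentence.split()

print(add_domain_tag("Teaming up tonight ?", "game"))
# ['<game>', 'Teaming', 'up', 'tonight', '?']
```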
Further, given a domain translation model trained as above, input information in the special expression form of a professional field, such as "Teaming up tonight? I'll take you to eat chicken!", is to be translated into text of the same language in the general-domain expression form, such as "Shall we play games together tonight? I'll take you to win!". That is, the system loads the "Chinese game-field expression -> general-domain expression" domain translation model and translates the source-end input "Teaming up tonight? I'll take you to eat chicken!" into "Shall we play games together tonight? I'll take you to win!".
Further, for the language type, the cross-language cross-domain translation system loads the corresponding basic language translation model and translates the general-form text in the same language as the source-end input, e.g., "Shall we play games together tonight? I'll take you to win!", into target session information of the target language type. In fig. 15, when the "target language" is set to "English", the system loads the "Chinese -> English" basic language translation model and translates the sentence into "Do you play games together at night? Take you to win!".
The language types of the "source language" and the "target language" may be the same or different, and include but are not limited to: Chinese, English, Korean, Japanese, etc.
Further, the basic language translation model is also a string-to-string (Sequence to Sequence) translation model, constructed as follows: information in one language type serves as the training input, and its translation into another language type serves as the training output; the goal of translating between different language types is achieved by continuously adjusting the model parameters.
It should be noted that the language types include but are not limited to: Chinese, English, Korean, Japanese, etc.; the domains include but are not limited to: politics, military, programming, games, physics, chemistry, mathematics, animation, construction, music, etc. In addition, since the "target domain" defaults to "none", the cross-language cross-domain translation system skips that option here, and the final translation result is "Do you play games together at night? Take you to win!".
Step three: present the results
Finally, the cross-language cross-domain translation system presents the final translation result to the target user as text through the terminal device; as shown in fig. 15, the final result presented by the terminal device is "Do you play games together at night? Take you to win!".
Further, when the auto-read button is activated, the terminal device may also convert text to speech for playback to the user.
Further, as the above steps one to three show, the processing of the cross-language cross-domain translation system can be summarized as: according to the domains to which the information belongs, apply professional-field and language-type translation to the source session information in sequence, based on the corresponding pre-trained translation models, to obtain the target session information.
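A minimal sketch of this sequential pipeline follows; the stub translator and the factory functions are assumptions standing in for the pre-trained domain and language models, not real components of the patent.

```python
from typing import Optional

class StubTranslator:
    # Stand-in for a pre-trained string-to-string model.
    def __init__(self, name: str):
        self.name = name

    def translate(self, text: str) -> str:
        return f"[{self.name}] {text}"

def domain_model(src_domain: str, tgt_domain: str) -> StubTranslator:
    return StubTranslator(f"domain:{src_domain}->{tgt_domain}")

def language_model(src_lang: str, tgt_lang: str) -> StubTranslator:
    return StubTranslator(f"lang:{src_lang}->{tgt_lang}")

def cross_domain_translate(text: str, src_lang: str, tgt_lang: str,
                           src_domain: Optional[str],
                           tgt_domain: Optional[str]) -> str:
    if src_domain:                # special field -> general form
        text = domain_model(src_domain, "general").translate(text)
    if src_lang != tgt_lang:      # basic language translation
        text = language_model(src_lang, tgt_lang).translate(text)
    if tgt_domain:                # general form -> special field
        text = domain_model("general", tgt_domain).translate(text)
    return text

print(cross_domain_translate("Teaming up tonight?", "zh", "en",
                             src_domain="game", tgt_domain=None))
```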
In embodiment nine, the source-end user (i.e., the first user) is the end user of the current device, and the target user (i.e., the second user) is the opposite user, i.e., the user conversing with the end user of the current device.
Embodiment ten: from the general expression form to a special professional-field expression form
As shown in fig. 16, the source session information input by the first user at the source end is in the general expression form, i.e., carries no special professional field, but the first user wants the output expressed in a special professional field. The cross-language cross-domain translation system then proceeds as follows:
Step one: receive the source session information input by the first user at the source end, and detect the domains to which it belongs, such as its professional field and language type. The source session information input by the first user at the source end is the source-end input information described below, and the first user is the source-end user described below.
Specifically, the input methods and detection methods at the source end are the same as in embodiment nine and are not repeated here. In fig. 16, the source-end input is "Let's play games together at night!", which may be obtained by the source-end user scanning text on a picture, by voice input, or through other feasible input methods. The cross-language cross-domain translation system first detects that the language type of the source-end input is English, then loads the English text domain classifier and detects that the expression carries no special domain background, so the "belonging domain" is "none". As in embodiment nine, the detected information is displayed on the terminal device in real time.
The method for constructing the text domain classifier is the same as that described in step one of embodiment nine.
Step two: translate the source-end input information according to the domains to which the information belongs
In fig. 16, since the "belonging domain" option of the source-end input is "none", the cross-language cross-domain translation system applies no processing for that option, i.e., the intermediate result is the same as the source-end input, still "Let's play games together at night!".
Specifically, for the language type, since the source-end input is English and the target end is Chinese (i.e., the "target language" is set to "Chinese"), the cross-language cross-domain translation system loads the "English -> Chinese" basic language translation model and translates the current result "Let's play games together at night!" into the Chinese sentence "Let's play games together tonight!".
The language types of the "source language" and the "target language" may be the same or different, and include but are not limited to: Chinese, English, Korean, Japanese, etc.
In addition, the construction method of the basic language translation model is the same as that described in step two of embodiment nine.
Further, for the domain, when the "target domain" is set to the "game field", the cross-language cross-domain translation system selects the "Chinese general expression -> game-field expression" domain translation model and performs the corresponding professional-field translation on top of the Chinese translation result, i.e., translates "Let's play games together tonight!" into its game-field Chinese expression, the slang sentence "Let's team up tonight!".
The construction method of the domain translation model is the same as that described in step two of embodiment nine.
Step three: present the results
Finally, the cross-language cross-domain translation system presents the final translation result to the target user as text through the terminal device; as shown in fig. 16, the final result presented by the terminal device is the game-slang sentence "Let's team up tonight!".
Further, when the auto-read button is activated, the terminal device may also convert text to speech for playback to the user.
It should be noted that, in embodiment ten, the source-end user (i.e., the first user) is the end user of the current device, and the target user (i.e., the second user) is the opposite user, i.e., the user conversing with the end user of the current device.
D. Cross-language cross-cultural background translation system
Existing natural-language generation technology does not consider the differences between the cultural backgrounds of different users. The cross-cultural background translation system provided by the embodiment of the invention can help target users understand sentences that carry a cultural background.
The basic idea of the cross-language cross-cultural background translation system of the embodiment of the invention is: according to the domains to which the information belongs, translate the source session information, in the domain of the source information input by the first user, into target session information in the domain of the target information. That is, the system receives the source session information input at the source end, acquires or detects the domains to which it belongs, such as its cultural background and the language type used, then loads the corresponding cultural translation model to translate it, and presents the translation result to the target user, i.e., translates it into target session information in the domain of the target information. The core idea of the system is: automatically detect the cultural background to which the source session information belongs and, combining the special expression forms under that cultural background, translate the source session information accordingly, so as to translate the special expression forms of a cultural background into the general expression form.
Specifically, the cross-language cross-cultural background translation system automatically detects the cultural background, language type, and other domains of the multi-modal information input by the source-end user (the source session information input by the first user) and, combining the special expression forms of that cultural background, translates the input accordingly, so as to translate from the special expression form of a cultural background to the general expression form. In other words, according to the domains to which the information belongs, the system applies language-type and cultural-background translation to the source session information in sequence, based on the corresponding pre-trained translation models, to obtain the target session information.
The multi-modal information includes, but is not limited to, at least one of: user text input, voice input, image input, etc. The cultural background domains include, but are not limited to, at least one of: Chinese native culture, Korean native culture, American native culture, Japanese native culture, etc.
In addition, the complete flow of the cross-language cross-cultural background translation system is shown in fig. 17. The cultural translation system in the figure is an offline pre-trained cultural translation model that can be used online directly once trained. The upper half of fig. 17 shows the online flow, and the lower half shows the offline training process of the cultural translation system.
The cross-language cross-cultural background translation system is described in detail below through several embodiments:
Embodiment eleven: from a special cultural-background expression form to the general expression form
As shown in fig. 18, the source session information input by the first user at the source end is expressed with a special cultural background (e.g., slang) and is to be expressed in the general form at the output end. The cross-language cross-cultural background translation system then proceeds as follows:
Step one: receive the source session information input by the first user at the source end, and detect the domains to which it belongs, such as its cultural background and language type. The source session information input by the first user at the source end is the source-end input information described below, and the first user is the source-end user described below.
There are several ways to input information at the source end, including but not limited to: text, speech, and pictures. When the source-end user clicks the text box, text can be entered directly; when the photographing button at the upper left corner is clicked, a picture can be selected from the terminal device's album or taken directly with the device, and the cross-language cross-cultural background translation system obtains the text in the picture through optical character recognition and displays it in the text box; when the voice button at the upper right corner is clicked, a voice clip can be selected from the device's recordings or spoken directly into the device, and the system converts the voice input into the corresponding text through speech recognition and displays it in the text box.
The cross-language cross-cultural background translation system detects the source-end input, loads the cultural background classifier of the corresponding language type, classifies the cultural background of the input, and displays the result to the source-end user in real time; the "cultural background" option in fig. 18 is the cultural background the source-end input may carry. The system also lets the source-end user manually select the cultural background of the input, to correct the small number of classification errors the system makes: if the source-end user considers the system's judgment inaccurate, he can click the "reselect" button beside "cultural background", a list of cultural backgrounds pops up, and the user reselects according to his own judgment. The system likewise detects the language type of the source-end input and displays it in real time; like the cultural background information, the language type can also be reselected by the source-end user.
The cultural background classifier is a pre-trained classifier model, including but not limited to: a logistic regression model, a Support Vector Machine (SVM) model, a neural network model, etc. It is generated as follows: taking the vector converted from the text information as input and the cultural background to which the text belongs as output, the classifier parameters are adjusted by continuously correcting output errors, yielding offline cultural background classifier models for different languages.
In fig. 18, when the source-end user inputs "In the electronic industry, Samsung can be considered the 800-pound gorilla", the cross-language cross-cultural background translation system first detects that the language type of the input is English, then loads the "English" cultural background classifier and detects that this expression belongs to American slang culture, so the "source language" is set to "English" and the "cultural background" to "American slang". The source-end user may also manually select the "source language" and "cultural background". Note that the "source language" option is correlated with the "cultural background" option: when the "cultural background" is "American slang", languages unrelated to American slang, such as "Chinese" or "Korean", will not appear as source-language options.
Step two: translating the input information of the source terminal according to the field of the information
The cross-language cross-cultural background translation system loads the corresponding cultural translation model according to the cultural background information and uses it to translate the source input information.
The cultural translation model is a trained sequence-to-sequence (Sequence to Sequence) translation model, constructed as follows. Corresponding sentences or phrases serve as input and output, in units of words or characters. Because sentences in the expression form of a special cultural background mostly consist of common words or characters, the cultural background label must also be used as an input or output tag in order to preserve the domain information, and training is carried out in the form of word vectors. The model parameters are adjusted by continuously correcting output errors until they finally converge. When information in the special cultural background expression form is the training input and information in the general expression form is the training output, the result is a "special cultural background -> general" cultural translation model; when information in the general expression form is the training input and information in the special cultural background expression form is the training output, the result is a "general -> special cultural background" cultural translation model.
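A minimal sketch of how the cultural background label can be injected as an extra token when building training pairs is shown below; the tag syntax and the direction names are assumptions for illustration.

    # Sketch of building tagged seq2seq training pairs as described above.
    # The "<american_slang>" tag format and direction names are assumptions.
    def make_training_pair(src: str, tgt: str, culture: str, direction: str):
        tag = f"<{culture}>"
        if direction == "special->general":
            return f"{tag} {src}", tgt        # tag marks the input's culture
        if direction == "general->special":
            return src, f"{tag} {tgt}"        # tag requests the output culture
        raise ValueError(f"unknown direction: {direction}")

    pair = make_training_pair(
        "give me five", "clap your hands", "american_slang", "special->general"
    )
    print(pair)  # ('<american_slang> give me five', 'clap your hands')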
Specifically, given a cultural translation model trained by the above method, input information in the expression form of a special cultural background, for example "In electronic industry, Samsung can be considered the 800 pound gorilla" in fig. 18, is translated into text information in the general expression form and in the same language as the source input, for example "In electronic industry, Samsung has quite powerful strength" in fig. 18. That is, the translation system first loads the "American slang -> general" cultural translation model and translates the source end user's input "In electronic industry, Samsung can be considered the 800 pound gorilla" into "In electronic industry, Samsung has quite powerful strength".
Further, in fig. 18, in terms of language type, the cross-language cross-cultural background translation system loads the corresponding basic language translation model and further translates the intermediate result "In electronic industry, Samsung has quite powerful strength" into the target language. Because the target language is set to Chinese, the system loads the "English -> Chinese" basic language translation model and translates "In electronic industry, Samsung has quite powerful strength" into the Chinese sentence meaning "In the electronic industry, Samsung has very strong strength.", i.e., the target session information.
The basic language translation model is also a sequence-to-sequence (Sequence to Sequence) translation model, constructed as follows: information in one language type serves as the training input, the result of translating that information into another language type serves as the training output, and translation between different language types is achieved by continuously adjusting the model parameters. The language types mentioned above include, but are not limited to: Chinese, English, Korean, Japanese, and the like.
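The chaining of the two models described above (cultural translation first, then basic language translation) can be sketched as follows; the translate() interface and the lookup-table stub models are hypothetical stand-ins for the trained seq2seq models.

    # Sketch of the two-stage pipeline above: culture -> general, then
    # source language -> target language. StubModel stands in for a
    # trained seq2seq model; its table lookup is purely illustrative.
    class StubModel:
        def __init__(self, table: dict):
            self.table = table

        def translate(self, text: str) -> str:
            return self.table.get(text, text)

    class TwoStageTranslator:
        def __init__(self, culture_model, language_model):
            self.culture_model = culture_model    # e.g. American slang -> general
            self.language_model = language_model  # e.g. English -> Chinese

        def translate(self, text: str) -> str:
            general = self.culture_model.translate(text)   # stage 1
            return self.language_model.translate(general)  # stage 2

    slang_to_general = StubModel({"give me five": "clap your hands"})
    english_to_chinese = StubModel({"clap your hands": "击个掌吧"})
    pipeline = TwoStageTranslator(slang_to_general, english_to_chinese)
    print(pipeline.translate("give me five"))  # 击个掌吧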
Step three: results presentation
Finally, the cross-language cross-cultural background translation system presents the final translation result to the target user in text form through the terminal device, as shown in fig. 18: the final translation result presented by the terminal device is the Chinese sentence meaning "In the electronic industry, Samsung has very strong strength."
Further, when the auto-read button is activated, the terminal device may also convert text to speech for playback to the user.
In the eleventh embodiment, the source user (i.e., the first user) is the current device user, and the target user (i.e., the second user) is the opposite user, i.e., the user who performs a session with the current device user.
Embodiment twelve: expression form of a special cultural background to the general expression form (text output supplemented with a picture)
When the cross-language cross-cultural background translation system detects that the translation result (i.e., the target session information) can be displayed more intuitively with a picture, it outputs at the target end not only the textual target session information but also a picture that vividly expresses it, so that the user can understand the output target session information more intuitively. In this case, the cross-language cross-cultural background translation system operates in the following steps:
Step one: the step of receiving the source session information input by the first user at the source end and detecting the domain to which the information of the source session information belongs, such as cultural background and language type, is the same as the step one in the eleventh embodiment, and will not be described herein.
Step two: translating the input information of the source terminal according to the field of the information
The cross-language cross-cultural background translation system loads the corresponding cultural translation model according to the cultural background information of the source input, first translating the source input in the special cultural background expression form into text information in the general expression form and in the same language as the source input; it then loads the corresponding basic language translation model and further translates that result into session information in the target language type.
The method for constructing the cultural translation model is the same as the description of the second step in the eleventh embodiment, and will not be repeated here.
In fig. 19, the cross-language cross-cultural background translation system first loads the "American slang -> general" cultural translation model, which translates the source user's input "Hi, Zhang, give me five!" into "Hi, Zhang, clap your hands!". Then, because the target language is Chinese and the source language is English, the system loads the "English -> Chinese" basic language translation model and further translates "Hi, Zhang, clap your hands!" into the Chinese sentence with that meaning.
The construction method of the basic language translation model is the same as that of the second step in the eleventh embodiment, and will not be described herein.
Step three: results presentation
Finally, the cross-language cross-cultural background translation system presents the final translation result to the user in text form through the terminal device, as shown in fig. 19: the target session information finally presented by the terminal device is the Chinese sentence meaning "Hi, Zhang, clap your hands!".
Further, when the auto-read button is activated, the terminal device may also convert text to speech for playback to the user.
Further, when the target session information can be represented intuitively through pictures, the cross-language cross-cultural background translation system queries a picture database according to the target session information, finds the picture that most accurately expresses the meaning of the target session information by calculating the semantic similarity between the target session information and the candidate pictures, and displays that picture on the terminal device as a supplement. The semantics of a candidate picture (i.e., its picture description information) may be displayed on the picture or may be generated from the picture.
It should be noted that the keyword-based sentence automatic generation system, the analogy translation system, the cross-language cross-domain translation system, and the like can likewise query the picture database according to the target session information, find the picture that most exactly expresses the meaning of the target session information by calculating the semantic similarity between the target session information and the candidate pictures, and display it on the terminal device as a supplement.
In addition, these systems (the keyword-based sentence automatic generation system, the analogy translation system, the cross-language cross-domain translation system, the cross-language cross-cultural background translation system, and the like) can also query the picture database according to the source session information, find the picture that most exactly expresses its meaning by calculating the semantic similarity between the source session information and the candidate pictures, and display it on the terminal device as a supplement. Similarly, they can query the picture database according to both the source session information and the target session information, find the picture that most exactly expresses the meaning of both by calculating the semantic similarity between each of them and the candidate pictures, and display it on the terminal device as a supplement.
The picture database includes, but is not limited to: a picture library stored in the user's network cloud, a picture library on the user's terminal device, picture libraries searchable through a network search engine, and the like.
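A minimal sketch of this description-based picture lookup is given below; the description vectors are assumed to come from whatever sentence representation the system uses (Word2Vec is mentioned later), so that detail is an assumption here.

    # Sketch of ranking candidate pictures by the semantic similarity of
    # their textual descriptions to the session text, as described above.
    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def best_picture(session_vector: np.ndarray, candidates):
        # candidates: iterable of (picture_id, description_vector) pairs
        best_id, best_score = None, -1.0
        for picture_id, description_vector in candidates:
            score = cosine_similarity(session_vector, description_vector)
            if score > best_score:
                best_id, best_score = picture_id, score
        return best_id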
E. Picture recommendation system
In order to enable a user to better communicate through multi-mode information, the embodiment of the invention provides a picture recommendation system, and a complete flow chart of the picture recommendation system is shown in fig. 20.
The picture recommendation system predicts session estimation information, such as the session content and/or session emotion that the first user wants to express, based on picture selection information input by the first user (for example, the smiling face beside the "send" button in fig. 21, i.e., the "picture" button; clicking it indicates that the picture recommendation system is activated) and the historical session information between the first user and the second user. It then obtains a target picture from the candidate pictures according to the semantic similarity between the session estimation information and the candidate pictures. The system predicts the emotion of the first user from the text information, picture information, past historical session information, and the like that the first user has input, and thereby recommends a suitable expression or picture. The semantics of a candidate picture (i.e., its picture description information) may be displayed on the picture or may be generated from the picture. Here, the first user (i.e., the source end user) is the current device user, and the second user is the opposite user, i.e., the user who conducts a session with the current device user.
In addition, in order to recommend a more suitable expression or picture, the picture recommendation system may further obtain user information of the first user, for example the user's age, gender, preferences, occupation, and so on, and combine it into the prediction. That is, the picture recommendation system predicts the session estimation information, such as the session content and/or session emotion that the first user wants to express, based on the picture selection information input by the first user, the historical session information between the first user and the second user, and the user information of the first user; it then obtains a target picture from the candidate pictures according to the semantic similarity between the session estimation information and the candidate pictures and takes the target picture as the target session information. The semantics of a candidate picture (i.e., its picture description) may be displayed on the picture or may be generated from the picture.
Specifically, in the implementation of the picture recommendation system, a pre-trained dialogue understanding model is invoked; according to the source session information and historical session information input by the first user, or according to the source session information, the historical session information, and the user information of the first user, the content and emotional tendency that may need to be expressed are inferred, a picture database is queried, and a picture recommendation model is loaded to recommend pictures to the user.
Here the source end user is the first user and the target user is the second user. The picture recommendation system can understand the meaning of the dialogue from the source session information, the historical session information, the user information of the source user, and so on, infer from the picture database the pictures that the target user may need in order to express an emotion without the target user having to input keywords or other information, and push those pictures to the target user, which shortens the time the target user spends searching for pictures when the picture library is huge. Because the picture recommendation system depends on user information, the user needs to grant the system permission to obtain part of the user information.
The picture database includes, but is not limited to: a picture library stored in the user's network cloud, a picture library on the user's terminal device, picture libraries searchable through a network search engine, and the like.
The picture recommendation system is described in detail by the following embodiments:
Embodiment thirteen: picture recommendation system
Fig. 21 shows part of the operation of the picture recommendation system on the user's terminal device, where the user who inputs the source session information "Dear, let me tell you a joke …" is the source user, and the other user is the target user. The picture recommendation system operates in the following steps:
step one: understanding user conversations, inferring information that a target user wants to send
When the target user wants to send a picture to express an emotion, the smiling face beside the "send" button, i.e., the "picture" button, is clicked (this click is the picture selection information), and the picture recommendation system is activated. The picture recommendation system first invokes the dialogue understanding model and, from the user dialogue information and the user information, fully understands the dialogue and infers the content and emotional tendency that the source user may want to express; here the user dialogue information comprises the source session information input by the source user and the historical session information between the source user and the target user, and the user information refers to the user information of the source user.
The dialogue understanding model is a pre-trained sequence-to-sequence (Sequence to Sequence) model, constructed as follows. The dialogue takes words or characters as input and output, expressed in vector form at both ends; by continuously correcting the output error (the error between the computed network output vector and the reference result vector), the model parameters (the weights between nodes in the network, the error adjustment rate, and so on) are adjusted until they finally converge, convergence being the mathematical notion that the model parameters reach a local optimum. Let A(i) be the i-th utterance of source user A and B(i) the i-th utterance of target user B, and suppose source user A speaks first. Then the input and output of the 1st parameter update are A(1) and B(1) respectively, the input and output of the 2nd parameter update are B(1) and A(2) respectively, and so on: the input and output of the (2i-1)-th parameter update are A(i) and B(i), and the input and output of the (2i)-th parameter update are B(i) and A(i+1), until the dialogue ends, after which the parameters are updated iteratively on the next dialogue. Iteration proceeds over all dialogue training sets until the parameters converge (all parameters in the network reach a local optimum). The dialogue understanding model is a new model that adds features such as user emotion on top of an existing model.
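A minimal sketch of this alternating pairing rule is given below; the function name and sample utterances are illustrative.

    # Sketch of building (input, output) training pairs from a dialogue
    # A(1), B(1), A(2), B(2), ... in which A speaks first, as described above.
    def dialogue_training_pairs(a_turns, b_turns):
        pairs = []
        for i in range(len(b_turns)):
            pairs.append((a_turns[i], b_turns[i]))          # (2i-1)-th update: A(i) -> B(i)
            if i + 1 < len(a_turns):
                pairs.append((b_turns[i], a_turns[i + 1]))  # (2i)-th update: B(i) -> A(i+1)
        return pairs

    print(dialogue_training_pairs(
        ["Dear, let me tell you a joke ...", "Why did the ..."],
        ["Go ahead!"],
    ))
    # [('Dear, let me tell you a joke ...', 'Go ahead!'),
    #  ('Go ahead!', 'Why did the ...')]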
In fig. 21, through the pre-trained dialogue understanding model, the picture recommendation system infers that the source user may want to express session content and/or session emotion along the lines of "happy" and "laugh".
Step two: recommending pictures
After the picture recommendation system has inferred the session emotion of the source user, it queries the picture database, loads the picture recommendation model, and recommends to the source user pictures matching that session emotion.
Specifically, the picture database stores pictures that have already been converted to carry textual descriptions; the conversion is based on a picture translation model. The picture translation model is a trained encoder-decoder (Encoder-Decoder) model, constructed as follows: the picture pixel matrix is taken as input and the picture description as output, and the model parameters are adjusted by continuously correcting output errors until they finally converge, thereby achieving the conversion of pictures into text.
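A skeleton of such an encoder-decoder picture translation model is sketched below in PyTorch; the layer choices and sizes are assumptions, since the patent states only the input/output contract (pixel matrix in, description out).

    # Skeleton of the Encoder-Decoder picture translation model described
    # above (pixels in, caption out). Architecture details are illustrative;
    # a real system would train this on (image, description) pairs.
    import torch
    import torch.nn as nn

    class PictureTranslator(nn.Module):
        def __init__(self, vocab_size: int, hidden: int = 256):
            super().__init__()
            self.encoder = nn.Sequential(            # pixel matrix -> feature vector
                nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, hidden),
            )
            self.embed = nn.Embedding(vocab_size, hidden)
            self.decoder = nn.GRU(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)  # next-word scores

        def forward(self, images, captions):
            h0 = self.encoder(images).unsqueeze(0)    # initial decoder state
            decoded, _ = self.decoder(self.embed(captions), h0)
            return self.out(decoded)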
Further, the input of the picture recommendation model is a text description. The text description is converted into a vector through a Word2Vec model and compared for similarity against the text descriptions of the pictures in the database; the frequency with which the user has used each picture serves as a weighting reference, the pictures are ranked by similarity, and the top K pictures with the highest similarity are output.
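A minimal sketch of this weighted top-K ranking follows; the (1 + usage_count) weighting formula is an assumption, since the patent says only that picture frequency serves as a weight reference.

    # Sketch of the weighted top-K picture ranking described above.
    # The (1 + usage_count) weighting is an illustrative assumption.
    import numpy as np

    def recommend_pictures(query_vector: np.ndarray, database, k: int = 4):
        # database: list of (picture_id, description_vector, usage_count)
        scored = []
        for picture_id, description_vector, usage_count in database:
            similarity = float(
                np.dot(query_vector, description_vector)
                / (np.linalg.norm(query_vector) * np.linalg.norm(description_vector) + 1e-9)
            )
            scored.append((picture_id, similarity * (1.0 + usage_count)))
        scored.sort(key=lambda item: item[1], reverse=True)
        return [picture_id for picture_id, _ in scored[:k]]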
The language types of the source input information include, but are not limited to: chinese, english, korean, japanese, etc.
In addition, in fig. 20, the picture recommendation system invokes the picture recommendation model with emotional content such as "happy" and "laugh" as input; because the number of recommended pictures preset by the user is 4, the picture recommendation model returns the four pictures with the highest similarity as the recommendation result, i.e., as the target session information.
Step three: results presentation
Finally, the picture recommendation system arranges the final recommended pictures and presents them to the target user through the terminal device; as shown in fig. 21, the terminal device finally presents four pictures expressing "happy" and "laugh". When the user clicks one of the pictures, that picture is sent as the user's reply.
Further, when the user considers that the recommendation of the picture recommendation system is inaccurate, the user can click on a lower right corner search button to manually search for a picture which the user wants.
As can be seen from the descriptions of the first to thirteenth embodiments, when generating the target session information, the method for processing session information provided by the embodiments of the present invention relies not only on the source session information input by the first user but also takes full account of the acquired auxiliary information, so that the second user can understand the generated natural language sentences. This enhances the practicality of language generation, builds a bridge between people for whom ordinary communication is inconvenient, and converts hard-to-read expressions into plain, understandable natural language, thereby removing comprehension barriers in the communication process.
In addition, the analogy translation system can convert key entities or events that are difficult for a user to understand into counterparts that the user can understand, strengthening the user's comprehension of the generated content; the cross-language cross-domain translation system satisfies the user's need to understand specialized expressions in unfamiliar professional fields; the cross-language cross-cultural background translation system helps the user understand special expressions from different cultural backgrounds; and the picture recommendation system makes it easier for users to communicate better with multi-modal information.
According to another aspect, an embodiment of the present invention further provides an apparatus for processing session information, comprising an information acquisition module 201 and a session generation module 211, wherein: the information acquisition module 201 is configured to acquire auxiliary information and source session information input by a first user; and the session generation module 211 is configured to generate target session information based on the auxiliary information and the source session information and to output the target session information.
Specifically, the auxiliary information includes at least one of history session information, user information of the first user, user information of the second user, and a domain to which the information belongs, wherein the domain to which the information belongs includes a domain to which the source information belongs and a domain to which the target information belongs.
Further, the user information includes at least one of:
User attribute information;
user preference information;
User schedule information;
user location information;
user behavior information;
user equipment information.
Further, the information field includes at least one of the following:
Language type information;
professional field information;
cultural background information.
Further, the domain to which the source information belongs is obtained by detecting the source session information or is set; the domain to which the target information belongs is obtained by detecting the historical session information or is set.
Further, the source session information includes at least one of: abbreviations, incomplete words, natural language sentences, and picture selection information.
Further, the session generation module 211 is specifically configured to at least one of the following:
Extracting session information keywords of source session information and historical session information between a first user and a second user, and generating target session information according to the session information keywords;
extracting session information keywords of source session information and historical session information between a first user and a second user and user information keywords of user information of the first user and/or the second user, and generating target session information according to the session information keywords and the user information keywords;
Translating source session information in the field of the source information into session information in the field of the target information according to the field of the information, extracting session information keywords of the session information in the field of the target information and history session information between the first user and the second user, and generating target session information according to the session information keywords;
According to the field of the information, translating the source session information in the field of the source information into the session information in the field of the target information, extracting session information keywords of the session information in the field of the target information and history session information between the first user and the second user and user information keywords of user information of the first user and/or the second user, and generating the target session information according to the session information keywords and the user information keywords.
Further, the session generation module 211 is specifically configured to generate target session information based on a pre-trained sentence generation model according to the session information keyword;
generating target session information according to the session information keywords and the user information keywords comprises: and generating a model based on the pre-trained sentences according to the session information keywords and the user information keywords, and generating target session information.
Further, the session generation module 211 is specifically configured to at least one of the following:
According to the source session information, source object information is obtained, and the source object information is classified to obtain source category information; obtaining candidate target category information according to the user information of the first user; obtaining target class information according to the similarity between the source class information and the candidate target class information; obtaining candidate target object information according to the target category information; obtaining target object information according to the similarity of the source object information and the candidate target object information; generating target session information according to the target object information;
Obtaining source object information according to the source session information and historical session information between a first user and more than one second user; classifying the source object information to obtain source category information; obtaining candidate target category information according to the user information of the first user; obtaining target class information according to the similarity between the source class information and the candidate target class information; obtaining candidate target object information according to the target category information; obtaining target object information according to the similarity of the source object information and the candidate target object information; generating target session information according to the target object information;
Translating source session information in the field of the source information into session information in the field of the target information according to the field of the information, and obtaining source object information according to the session information in the field of the target information and historical session information between a first user and more than one second user; classifying the source object information to obtain source category information; obtaining candidate target category information according to the user information of the first user; obtaining target class information according to the similarity between the source class information and the candidate target class information; obtaining candidate target object information according to the target category information; obtaining target object information according to the similarity of the source object information and the candidate target object information; generating target session information according to the target object information;
An object includes an entity and/or an event. A minimal sketch of this analogy pipeline is given below.
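The following sketch assumes a hypothetical knowledge-base interface (category_of, categories_for_user, objects_in) and a similarity function sim(); neither is prescribed by the embodiment, so both are illustrative stand-ins.

    # Sketch of the analogy pipeline enumerated above: source object ->
    # source category -> candidate target categories (from user info) ->
    # target category -> candidate target objects -> target object.
    def analogy_translate(source_object, user_info, knowledge_base, sim):
        source_category = knowledge_base.category_of(source_object)
        candidate_categories = knowledge_base.categories_for_user(user_info)
        target_category = max(
            candidate_categories, key=lambda c: sim(source_category, c)
        )
        candidate_objects = knowledge_base.objects_in(target_category)
        target_object = max(
            candidate_objects, key=lambda o: sim(source_object, o)
        )
        return target_object  # used to generate the target session information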
Further, the session generation module 211 is specifically configured to translate the source session information in the domain of the source information into the target session information in the domain of the target information according to the domain of the information.
Further, the session generation module 211 is specifically configured to perform, according to the domain to which the information belongs, a translation process of at least one of a language type, a professional domain, and a cultural background on the source session information according to a sequence based on a corresponding pre-trained translation model, so as to obtain the target session information.
Further, the apparatus further comprises: the picture acquisition module 212 (not shown in the figure), the picture acquisition module 212 is configured to acquire a target picture corresponding to the source session information and/or the target session information according to the semantic similarity between the source session information and/or the target session information and the candidate picture, and output the target picture.
Further, the session generation module 211 is specifically configured to at least one of the following:
Acquiring session presumption information according to picture selection information input by a first user and historical session information between the first user and a second user; according to the semantic similarity between the session estimation information and the candidate picture, acquiring a target picture from the candidate picture, and taking the target picture as target session information;
Acquiring session presumption information according to picture selection information input by a first user, historical session information between the first user and a second user and user information of the first user; and acquiring a target picture from the candidate picture according to the semantic similarity between the session estimation information and the candidate picture, and taking the target picture as target session information.
Further, the session generation module 211 is specifically configured to obtain session estimation information based on a pre-trained dialogue understanding model according to the picture selection information input by the first user and the history session information between the first user and the second user;
and acquiring session estimation information based on the pre-trained session understanding model according to the picture selection information input by the first user, the historical session information between the first user and the second user and the user information of the first user.
Further, the session estimation information includes: the second user wants to express the conversation content and/or the first user wants to express the conversation emotion.
According to another aspect, an embodiment of the present invention further provides a terminal device, comprising: a processor; and a memory configured to store machine-readable instructions that, when executed by the processor, cause the processor to perform the method for processing session information described above.
Fig. 23 schematically illustrates a block diagram of a computing system that may be used to implement the terminal device of the present disclosure, in accordance with an embodiment of the present disclosure.
As shown in fig. 23, the computing system 2300 includes a processor 2310, a computer-readable storage medium 2320, an output interface 2330, and an input interface 2340. The computing system 2300 may perform the method described above with reference to fig. 1.
In particular, the processor 2310 may include, for example, a general purpose microprocessor, an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 2310 may also include on-board memory for caching purposes. The processor 2310 may be a single processing unit or multiple processing units for performing the different actions of the method flow described with reference to fig. 1.
Computer-readable storage medium 2320 may be any medium that can contain, store, communicate, propagate, or transport instructions, for example. For example, a readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the readable storage medium include: magnetic storage devices such as magnetic tape or hard disk (HDD); optical storage devices such as compact discs (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or a wired/wireless communication link.
The computer-readable storage medium 2320 may include a computer program that may include code/computer-executable instructions that, when executed by the processor 2310, cause the processor 2310 to perform the method flow and any variations thereof as described above in connection with fig. 1.
The computer program may be configured with computer program code comprising, for example, computer program modules. For example, in an example embodiment, the code in the computer program may include one or more program modules, for example module 1, module 2, and so on. It should be noted that the division and number of modules are not fixed; a person skilled in the art may use suitable program modules or combinations of program modules depending on the actual situation, and when these program modules are executed by the processor 2310, they enable the processor 2310 to perform the method flow described above in connection with fig. 1 and any variations thereof.
The processor 2310 may use the output interface 2330 and the input interface 2340 to perform the method flow described above in connection with fig. 1 and any variations thereof, according to embodiments of the present disclosure.
Those skilled in the art will appreciate that the present application includes apparatuses for performing one or more of the operations described herein. These apparatuses may be specially designed and constructed for the required purposes, or may comprise known devices in general-purpose computers, with computer programs stored therein that are selectively activated or reconfigured. Such a computer program may be stored in a device-readable (e.g., computer-readable) medium or in any type of medium suitable for storing electronic instructions and coupled to a bus, including but not limited to any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROMs (Read-Only Memories), RAMs (Random Access Memories), EPROMs (Erasable Programmable Read-Only Memories), EEPROMs (Electrically Erasable Programmable Read-Only Memories), flash memory, magnetic cards, or optical cards. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
It will be understood by those within the art that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. Those skilled in the art will appreciate that the computer program instructions can be implemented in a processor of a general purpose computer, special purpose computer, or other programmable data processing method, such that the blocks of the block diagrams and/or flowchart illustration are implemented by the processor of the computer or other programmable data processing method.
Those of skill in the art will appreciate that the various operations, methods, steps in the flow, acts, schemes, and alternatives discussed in the present invention may be alternated, altered, combined, or eliminated. Further, other steps, means, or steps in a process having various operations, methods, or procedures discussed herein may be alternated, altered, rearranged, disassembled, combined, or eliminated. Further, steps, measures, schemes in the prior art with various operations, methods, flows disclosed in the present invention may also be alternated, altered, rearranged, decomposed, combined, or deleted.
The foregoing is only a partial embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.

Claims (16)

1. A method performed by an electronic device, comprising:
Acquiring auxiliary information and source session information input by a first user; the auxiliary information comprises at least one of historical session information between a first user and at least one second user, user information of the first user, user information of the second user and the field to which the information belongs;
Generating target session information based on the auxiliary information and the source session information, including: acquiring information related to the session based on the source session information and the auxiliary information, and generating target session information based on the information related to the session;
Outputting the target session information;
wherein the information related to the session includes at least one of:
session information keywords;
target object information, the object comprising an entity and/or an event;
Session speculation information including the content of the session the first user wants to express and/or the emotion of the session the first user wants to express.
2. The method of claim 1, wherein the fields to which the information pertains include a field to which source information pertains and a field to which target information pertains.
3. The method of claim 2, wherein the user information comprises at least one of:
User attribute information;
user preference information;
User schedule information;
user location information;
user behavior information;
user equipment information.
4. The method of claim 1, wherein the domain to which the information belongs comprises at least one of:
Language type information;
professional field information;
cultural background information.
5. The method of claim 2, wherein the domain to which the source information belongs is obtained by detecting the source session information or is settable, and the domain to which the target information belongs is obtained by detecting the historical session information or is settable.
6. The method of claim 1, wherein the source session information comprises at least one of:
abbreviations, incomplete words, natural language sentences, and picture selection information.
7. The method of claim 1, the generating target session information based on the assistance information and the source session information comprising at least one of:
Extracting session information keywords of source session information and historical session information between a first user and a second user, and generating target session information according to the session information keywords;
extracting session information keywords of source session information and historical session information between a first user and a second user and user information keywords of user information of the first user and/or the second user, and generating target session information according to the session information keywords and the user information keywords;
Translating source session information in the field of the source information into session information in the field of the target information according to the field of the information, extracting session information keywords of the session information in the field of the target information and history session information between the first user and the second user, and generating target session information according to the session information keywords;
According to the field of the information, translating the source session information in the field of the source information into the session information in the field of the target information, extracting session information keywords of the session information in the field of the target information and history session information between the first user and the second user and user information keywords of user information of the first user and/or the second user, and generating the target session information according to the session information keywords and the user information keywords.
8. The method of claim 7, the generating target session information from session information keywords comprising: generating a model based on a pre-trained sentence according to the conversation information keywords, and generating target conversation information;
The generating the target session information according to the session information keywords and the user information keywords comprises: and generating a model based on the pre-trained sentences according to the session information keywords and the user information keywords, and generating target session information.
9. The method of claim 1, the generating target session information based on the assistance information and the source session information comprising at least one of:
According to the source session information, source object information is obtained, and the source object information is classified to obtain source category information; obtaining candidate target category information according to the user information of the first user; obtaining target class information according to the similarity between the source class information and the candidate target class information; obtaining candidate target object information according to the target category information; obtaining target object information according to the similarity of the source object information and the candidate target object information; generating target session information according to the target object information;
Obtaining source object information according to the source session information and historical session information between a first user and more than one second user; classifying the source object information to obtain source category information; obtaining candidate target category information according to the user information of the first user; obtaining target class information according to the similarity between the source class information and the candidate target class information; obtaining candidate target object information according to the target category information; obtaining target object information according to the similarity of the source object information and the candidate target object information; generating target session information according to the target object information;
Translating source session information in the field of the source information into session information in the field of the target information according to the field of the information, and obtaining source object information according to the session information in the field of the target information and historical session information between a first user and more than one second user; classifying the source object information to obtain source category information; obtaining candidate target category information according to the user information of the first user; obtaining target class information according to the similarity between the source class information and the candidate target class information; obtaining candidate target object information according to the target category information; obtaining target object information according to the similarity of the source object information and the candidate target object information; generating target session information according to the target object information;
the object includes an entity and/or an event.
10. The method of claim 1, the generating target session information based on the assistance information and the source session information comprising:
and translating the source session information in the domain of the source information into the target session information in the domain of the target information according to the domain of the information.
11. The method of claim 10, wherein translating source session information of a domain to which source information belongs to target session information of a domain to which target information belongs according to a domain to which information belongs comprises:
And according to the field to which the information belongs, translating at least one of the language type, the professional field and the cultural background of the source session information according to the sequence based on a corresponding pre-trained translation model to obtain the target session information.
12. The method according to any one of claims 7-11, further comprising:
And obtaining a target picture corresponding to the source session information and/or the target session information according to the semantic similarity between the source session information and/or the target session information and the candidate picture, and outputting the target picture.
13. The method of claim 1, the generating target session information based on the assistance information and the source session information comprising at least one of:
Acquiring session presumption information according to picture selection information input by a first user and historical session information between the first user and a second user; according to the semantic similarity between the session estimation information and the candidate picture, acquiring a target picture from the candidate picture, and taking the target picture as target session information;
Acquiring session presumption information according to picture selection information input by a first user, historical session information between the first user and a second user and user information of the first user; and acquiring a target picture from the candidate picture according to the semantic similarity between the session estimation information and the candidate picture, and taking the target picture as target session information.
14. The method of claim 13, wherein the obtaining the session estimation information according to the picture selection information input by the first user and the historical session information between the first user and the second user comprises: acquiring session estimation information based on a pre-trained session understanding model according to picture selection information input by a first user and historical session information between the first user and a second user;
The obtaining the session estimation information according to the picture selection information input by the first user, the historical session information between the first user and the second user, and the user information of the first user includes: and acquiring session estimation information based on the pre-trained session understanding model according to the picture selection information input by the first user, the historical session information between the first user and the second user and the user information of the first user.
15. An electronic device comprising a memory and a processor, the memory having stored thereon computer executable instructions that, when executed by the processor, perform the method of any of the preceding claims 1-14.
16. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when run by a processor, performs the method of any of claims 1-14.