CN111611468B - Page interaction method and device and electronic equipment
- Publication number
- CN111611468B, CN202010356398.8A, CN202010356398A
- Authority
- CN
- China
- Prior art keywords
- interaction
- page
- interactive
- target
- executable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application discloses a page interaction method, and relates to speech technology within the field of computer technology. The specific implementation scheme is as follows: executable interactive content of a page is acquired; voice information is acquired; intention recognition is performed on the voice information to determine, from the executable interactive content, a target interaction matching the intention of the voice information; and the target interaction is executed on the page. By executing, on the page, the target interaction matching the intention of the voice information input by the user, the user can interact with the web page by voice, which improves the efficiency of interaction between the user and the page.
Description
Technical Field
The present application relates to the field of computer technology, and in particular, to a method and apparatus for page interaction, and an electronic device.
Background
Web pages are the Internet's largest information carrier. In the personal computer (Personal Computer, PC for short) Internet era, people interacted with web pages by mouse clicks, scrolling, and keyboard input; in the mobile Internet era, people interact with web pages by tapping and swiping with their fingers.
However, whether a user interacts with a web page by mouse click, scrolling, and keyboard input or by finger touch, input efficiency is low, so interaction between the user and the web page is inefficient.
Disclosure of Invention
The application provides a page interaction method, a page interaction device, electronic equipment and a storage medium.
An embodiment of a first aspect of the present application provides a page interaction method, including:
acquiring executable interactive content of a page;
acquiring voice information;
performing intent recognition on the voice information to determine target interactions matching the intent of the voice information from the executable interaction content;
and executing the target interaction on the page.
An embodiment of a second aspect of the present application provides another page interaction method, including:
acquiring executable interactive content obtained by the page front end identifying the page;
acquiring voice information;
performing intent recognition on the voice information to determine target interactions matching the intent of the voice information from the executable interaction content;
sending an interaction instruction of the target interaction to the front end of the page; and the interaction instruction is used for executing the target interaction on the page.
An embodiment of a third aspect of the present application provides a page interaction device, including:
the first acquisition module is used for acquiring executable interactive contents of the page;
the second acquisition module is used for acquiring voice information;
the intention recognition module is used for carrying out intention recognition on the voice information so as to determine target interaction matched with the intention of the voice information from the executable interaction content;
and the execution module is used for executing the target interaction on the page.
Another page interaction device provided by an embodiment of the fourth aspect of the present application includes:
the interactive acquisition module is used for acquiring executable interactive content obtained by the page front end identifying the page;
the voice acquisition module is used for acquiring voice information;
the recognition module is used for carrying out intention recognition on the voice information so as to determine target interaction matched with the intention of the voice information from the executable interaction content;
the sending module is used for sending the interaction instruction of the target interaction to the front end of the page; and the interaction instruction is used for executing the target interaction on the page.
An embodiment of a fifth aspect of the present application provides an electronic device, including:
At least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the page interaction method of the first aspect embodiment or the page interaction method of the second aspect embodiment.
An embodiment of a sixth aspect of the present application provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the page interaction method of the embodiment of the first aspect, or the page interaction method of the embodiment of the second aspect.
One embodiment of the above application has the following advantages or benefits: executable interactive content of a page is acquired; voice information is acquired; intention recognition is performed on the voice information to determine, from the executable interactive content, a target interaction matching the intention of the voice information; and the target interaction is executed on the page. By executing, on the page, the target interaction matching the intention of the voice information input by the user, the user can interact with the web page by voice, which improves the efficiency of interaction between the user and the page.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
fig. 1 is a schematic flow chart of a page interaction method according to a first embodiment of the present application;
fig. 2 is a schematic flow chart of a page interaction method according to a second embodiment of the present application;
fig. 3 is a flow chart of a page interaction method according to a third embodiment of the present application;
fig. 4 is a flow chart of a page interaction method according to a fourth embodiment of the present application;
fig. 5 is a flow chart of a page interaction method provided in a fifth embodiment of the present application;
FIG. 6 is an exemplary diagram of a page interaction process provided in a sixth embodiment of the present application;
FIG. 7 is a schematic diagram of user interaction with a page according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a page interaction device according to a seventh embodiment of the present application;
fig. 9 is a schematic structural diagram of a page interaction device according to an eighth embodiment of the present application;
FIG. 10 is a block diagram of an electronic device for implementing a method of page interaction of an embodiment of the application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the related art, users can interact with web pages only by mouse click, scrolling, keyboard input, and finger touch; web pages cannot be controlled by voice.
Compared with voice interaction, interacting with a web page by mouse, keyboard, or finger touch has the following disadvantages. In terms of input efficiency, typing on a keyboard is far slower than voice input. In terms of learning cost, using a computer or mobile phone requires prior learned knowledge, whereas speaking requires none. In terms of operating distance, the user must be in close contact with the device, which is inconvenient in some situations.
Aiming at these technical problems in existing user interaction with web pages, the application provides a page interaction method: executable interactive content of a page is acquired; voice information is acquired; intention recognition is performed on the voice information to determine, from the executable interactive content, a target interaction matching the intention of the voice information; and the target interaction is executed on the page. Interaction with the web page is thus realized by voice: the user does not need to additionally learn to use text input or touch devices, and can interact with the web page simply by speaking, which improves interaction efficiency.
The page interaction method, the page interaction device, the electronic equipment and the storage medium according to the embodiment of the application are described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of a page interaction method according to an embodiment of the present application.
In the embodiments of the application, the page interaction method is described as being configured in a page interaction device, and the page interaction device can be applied to any electronic device so that the electronic device can perform the page interaction function.
The electronic device may be a PC, a cloud device, a mobile device, etc., and the mobile device may be a hardware device with various operating systems, such as a mobile phone, a tablet computer, a personal digital assistant, a wearable device, a vehicle-mounted device, etc.
As one example, an electronic device may include a voice module, a page front end, and a server module. For example, a speech module obtains speech uttered by a user; the front end of the page identifies the page, and executable interactive content of the page is obtained; the server module performs intention recognition on the voice acquired from the voice module to determine target interaction matched with the intention of the voice information from executable interaction content; further, the front end of the page performs the target interaction on the page. Therefore, the function of voice interaction between the user and the webpage is realized.
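To make this division of labor concrete, the following Python sketch wires the three modules together; the object interfaces (parse_page, record_and_transcribe, match_intent, execute) are illustrative assumptions, not the patent's actual implementation.

```python
# A minimal orchestration sketch under the assumptions stated above.
def handle_voice_interaction(page_front_end, voice_module, server_module):
    # The page front end parses the page into executable interactive content.
    interactions = page_front_end.parse_page()
    # The voice module records the user's speech and transcribes it to text.
    text = voice_module.record_and_transcribe()
    # The server module recognizes the intent and picks the matching
    # target interaction from the executable interactive content.
    target = server_module.match_intent(text, interactions)
    # The page front end executes the target interaction on the page.
    page_front_end.execute(target)
```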
The process of executing the page interactive function by the electronic device described above is described in detail below with reference to fig. 1.
As shown in fig. 1, the page interaction method may include the following steps:
step 101, executable interactive content of a page is acquired.
The page may be a hypertext markup language (Hyper Text Markup Language, abbreviated HTML) page, for example, a page presented in the Baidu browser or in the Google Chrome browser.
In the application, the HTML page can be parsed by an HTMLParser module at the front end of the page to obtain the executable interactive content of the page. HTMLParser is a module for parsing HTML that ships with the Python programming language, and it can be used to parse out the executable interactive content of the HTML page.
It should be explained that the executable interactive content of the page may be page-operation content, such as sliding, refreshing, forward, and back; page-click content, such as immediate query and tab switching; slot-filling content, such as departure place, arrival place, and departure date; or other types of interactive content, which are not enumerated here.
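A minimal sketch of such parsing with Python's HTMLParser follows; the tag set, attribute names, and operation mapping are illustrative assumptions rather than rules stated in the patent.

```python
from html.parser import HTMLParser

class InteractiveContentParser(HTMLParser):
    # Tags treated as interactive for this illustration.
    INTERACTIVE_TAGS = {"a", "button", "input", "select", "details"}

    def __init__(self):
        super().__init__()
        self.executable_interactions = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.INTERACTIVE_TAGS:
            return
        attrs = dict(attrs)
        self.executable_interactions.append({
            "tag": tag,
            # Assume clickable elements declare "onclick"; others take input.
            "operation": "click" if "onclick" in attrs else "input",
            # The text description used later as the interaction purpose.
            "purpose": attrs.get("aria-label") or attrs.get("title", ""),
        })

parser = InteractiveContentParser()
parser.feed('<button onclick="query()" title="query now">Query now</button>')
print(parser.executable_interactions)
# [{'tag': 'button', 'operation': 'click', 'purpose': 'query now'}]
```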
Step 102, obtaining voice information.
The voice information is the text obtained from input spoken by the user.
In the embodiment of the application, when the user interacts with the page by voice, the user says something; the voice module of the electronic device records the user's speech and sends the text obtained by recognizing the recording to the server module of the electronic device, so that the server module obtains the voice information.
As an example, when the user needs to purchase a train ticket, the user may say "a train ticket from Beijing to Shanghai this Friday"; the voice module of the electronic device records the user's speech and sends the text obtained by recognizing the recording to the server module, so that the server module obtains the voice information.
It should be noted that, in this embodiment, the execution order of the steps 101 and 102 is not limited, and the process of acquiring the voice information in the step 102 may be performed first, and then the process of acquiring the executable interactive content of the page in the step 101 may be performed.
For example, when the user performs voice interaction with the page, the voice module may acquire voice information input by the user after the front end of the page of the electronic device acquires executable interaction content of the page; or the voice module of the electronic device may acquire the voice information input by the user, and then acquire the executable interactive content of the page at the front end of the page, which is not limited in this embodiment.
In step 103, intention recognition is performed on the voice information to determine a target interaction matching the intention of the voice information from the executable interaction content.
The target interaction refers to executable interaction content of a page matched with the intention of voice information input by a user.
According to the application, after the server side module of the electronic equipment acquires the voice information, the voice information can be subjected to intention recognition so as to determine the intention of the user for interacting with the page from the voice information.
As one possible implementation, text content corresponding to the voice information may be input into a trained intent recognition model to obtain intent information of the voice information according to an output of the model. The intention recognition model learns the mapping relation between the voice information and the corresponding intention, so that the intention of the voice information can be accurately recognized.
As another possible implementation manner, the method of rule template classification can also be used for performing intention recognition on the voice information. For example, text information corresponding to the voice information is matched with each template in the template library, so that the intention of the voice information is determined according to the template matched with the text information corresponding to the voice information in the template library.
It should be noted that, the above method for performing intent recognition on voice information is merely an exemplary expression, and other methods for performing intent recognition on voice information are also applicable to the present application.
In the embodiment of the application, after the intention of the user interacting with the page in the voice information is identified, the target interaction matched with the intention of the voice information can be determined from the executable interaction content.
As an example, assuming that the acquired voice information is "a train ticket from Beijing to Shanghai this Friday", intention recognition on the voice information determines that the intention is to query train tickets, and it can be determined from the executable interactive content described above that the interactive content matching the intention of the voice information is slot-filling content, where the departure place is Beijing, the arrival place is Shanghai, and the departure date is Friday.
Step 104, executing the target interaction on the page.
According to the method and the device, after target interaction matched with the intention of the voice information input by the user is determined, the front end of the page can be controlled to execute the target interaction on the page.
As an example, the user inputs a vaccine query by voice, e.g. "DPT vaccine 201607050-2 Wuhan Biological" (a vaccine name, batch number, and manufacturer), and the voice module of the electronic device records the voice input by the user and sends the recognized voice information to the server module. The server module performs intention recognition on the received voice information, determines that the user's intention is to query vaccine details on the page, determines from the executable interactive content of the page that the target interaction matching the intention of the voice information is "immediate query", and controls the front end of the page to execute the immediate-query operation on the page, obtaining the query result.
According to the page interaction method, executable interactive content of the page is acquired; voice information is acquired; intention recognition is performed on the voice information to determine, from the executable interactive content, a target interaction matching the intention of the voice information; and the target interaction is executed on the page. By executing, on the page, the target interaction matching the intention of the voice information input by the user, the user can interact with the web page by voice, which improves the efficiency of interaction between the user and the page.
Based on the foregoing embodiments, another page interaction method is provided in the second embodiment of the present application, and fig. 2 is a schematic flow chart of the page interaction method provided in the second embodiment of the present application.
As shown in fig. 2, the page interaction method may include the following steps:
step 201, parsing the page to obtain the operation of each interactive element response.
For example, the operation to which the interactive element responds may be a click operation, a text input operation, or the like.
In the application, each interactive element in the page can comprise: content interaction elements, menu interaction elements, and state interaction elements.
The < details > and < summary > elements in the content interaction elements belong to newly added content interaction elements and are mainly used for the interactive display of titles, details and contents of documents. The < details > element is used to describe the role of the document or some detailed information, often used in conjunction with the < details > element.
Since the style and function of the element are defined in the attribute information of each interactive element in the page, for example, the response operation of the interactive element and the text description for describing the interactive purpose are included in the attribute information of the interactive element. Therefore, the page can be parsed to obtain the operation of each interactive element response.
Step 202, taking the operation to which each interactive element responds as the executable interactive operation of that element.
In the application, after the operation to which each interactive element in the page responds is obtained, that operation can be taken as the executable interactive operation of the element.
As one example, an operation to which a certain interactive element responds is "click," and it may be determined that the interactive operation that the interactive element may perform is a click operation. That is, "click" in the attribute information of the interactive element is for declaring that the interactive operation executable by the interactive element is a click.
In step 203, a list of executable interactive contents is generated according to the executable interactive operations of each interactive element.
In the application, after the executable interactive operation of each interactive element is determined, the executable interactive operation of each interactive element can be classified to obtain a list of executable interactive contents.
That is, the list of executable interactive contents of the page includes each interactive element and the executable interactive operation corresponding to each interactive element. Furthermore, by inquiring the list of executable interactive contents, the functions corresponding to the interactive elements can be determined, and the efficiency of interaction between the user and the page is improved.
Specifically, after the page is parsed and a plurality of interactive elements in the page are determined, the text description of each interactive element obtained by parsing the page can be acquired. For example, the text description may be confirm, refresh, forward, back, etc. Further, the text description of each interactive element is taken as the interactive purpose of that element.
As an example, the text of a certain interactive element is described as "forward," and the interactive purpose of the interactive element may be determined to be forward.
In the embodiment of the application, after the interactive operation of each interactive element in the page and the interactive purpose of each interactive element are determined, each interactive element can be classified to generate a list of executable interactive contents of the page.
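As a sketch, list generation could look like the following, reusing the element dictionaries from the parser sketch above; the field names are assumptions.

```python
def build_interaction_list(elements):
    # Pair each element's executable operation with its textual purpose,
    # so intent matching can later locate a target element.
    interaction_list = []
    for element in elements:
        interaction_list.append({
            "element": element["tag"],
            "executable_operation": element["operation"],  # e.g. "click"
            "purpose": element["purpose"],                  # e.g. "forward"
        })
    return interaction_list
```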
Step 204, obtaining voice information.
In the embodiment of the present application, the implementation process of step 204 may refer to the implementation process of step 102 in the above embodiment, which is not described herein.
It should be noted that the execution of step 204 may also precede step 201, and the present application is not limited thereto.
In step 205, the intention recognition is performed on the voice information to determine the executable interactive operation of the target element in the page according to the executable interactive content.
In the embodiment of the application, the intention recognition is carried out on the voice information, and after the intention of the voice information is determined, the interaction purpose of a plurality of interaction elements in the page can be determined from the executable interaction content of the page. Further, from among the plurality of interactive elements, an interactive element whose interaction purpose matches with the intention is determined as a target element.
Further, executable interactive operations of the target elements in the page can be determined according to executable interactive contents of the page.
As an example, the server module performs intention recognition on the voice information, determines that the intention of the voice information is "purchase train ticket", and can determine that the interactive operation executable by the target element in the page is a query operation from the executable interactive content according to the intention of the voice information.
Step 206, determining the target operation matched with the intention from the executable interactive operations of the target element.
In the embodiment of the application, after the intention of the voice information is recognized and the interactive operations executable by the target element in the page are determined according to the executable interactive content of the page, the target operation matching the intention can be determined from those executable interactive operations.
For example, the intention of the voice information is recognized as "purchase a train ticket from Beijing to Shanghai this Friday", and the target operation matching the intention may be determined, among the interactive operations executable by the target element, to be the query operation.
Step 207, generating interaction instructions of target interaction according to the target elements and the target operation.
According to the method and the device, after the intention of the voice information is identified, the target element is determined from a plurality of interactive elements in the page according to the intention of the voice information, and after the target operation matched with the intention is determined from the interactive operation executable by the target element, the interactive instruction of the target interaction can be generated according to the target element and the target operation.
Continuing with the example in step 206, after determining that the user's interaction with the page is to purchase a train ticket from Beijing to Shanghai this Friday, the interaction instruction of the target interaction may be generated as a query instruction according to the element corresponding to the search box and the query operation to be executed. After the interaction instruction of the target interaction is generated from the intention of the voice information, the target interaction can be executed on the page, realizing voice interaction between the user and the page.
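A sketch of instruction generation under these assumptions follows; the JSON-like structure is an illustration, since the patent does not fix an instruction format.

```python
def build_interaction_instruction(target_element, target_operation, slots=None):
    # Assemble the target-interaction instruction from the target element
    # and the target operation, carrying slot-filling values when present.
    return {
        "element": target_element,      # e.g. the element for the search box
        "operation": target_operation,  # e.g. "query"
        "slots": slots or {},           # e.g. {"departure": "Beijing"}
    }

instruction = build_interaction_instruction(
    "search_box", "query",
    {"departure": "Beijing", "arrival": "Shanghai", "date": "this Friday"})
```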
Step 208, executing the interaction instruction of the target interaction on the page.
In the embodiment of the application, the server side module generates the interaction instruction of target interaction according to the target element and the target operation executed on the target element, and then sends the interaction instruction to the front end of the page, so that the front end of the page executes the target interaction on the page according to the interaction instruction, and the aim of interaction between a user and the page in a voice mode is realized.
For example, the interaction instruction for generating the target interaction is a query instruction, and the front end of the page may execute the target interaction for querying the train ticket on the page according to the interaction instruction.
According to the page interaction method, the page is parsed to obtain the operation to which each interactive element responds; that operation is taken as the executable interactive operation of each element; a list of executable interactive content is generated from the executable interactive operations of the elements; and voice information is acquired. Intention recognition is performed on the voice information; the interactive operations executable by the target element in the page are determined according to the executable interactive content; the target operation matching the intention is determined from those operations; and the interaction instruction of the target interaction is generated according to the target element and the target operation. By determining, from the interactive elements in the page, the target element matching the intention of the voice information, and then generating the interaction instruction of the target interaction from the target element and the target operation, interaction between the user and the page by voice is realized, and the interaction efficiency between the user and the page is improved.
On the basis of the above embodiment, when the intention recognition is performed on the voice information in step 103 and step 205, the manner of performing the intention recognition on the voice information may also be determined by judging whether there is a target template matching the voice information in the template library. The above process is described in detail below in conjunction with fig. 3.
Fig. 3 is a flow chart of a page interaction method according to a third embodiment of the present application.
As shown in fig. 3, the page interaction method may further include the following steps:
step 301, a template library is obtained.
Wherein a plurality of trained templates are stored in the template library.
In the embodiment of the application, when the voice information is acquired and recognized, a preset template library can be acquired so as to match the voice information with each template in the template library.
Step 302, each template in the template library is matched with the voice information respectively.
Step 303, judging whether a target template whose sentence pattern matches the voice information exists in the template library.
The target template is the template in the template library that matches the voice information.
In the embodiment of the application, the voice information is matched against each template in the template library to judge whether a target template whose sentence pattern matches the voice information exists, and intention recognition is then performed on the voice information in the corresponding manner.
Step 304, if a target template whose sentence pattern matches the voice information exists in the template library, carrying out intention recognition on the voice information according to the target template.
In the embodiment of the application, after the voice information is matched against each template in the template library, it may be determined that a target template whose sentence pattern matches the voice information exists in the template library; in this case, intention recognition can be performed on the voice information according to the target template.
As a possible implementation manner, when the intention recognition is performed on the voice information according to the target template matched with the voice information, the text position corresponding to the slot in the voice information can be extracted according to the slot set in the target template, so as to obtain the slot filling content of the slot. Further, the slot filling content of the slot can be used as the intention of the voice information. Therefore, the intention of the voice information can be accurately identified through the slot filling content of the slot, and the accuracy of interaction between the user and the page is improved.
As an example, assume that the acquired voice information is "a train ticket from Beijing to Shanghai this Friday" and that one of the templates in the template library is "train ticket from {time} {city name} to {city name}", which exactly matches the voice information. Further, according to the slots set in the template, the text corresponding to each slot is extracted from the voice information to obtain the slot-filling content, so that the intention of the voice information is recognized according to the slot-filling content. For example, intention recognition on the voice information according to the template may yield:
Intent 1: input the departure date; slot 1: this Friday;
Intent 2: input the departure city; slot 2: Beijing;
Intent 3: input the arrival city; slot 3: Shanghai.
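The slot extraction above can be sketched with a regular expression standing in for the target template; the pattern below hand-encodes the example template and is an illustration only.

```python
import re

# Hand-encoded stand-in for "train ticket from {time} {city} to {city}".
TEMPLATE = re.compile(
    r"train ticket from (?P<departure>\w+) to (?P<arrival>\w+)"
    r" (?:on |this )?(?P<date>\w+)")

def extract_slots(text):
    match = TEMPLATE.search(text)
    if match is None:
        return None  # no target template matches; fall back to the semantic model
    # Each slot value becomes one intent of the voice information.
    return [
        {"intent": "input the departure city", "slot": match.group("departure")},
        {"intent": "input the arrival city", "slot": match.group("arrival")},
        {"intent": "input the departure date", "slot": match.group("date")},
    ]

print(extract_slots("train ticket from Beijing to Shanghai this Friday"))
```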
As another possible implementation manner, when intention recognition is performed on voice information according to a target template matched with the voice information, the intention corresponding to the target template may be regarded as the intention of the voice information.
As an example, assuming that the voice information is "click button", the voice information is matched with each template in the template library, and it is determined that there is a target template matched with the voice information as "X-click button", the intention of the target template may be regarded as the intention of the voice information.
In step 305, if no target template whose sentence pattern matches the voice information exists in the template library, a semantic model is adopted to perform intention recognition on the voice information.
In the embodiment of the application, after the voice information is matched against each template in the template library, it may be determined that no target template whose sentence pattern matches the voice information exists in the template library; in this case, the semantic model is adopted to perform intention recognition on the voice information.
As an example, assuming the voice information is "walk from Shanghai", after the voice information is matched against each template in the template library, it is determined that no matching target template exists; in this case, intention recognition cannot be performed using a target template, so the semantic model is adopted to perform intention recognition on the voice information. Intention recognition of different kinds of voice information is thus realized, improving the accuracy of intention recognition.
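The two branches can be sketched as a simple dispatch; match_templates and the semantic model are assumed helpers standing in for the template matching of step 304 and the semantic-model recognition of step 305.

```python
def recognize_intent(text, match_templates, semantic_model):
    # Try the sentence-pattern templates first (step 304).
    result = match_templates(text)   # returns None when no template matches
    if result is not None:
        return result                # intent recognized from the target template
    # Fall back to the semantic model when no template matches (step 305).
    return semantic_model(text)
```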
As a possible implementation manner, word segmentation is performed on the voice information to obtain the semantic features and part-of-speech features of each word segment, and the semantic features and part-of-speech features of each word segment are then input into the semantic model to determine the intention of the voice information according to the output of the semantic model.
It will be appreciated that natural language processing aims to let a computer understand human language, i.e., to let a computer read text the way a person does and understand the meaning behind it. When a person reads, it is by understanding the meaning of each word that the meaning of the whole sentence can be grasped. Likewise, for a computer to understand human text, it must accurately grasp the meaning of each word. Therefore, in natural language processing, the voice information must be word-segmented before it is processed.
Alternatively, the voice information may be segmented in a statistics-based manner, with the statistical samples coming from a standard corpus, or in a dictionary-based manner, so as to obtain the semantic features and part-of-speech features of each word segment.
The semantic model is a model which is obtained by training a large number of training samples in advance, and the trained semantic model can accurately identify the intention of each word segmentation characteristic.
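A hedged sketch of this path follows; segment, pos_tag, and the model object are stand-ins for whatever tokenizer and trained classifier an implementation actually uses.

```python
def semantic_intent(text, segment, pos_tag, model):
    # Segment the recognized text (statistical or dictionary-based).
    words = segment(text)
    # Pair each word segment with its part-of-speech tag as features.
    features = [(word, pos_tag(word)) for word in words]
    # The trained semantic model outputs the intent of the voice information.
    return model.predict(features)
```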
According to the webpage interaction method, through obtaining the template library, each template in the template library is respectively matched with the voice information, if a target template with the sentence pattern matched with the voice information exists in the template library, intention recognition is carried out on the voice information according to the target template, and if the target template with the sentence pattern matched with the voice information does not exist in the template library, the intention recognition is carried out on the voice information by adopting a semantic model. Therefore, through adopting different intention recognition modes to carry out intention recognition on different types of voice information, the accuracy rate of the intention recognition of the voice information is improved.
Based on the above embodiments, a fourth embodiment of the present application provides another web page interaction method.
Fig. 4 is a flowchart of a web page interaction method according to a fourth embodiment of the present application.
As shown in fig. 4, the web page interaction method, executed by the server module or the client, may include the following steps:
step 401, obtaining executable interactive content obtained by identifying a page at the front end of the page.
In the application, the front end of the page identifies the page, and the executable interactive content of the page can be obtained, and then the executable interactive content of the page is sent to the server module, so that the server module obtains the executable interactive content obtained by the front end of the page for identifying the page.
As an example, an HTML page may be parsed by an HTML Parser module at the front end of the page to obtain executable interactive content for the page.
It should be explained that the executable interactive content of the page may be page-operation content, such as sliding, refreshing, forward, and back; page-click content, such as immediate query and tab switching; slot-filling content, such as departure place, arrival place, and departure date; or other types of interactive content, which are not enumerated here.
Step 402, obtaining voice information.
It should be noted that, in this embodiment, the execution order of the steps 401 and 402 is not limited, and the process of acquiring the voice information in the step 402 may be performed first, and then the process of acquiring the executable interactive content obtained by the page front-end identification page in the step 401 may be performed.
For example, when the user performs voice interaction with the page, the voice module may acquire voice information input by the user after the front end of the page of the electronic device acquires executable interaction content of the page; or the voice module of the electronic device may acquire the voice information input by the user, and then acquire the executable interactive content of the page at the front end of the page, which is not limited in this embodiment.
In step 403, the intent recognition is performed on the voice information to determine a target interaction matching the intent of the voice information from the executable interaction content.
In the embodiment of the present application, the implementation process of step 402 and step 403 may refer to the implementation process of step 102 and step 103 in the first embodiment, which is not described herein.
Step 404, sending an interaction instruction of target interaction to the front end of the page; and the interaction instruction is used for executing target interaction on the page.
In the application, the server side module carries out intention recognition on the voice information, determines target interaction matched with the intention of the voice information from executable interaction content, and then sends an interaction instruction corresponding to the target interaction to the front end of the page so that the front end of the page executes the target interaction on the page according to the interaction instruction.
For example, the server module performs intention recognition on the voice information, determines that an interaction instruction corresponding to a target interaction matched with the intention of the voice information is a click instruction from executable interaction content, and sends the instruction to the front end of the page so that the front end of the page executes the click instruction on the page.
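A sketch of this dispatch, assuming a generic message channel between the server module and the page front end (the patent does not specify a transport):

```python
import json

def send_interaction_instruction(channel, instruction):
    # Serialize the target-interaction instruction and hand it to the
    # page front end, which executes it on the page.
    channel.send(json.dumps(instruction))

# e.g. send_interaction_instruction(front_end_channel,
#          {"element": "search_box", "operation": "click"})
```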
According to the page interaction method, the executable interaction content obtained by identifying the page at the front end of the page is obtained, the voice information is obtained, the intention of the voice information is identified, so that target interaction matched with the intention of the voice information is determined from the executable interaction content, and an interaction instruction of the target interaction is sent to the front end of the page; and the interaction instruction is used for executing target interaction on the page. Therefore, the interaction instruction for interacting with the page is determined through the server side module and is sent to the front end of the page, so that target interaction is performed on the page, interaction between the user and the page in a voice interaction mode is realized, and the interaction efficiency of the user and the page is improved.
Based on the embodiment, the application provides a webpage interaction method.
Fig. 5 is a flowchart of a web page interaction method according to a fifth embodiment of the present application.
As shown in fig. 5, the web page interaction method may include the following steps:
step 501, obtaining executable interactive content obtained by identifying a page at the front end of the page.
In the embodiment of the application, after the front end of the page parses the page to obtain the operation to which each interactive element responds, that operation is taken as the executable interactive operation of each element.
After the page is analyzed and a plurality of interactive elements in the page are determined, text description of each interactive element obtained by analyzing the page can be obtained. Further, the text description of each interactive element is used as the interactive purpose of each interactive element. After determining the interaction operation of each interaction element in the page and the interaction purpose of each interaction element, each interaction element can be classified to generate a list of executable interaction contents of the page.
After the page front end analyzes the page to obtain executable interactive content of the page, the obtained executable interactive content is sent to the server side module, so that the server side module obtains the executable interactive content obtained by identifying the page at the front end of the page.
Step 502, obtaining voice information.
It should be noted that, in this embodiment, the execution order of the steps 501 and 502 is not limited, and the process of acquiring the voice information in the step 502 may be performed first, and then the process of acquiring the executable interactive content obtained by the page front-end identification page in the step 501 may be performed.
For example, when the user performs voice interaction with the page, the voice module may acquire voice information input by the user after the front end of the page of the electronic device acquires executable interaction content of the page; or the voice module of the electronic device may acquire the voice information input by the user, and then acquire the executable interactive content of the page at the front end of the page, which is not limited in this embodiment.
In step 503, the intention recognition is performed on the voice information, so as to determine the executable interactive operation of the target element in the page according to the executable interactive content.
Step 504, determining a target operation matching the intent from the interactive operations that the target element can perform.
Step 505, generating an interaction instruction of target interaction according to the target element and the target operation.
In the embodiment of the present application, the implementation process of step 502 to step 505 may refer to the implementation process of step 204 to step 207, and will not be described herein.
Step 506, sending an interaction instruction of target interaction to the front end of the page; and the interaction instruction is used for executing target interaction on the page.
In the embodiment of the present application, the implementation process of step 506 may refer to the implementation process of step 404 in the above embodiment, which is not described herein.
Therefore, the interaction instruction for interacting with the page is determined through the server side module and is sent to the front end of the page, so that target interaction is performed on the page, interaction between the user and the page in a voice interaction mode is realized, and the interaction efficiency of the user and the page is improved.
As an example, referring to fig. 6, fig. 6 is an exemplary diagram of a page interaction procedure provided in a sixth embodiment of the present application.
As shown in fig. 6, the page interaction method includes the steps of:
In step 601, the front end of the page parses the page to obtain the executable interactive content of the page.
Step 602, the voice module records the voice input by the user, recognizes the voice, and sends the voice information to the server module after obtaining the voice information.
In step 603, the server module matches each template in the template library against the voice information.
In step 604, if a target template whose sentence pattern matches the voice information exists in the template library, intention recognition is performed on the voice information according to the target template.
Step 605, if no target template whose sentence pattern matches the voice information exists in the template library, extracting the features of the voice information.
Step 606, the extracted features are input into a semantic model to obtain intent of the speech information.
In the application, after the server side module determines the intention of the voice information, the target interaction matched with the intention of the voice information is determined from executable interaction content.
In step 607, the front end of the page performs the target interaction on the page.
Therefore, voice information input by a user is acquired through the voice module of the electronic equipment, the server module carries out intention recognition on the voice information, target interaction matched with the intention of the voice information is determined from executable interaction content of the page, target interaction is executed on the page by the front end of the page, interaction between the user and the page in a voice mode is realized, and interaction efficiency between the user and the page is improved.
As an example, fig. 7 is a schematic diagram of user interaction with a page according to an embodiment of the present application.
As can be seen from fig. 7, after the user inputs voice to a page of the electronic device and the page acquires the voice information input by the user, intention recognition is performed on the voice information and a target interaction matching the intention of the voice information is determined from the executable interactive content, so that the target interaction can be executed on the page. The purpose of the user interacting with the page by voice is thus achieved.
In order to achieve the above embodiment, the present application provides a page interaction device.
Fig. 8 is a schematic structural diagram of a page interaction device according to a seventh embodiment of the present application.
As shown in fig. 8, the page interaction device 600 may include: the first acquisition module 610, the second acquisition module 620, the intent recognition module 630, and the execution module 640.
The first obtaining module 610 is configured to obtain executable interactive content of the page.
A second obtaining module 620, configured to obtain voice information.
The intention recognition module 630 is configured to perform intention recognition on the voice information to determine a target interaction matching the intention of the voice information from the executable interaction content.
An execution module 640 for executing the target interaction on the page.
As one possible scenario, the intent recognition module 630 further includes:
the first determining unit is used for determining executable interactive operation of the target element in the page according to the executable interactive content;
a second determining unit configured to determine a target operation matching the intention from among the interactive operations executable by the target element;
the first generation unit is used for generating an interaction instruction of target interaction according to the target element and the target operation.
As another possible case, the intention recognition module 630 further includes:
and a third determining unit for determining the interaction purposes of the plurality of interaction elements in the page by using the executable interaction content.
And a fourth determining unit configured to determine a target element from the plurality of interactive elements, wherein an interaction purpose of the target element matches the intention.
As another possible scenario, the first acquisition module 610 includes:
and the analysis unit is used for analyzing the page to obtain the operation of each interaction element response.
And a fifth determining unit, configured to use the operation responded by each interactive element as an interactive operation executable by each interactive element.
And the second generation unit is used for generating a list of executable interactive contents according to the executable interactive operation of each interactive element.
As another possible case, the second generating unit is further configured to:
acquiring text description of each interactive element obtained by analyzing a page;
the text description of each interactive element is used as the interactive purpose of each interactive element;
and generating a list of executable interactive contents according to the interactive operation of each interactive element and the interactive purpose of each interactive element.
As another possible case, the intention recognition module 630 may further include:
And the second acquisition unit is used for acquiring the template library.
And the matching unit is used for respectively matching each template in the template library with the voice information.
The intention recognition unit is used for carrying out intention recognition on the voice information according to the target template if the target template matched with the voice information in the sentence pattern exists in the template library; if the target template matched with the voice information in the sentence pattern does not exist in the template library, the semantic model is adopted to carry out intention recognition on the voice information.
As another possible case, the intention recognition unit is further configured to:
according to the slots set in the target template, extracting the text corresponding to each slot from the voice information to obtain the slot-filling content of the slot, and taking the slot-filling content as the intention of the voice information;
or, the intention corresponding to the target template is taken as the intention of the voice information.
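A hedged sketch of the template path: each template is a sentence pattern with an optional named slot, and a match either yields the slot-filling content or the template's own intention. Regex-based matching is an assumption of this sketch, not the disclosed matching method.

```typescript
interface Template { pattern: RegExp; slot?: string; intention?: string; }

function matchTemplates(templates: Template[], voiceText: string): string | null {
  for (const t of templates) {
    const m = voiceText.match(t.pattern);
    if (!m) continue;
    // Slot-filling content, if the template defines a slot; otherwise the template's intention.
    const filled = t.slot ? m.groups?.[t.slot] : undefined;
    return filled ?? t.intention ?? null;
  }
  return null; // no sentence pattern matched: fall back to the semantic model
}

// Usage: the slot "page" is filled with "registration".
console.log(matchTemplates(
  [{ pattern: /open the (?<page>\w+) page/, slot: "page" }],
  "open the registration page",
)); // "registration"
```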
As another possible implementation, the intention recognition unit is further configured to:
perform word segmentation on the voice information and obtain the features of each word, the features including semantic features and part-of-speech features;
and input the features of each word into the semantic model to obtain the intention of the voice information.
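The semantic-model path can be pictured as below: segment the recognized text, attach semantic and part-of-speech features to each word, and feed the feature sequence to a trained model. The whitespace segmenter, the toy feature extractors, and the model call are all stand-ins of this sketch, not the disclosed model.

```typescript
interface WordFeature { word: string; partOfSpeech: string; semantic: number[]; }

function extractFeatures(voiceText: string): WordFeature[] {
  // Whitespace splitting is a placeholder; Chinese text would need a real segmenter.
  return voiceText.split(/\s+/).filter(Boolean).map(word => ({
    word,
    partOfSpeech: /^[A-Z]/.test(word) ? "NOUN" : "OTHER",          // toy POS tagger
    semantic: Array.from(word).map(ch => ch.charCodeAt(0) / 1000), // toy semantic feature
  }));
}

// A trained model would then consume the features, e.g.:
// const intention = semanticModel.predict(extractFeatures(voiceText));
```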
It should be noted that the explanations of the page interaction method in the foregoing first to third embodiments also apply to the page interaction device of this embodiment and are not repeated here.
With the page interaction device of this embodiment, the executable interactive content of a page is obtained; voice information is obtained; intention recognition is performed on the voice information to determine, from the executable interactive content, a target interaction matching the intention of the voice information; and the target interaction is executed on the page. By executing, on the page, the target interaction that matches the intention of the user's voice input, the user can interact with the web page by voice, which improves the efficiency of interaction between the user and the page.
In order to implement the above embodiments, the present application further provides another page interaction device.
Fig. 9 is a schematic structural diagram of a page interaction device according to a seventh embodiment of the present application.
As shown in fig. 9, the page interaction device 700 may include: an interaction acquisition module 710, a voice acquisition module 720, a recognition module 730, and a sending module 740.
The interaction acquisition module 710 is configured to acquire the executable interactive content obtained by the page front end identifying the page.
The voice acquisition module 720 is configured to acquire voice information.
The recognition module 730 is configured to perform intention recognition on the voice information, so as to determine, from the executable interactive content, a target interaction matching the intention of the voice information.
The sending module 740 is configured to send an interaction instruction of the target interaction to the page front end, the interaction instruction being used to execute the target interaction on the page.
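By way of illustration, the front-end side of this exchange might post the executable interactive content together with the voice text and receive the interaction instruction back. The endpoint path and payload shape below are assumptions of this sketch, not a disclosed API.

```typescript
interface InteractionRequest {
  content: { elementId: string; operations: string[]; purpose: string }[];
  voiceText: string;
}
interface InteractionResponse { elementId: string; operation: string; }

async function requestInstruction(req: InteractionRequest): Promise<InteractionResponse | null> {
  const res = await fetch("/api/page-interaction", { // hypothetical server endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) return null;
  // The body carries the interaction instruction produced by the recognition
  // and sending modules; the front end then executes it on the page.
  return (await res.json()) as InteractionResponse;
}
```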
As a possible implementation, the recognition module 730 may be further configured to:
determine, according to the executable interactive content, the interactive operations executable by a target element in the page;
determine, from the interactive operations executable by the target element, a target operation matching the intention;
and generate an interaction instruction of the target interaction according to the target element and the target operation.
As another possible implementation, the recognition module 730 may be further configured to:
determine, according to the executable interactive content, the interaction purposes of a plurality of interactive elements in the page;
and determine a target element from the plurality of interactive elements, wherein the interaction purpose of the target element matches the intention.
As another possible implementation, the interaction acquisition module 710 may be further configured to:
parse the page to obtain the operation to which each interactive element responds;
take the operation to which each interactive element responds as the interactive operation executable by that element;
and generate a list of executable interactive contents according to the interactive operations executable by each interactive element.
As another possible implementation, the interaction acquisition module 710 may be further configured to:
obtain the text description of each interactive element by parsing the page;
take the text description of each interactive element as the interaction purpose of that element;
and generate the list of executable interactive contents according to the interactive operations and the interaction purpose of each interactive element.
As a possible implementation, the recognition module 730 may be further configured to:
acquire a template library;
match each template in the template library against the voice information;
perform intention recognition on the voice information according to a target template if the template library contains a target template whose sentence pattern matches the voice information;
and perform intention recognition on the voice information using a semantic model if no such target template exists.
With the page interaction device of this embodiment, the executable interactive content obtained by the page front end identifying the page is acquired, voice information is acquired, and intention recognition is performed on the voice information to determine, from the executable interactive content, a target interaction matching the intention of the voice information; an interaction instruction of the target interaction is then sent to the page front end, where it is used to execute the target interaction on the page. Because the interaction instruction is determined at the server side and sent to the page front end for execution, the user can interact with the page by voice, which improves the efficiency of interaction between the user and the page.
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.
Fig. 10 is a block diagram of an electronic device for the page interaction method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are exemplary only and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 10, the electronic device includes: one or more processors 801, a memory 802, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 801 is taken as an example in fig. 10.
The memory 802 is a non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor performs the page interaction method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the page interaction method provided by the present application.
As a non-transitory computer-readable storage medium, the memory 802 may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the page interaction method in the embodiments of the present application (e.g., the first obtaining module 610, the second obtaining module 620, the intention recognition module 630, and the execution module 640 shown in fig. 8, and the interaction acquisition module 710, the voice acquisition module 720, the recognition module 730, and the sending module 740 shown in fig. 9). By running the non-transitory software programs, instructions, and modules stored in the memory 802, the processor 801 executes the various functional applications and data processing of the server, that is, implements the page interaction method of the foregoing method embodiments.
The memory 802 may include a program storage area and a data storage area; the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the electronic device for page interaction, and the like. In addition, the memory 802 may include a high-speed random access memory and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 802 may optionally include memories located remotely from the processor 801, and these remote memories may be connected to the electronic device for page interaction through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the page interaction method may further include an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 10.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for page interaction; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 804 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor and can be implemented in a high-level procedural and/or object-oriented programming language and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memories, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or a middleware component (e.g., an application server), or a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solutions of the embodiments of the present application, the executable interactive content of a page is obtained; voice information is obtained; intention recognition is performed on the voice information to determine, from the executable interactive content, a target interaction matching the intention of the voice information; and the target interaction is executed on the page. By executing, on the page, the target interaction that matches the intention of the user's voice input, voice interaction between the user and the web page is realized, and the efficiency of interaction between the user and the page is improved.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed here.
The above specific embodiments do not limit the protection scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations, and substitutions are possible depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.
Claims (16)
1. A method of page interaction, the method comprising:
acquiring executable interactive contents of a page, wherein the executable interactive contents of the page comprise page operation type contents, page clicking operation type contents and slot filling operation type contents;
acquiring voice information;
performing intention recognition on the voice information to determine, from the executable interactive contents, a target interaction matching the intention of the voice information, wherein the target interaction refers to the executable interactive content of the page that matches the intention of the voice information input by a user;
executing the target interaction on the page;
wherein the determining, from the executable interactive contents, of a target interaction matching the intention of the voice information comprises:
determining, according to the executable interactive contents, the interactive operations executable by a target element in the page;
determining, from the interactive operations executable by the target element, a target operation matching the intention; and
generating an interaction instruction of the target interaction according to the target element and the target operation;
wherein before the determining of the interactive operations executable by the target element in the page, the method further comprises:
determining, according to the executable interactive contents, the interaction purposes of a plurality of interactive elements in the page; and
determining the target element from the plurality of interactive elements, wherein the interaction purpose of the target element matches the intention;
and wherein the acquiring of the executable interactive contents of the page comprises:
parsing the page to obtain the operation to which each interactive element responds;
taking the operation to which each interactive element responds as the interactive operation executable by that interactive element; and
generating a list of executable interactive contents according to the interactive operations executable by each interactive element.
2. The page interaction method according to claim 1, wherein the generating of the list of executable interactive contents according to the interactive operations executable by each interactive element comprises:
obtaining the text description of each interactive element by parsing the page;
taking the text description of each interactive element as the interaction purpose of that interactive element;
and generating the list of executable interactive contents according to the interactive operations and the interaction purpose of each interactive element.
3. The page interaction method according to any one of claims 1-2, wherein the performing of intention recognition on the voice information comprises:
acquiring a template library;
matching each template in the template library against the voice information;
if a target template whose sentence pattern matches the voice information exists in the template library, performing intention recognition on the voice information according to the target template;
and if no target template whose sentence pattern matches the voice information exists in the template library, performing intention recognition on the voice information by using a semantic model.
4. The page interaction method according to claim 3, wherein the performing of intention recognition on the voice information according to the target template comprises:
extracting, according to the slot set in the target template, the text segment corresponding to the slot from the voice information to obtain the slot filling content of the slot, and taking the slot filling content of the slot as the intention of the voice information;
or taking the intention corresponding to the target template as the intention of the voice information.
5. The page interaction method according to claim 3, wherein the performing of intention recognition on the voice information by using the semantic model comprises:
performing word segmentation on the voice information and obtaining features of each word, the features comprising semantic features and part-of-speech features;
and inputting the features of each word into the semantic model to obtain the intention of the voice information.
6. A method of page interaction, the method comprising:
acquiring executable interactive contents obtained by a page front end identifying a page, wherein the executable interactive contents of the page comprise page operation type contents, page clicking operation type contents and slot filling operation type contents;
acquiring voice information;
performing intention recognition on the voice information to determine, from the executable interactive contents, a target interaction matching the intention of the voice information, wherein the target interaction refers to the executable interactive content of the page that matches the intention of the voice information input by a user;
sending an interaction instruction of the target interaction to the page front end, the interaction instruction being used for executing the target interaction on the page;
wherein the determining, from the executable interactive contents, of a target interaction matching the intention of the voice information comprises:
determining, according to the executable interactive contents, the interactive operations executable by a target element in the page;
determining, from the interactive operations executable by the target element, a target operation matching the intention; and
generating an interaction instruction of the target interaction according to the target element and the target operation;
wherein before the determining of the interactive operations executable by the target element in the page, the method further comprises:
determining, according to the executable interactive contents, the interaction purposes of a plurality of interactive elements in the page; and
determining the target element from the plurality of interactive elements, wherein the interaction purpose of the target element matches the intention;
and wherein the acquiring of the executable interactive contents obtained by the page front end identifying the page comprises:
parsing the page to obtain the operation to which each interactive element responds;
taking the operation to which each interactive element responds as the interactive operation executable by that interactive element; and
generating a list of executable interactive contents according to the interactive operations executable by each interactive element.
7. The page interaction method according to claim 6, wherein the generating of the list of executable interactive contents according to the interactive operations executable by each interactive element comprises:
obtaining the text description of each interactive element by parsing the page;
taking the text description of each interactive element as the interaction purpose of that interactive element;
and generating the list of executable interactive contents according to the interactive operations and the interaction purpose of each interactive element.
8. The page interaction method according to any one of claims 6-7, wherein the performing of intention recognition on the voice information comprises:
acquiring a template library;
matching each template in the template library against the voice information;
if a target template whose sentence pattern matches the voice information exists in the template library, performing intention recognition on the voice information according to the target template;
and if no target template whose sentence pattern matches the voice information exists in the template library, performing intention recognition on the voice information by using a semantic model.
9. A page interaction device, comprising:
a first acquisition module, configured to acquire executable interactive contents of a page, wherein the executable interactive contents of the page comprise page operation type contents, page clicking operation type contents and slot filling operation type contents;
a second acquisition module, configured to acquire voice information;
an intention recognition module, configured to perform intention recognition on the voice information, so as to determine, from the executable interactive contents, a target interaction matching the intention of the voice information, wherein the target interaction refers to the executable interactive content of the page that matches the intention of the voice information input by a user;
an execution module, configured to execute the target interaction on the page;
wherein the intention recognition module further comprises:
a first determining unit, configured to determine, according to the executable interactive contents, the interactive operations executable by a target element in the page;
a second determining unit, configured to determine, from the interactive operations executable by the target element, a target operation matching the intention;
a first generating unit, configured to generate an interaction instruction of the target interaction according to the target element and the target operation;
a third determining unit, configured to determine, according to the executable interactive contents, the interaction purposes of a plurality of interactive elements in the page; and
a fourth determining unit, configured to determine the target element from the plurality of interactive elements, wherein the interaction purpose of the target element matches the intention;
and wherein the first acquisition module comprises:
a parsing unit, configured to parse the page to obtain the operation to which each interactive element responds;
a fifth determining unit, configured to take the operation to which each interactive element responds as the interactive operation executable by that interactive element; and
a second generating unit, configured to generate a list of executable interactive contents according to the interactive operations executable by each interactive element.
10. The page interaction device according to claim 9, wherein the second generating unit is further configured to:
obtain the text description of each interactive element by parsing the page;
take the text description of each interactive element as the interaction purpose of that interactive element;
and generate the list of executable interactive contents according to the interactive operations and the interaction purpose of each interactive element.
11. The page interaction device according to any one of claims 9-10, wherein the intention recognition module further comprises:
a first acquisition unit, configured to acquire a template library;
a matching unit, configured to match each template in the template library against the voice information;
and an intention recognition unit, configured to perform intention recognition on the voice information according to a target template if a target template whose sentence pattern matches the voice information exists in the template library, and to perform intention recognition on the voice information by using a semantic model if no such target template exists in the template library.
12. The page interaction device according to claim 11, wherein the intention recognition unit is further configured to:
extract, according to the slot set in the target template, the text segment corresponding to the slot from the voice information to obtain the slot filling content of the slot, and take the slot filling content of the slot as the intention of the voice information;
or take the intention corresponding to the target template as the intention of the voice information.
13. The page interaction device according to claim 11, wherein the intention recognition unit is further configured to:
perform word segmentation on the voice information and obtain features of each word, the features comprising semantic features and part-of-speech features;
and input the features of each word into the semantic model to obtain the intention of the voice information.
14. A page interaction device, comprising:
an interaction acquisition module, configured to acquire executable interactive contents obtained by a page front end identifying a page, wherein the executable interactive contents of the page comprise page operation type contents, page clicking operation type contents and slot filling operation type contents;
a voice acquisition module, configured to acquire voice information;
a recognition module, configured to perform intention recognition on the voice information, so as to determine, from the executable interactive contents, a target interaction matching the intention of the voice information, wherein the target interaction refers to the executable interactive content of the page that matches the intention of the voice information input by a user;
a sending module, configured to send an interaction instruction of the target interaction to the page front end, the interaction instruction being used for executing the target interaction on the page;
wherein the recognition module is further configured to:
determine, according to the executable interactive contents, the interactive operations executable by a target element in the page;
determine, from the interactive operations executable by the target element, a target operation matching the intention;
and generate the interaction instruction of the target interaction according to the target element and the target operation;
wherein the recognition module is further configured to:
determine, according to the executable interactive contents, the interaction purposes of a plurality of interactive elements in the page;
and determine the target element from the plurality of interactive elements, wherein the interaction purpose of the target element matches the intention;
and wherein the interaction acquisition module is further configured to:
parse the page to obtain the operation to which each interactive element responds;
take the operation to which each interactive element responds as the interactive operation executable by that interactive element;
and generate a list of executable interactive contents according to the interactive operations executable by each interactive element.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the page interaction method of any one of claims 1-5 or to implement the page interaction method of any one of claims 6-8.
16. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the page interaction method of any one of claims 1-5 or to implement the page interaction method of any one of claims 6-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010356398.8A CN111611468B (en) | 2020-04-29 | 2020-04-29 | Page interaction method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010356398.8A CN111611468B (en) | 2020-04-29 | 2020-04-29 | Page interaction method and device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111611468A (en) | 2020-09-01 |
CN111611468B (en) | 2023-08-25 |
Family
ID=72205489
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010356398.8A Active CN111611468B (en) | 2020-04-29 | 2020-04-29 | Page interaction method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111611468B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111966939A (en) * | 2020-09-18 | 2020-11-20 | 北京百度网讯科技有限公司 | Page skipping method and device |
CN112114926B (en) * | 2020-09-25 | 2024-08-09 | 阿波罗智联(北京)科技有限公司 | Page operation method, device, equipment and medium based on voice recognition |
CN112689177B (en) * | 2021-01-14 | 2023-11-03 | 海信电子科技(深圳)有限公司 | Method for realizing quick interaction and display equipment |
CN113126765B (en) * | 2021-04-22 | 2025-01-03 | 北京云迹科技股份有限公司 | A multimodal input interaction method, device, robot and storage medium |
CN113836494B (en) * | 2021-09-13 | 2024-11-19 | 支付宝(杭州)信息技术有限公司 | A method, device, equipment and medium for reviewing a small program |
CN114489557B (en) * | 2021-12-15 | 2024-03-22 | 青岛海尔科技有限公司 | Voice interaction method, device, equipment and storage medium |
CN116467024A (en) * | 2023-03-17 | 2023-07-21 | 湖北坤盈数字科技有限公司 | A page interaction method, device, equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106776936B (en) * | 2016-12-01 | 2020-02-18 | 上海智臻智能网络科技股份有限公司 | Intelligent interaction method and system |
- 2020-04-29: CN application CN202010356398.8A — patent CN111611468B (en) — status: Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2816552A1 (en) * | 2013-06-20 | 2014-12-24 | 2236008 Ontario Inc. | Conditional multipass automatic speech recognition |
US9940396B1 (en) * | 2014-12-10 | 2018-04-10 | Amazon Technologies, Inc. | Mining potential user actions from a web page |
WO2016135746A2 (en) * | 2015-02-27 | 2016-09-01 | Keypoint Technologies India Pvt. Ltd. | Contextual discovery |
CN105898609A (en) * | 2015-12-14 | 2016-08-24 | 乐视网信息技术(北京)股份有限公司 | Method and client realizing voice interaction in video live broadcast process |
CN107045496A (en) * | 2017-04-19 | 2017-08-15 | 畅捷通信息技术股份有限公司 | The error correction method and error correction device of text after speech recognition |
CN108364645A (en) * | 2018-02-08 | 2018-08-03 | 北京奇安信科技有限公司 | A kind of method and device for realizing page interaction based on phonetic order |
CN108877791A (en) * | 2018-05-23 | 2018-11-23 | 百度在线网络技术(北京)有限公司 | Voice interactive method, device, server, terminal and medium based on view |
CN109325212A (en) * | 2018-08-30 | 2019-02-12 | 北京车和家信息技术有限公司 | Information interacting method, device, electronic equipment and browser |
Non-Patent Citations (1)
Title |
---|
Research and Design of a Voice Interaction Platform (语音交互平台研究与设计); Zhang Baoli et al.; Cable Television Technology (有线电视技术); full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |