CN112102820B - Interaction method, interaction device, electronic equipment and medium - Google Patents
Interaction method, interaction device, electronic equipment and medium Download PDFInfo
- Publication number
- CN112102820B CN112102820B CN201910527533.8A CN201910527533A CN112102820B CN 112102820 B CN112102820 B CN 112102820B CN 201910527533 A CN201910527533 A CN 201910527533A CN 112102820 B CN112102820 B CN 112102820B
- Authority
- CN
- China
- Prior art keywords
- information
- system event
- voice
- event
- acoustic feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 230000003993 interaction Effects 0.000 title claims abstract description 58
- 238000000034 method Methods 0.000 title claims abstract description 54
- 238000013507 mapping Methods 0.000 claims description 19
- 238000004590 computer program Methods 0.000 claims description 13
- 238000010276 construction Methods 0.000 claims description 6
- 230000002452 interceptive effect Effects 0.000 claims 1
- 230000006870 function Effects 0.000 description 18
- 238000010586 diagram Methods 0.000 description 12
- 238000005516 engineering process Methods 0.000 description 11
- 230000015654 memory Effects 0.000 description 9
- 230000008569 process Effects 0.000 description 8
- 238000004891 communication Methods 0.000 description 7
- 238000012545 processing Methods 0.000 description 5
- 238000000605 extraction Methods 0.000 description 3
- 238000012549 training Methods 0.000 description 3
- 238000013473 artificial intelligence Methods 0.000 description 2
- 238000009472 formulation Methods 0.000 description 2
- 239000000203 mixture Substances 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 239000000758 substrate Substances 0.000 description 2
- 108010001267 Protein Subunits Proteins 0.000 description 1
- 238000010420 art technique Methods 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 230000010485 coping Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000004806 packaging method and process Methods 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The disclosure provides an interaction method, an interaction apparatus, an electronic device and a medium, wherein the interaction method comprises the following steps: receiving voice information; determining whether the voice information includes information associated with a system event, wherein the system event is an event corresponding to inputtable information of at least one input device supported by the electronic device; and, upon determining that the information associated with the system event is included in the voice information, the electronic device distributing the system event to an object related to the system event.
Description
Technical Field
The present disclosure relates to the field of computer technology, and more particularly, to an interaction method, an interaction device, an electronic device, and a medium suitable for an electronic device.
Background
With the advent of the artificial intelligence (AI) era, voice interaction has become an increasingly important user portal: operations such as device control and task execution that originally required input devices such as a mouse, a keyboard or a touch pad can now be completed purely through voice interaction. Voice interaction also solves the problem that some people are unable to use computers and intelligent terminals, and is regarded as one of the major changes after the application (APP).
In the existing voice interaction technology, interaction is generally performed using natural language aimed at the functions of a specific application, such as "play a song for me" or "open WeChat" to instruct the specific application to play a song or open an application. This approach completely abandons the habits of the graphical operation interfaces of mobile terminal apps and the personal computer (PC).
In implementing the concepts of the present disclosure, the inventors found that the prior art has at least the following problems: the existing voice interaction technology is tightly coupled with the application itself and is not easy to reuse; in addition, when an existing application that requires the operation of input devices such as a keyboard and a mouse changes its operation mode to voice operation, users need to develop new usage habits from scratch.
Disclosure of Invention
In view of this, the present disclosure provides an interaction method, an interaction apparatus, an electronic device, and a medium that decouple the voice interaction technology from the application and do not require users to develop new usage habits.
One aspect of the present disclosure provides an interaction method performed by an electronic device, including the operations of: firstly, receiving voice information, then determining whether information related to a system event is included in the voice information, wherein the system event is an event corresponding to inputtable information of at least one input device supported by the electronic equipment, and then when the information related to the system event is determined to be included in the voice information, distributing the system event to an object related to the system event by the electronic equipment.
The interaction method provided by the disclosure determines whether the voice information includes information of a system event corresponding to the inputtable information of an input device, such as clicking a mouse, inputting the letter A or tapping a touch screen, and if so, distributes the system event. The input voice information can be directly converted into the instructions that would otherwise be input by operating input devices such as a keyboard or a mouse, i.e., the voice information is converted into system events, and the system distributes the system events to applications such as apps, thereby realizing human-computer interaction. The method applies to a very wide range of applications, such as the various apps installed on an electronic device equipped with input devices, and spares application developers the voice interaction technology they would usually need to implement themselves.
According to an embodiment of the disclosure, determining whether the voice information includes information associated with a system event includes: matching the voice information in an acoustic feature library, and determining whether the voice information includes acoustic features corresponding to a system event, wherein the acoustic feature library is stored in the electronic device and includes the correspondence between acoustic features and system events. Because the acoustic feature library can be trained offline in advance, whether the voice information includes information associated with a system event can be determined quickly and accurately based on the acoustic feature library.
According to an embodiment of the disclosure, the method may further include constructing the acoustic feature library, wherein the constructing the acoustic feature library may include: first, the system events and corresponding acoustic features are obtained, and then a mapping model between the system events and the corresponding acoustic features is generated.
According to an embodiment of the present disclosure, the constructing the acoustic feature library may include the operations of: firstly, obtaining text information of a system event corresponding to the inputtable information of the at least one input device, then generating acoustic features of the text information based on acoustic features of a speech unit, and then storing the system event, the acoustic features and the mapping model. The speech unit may be various basic units constituting speech, such as phonemes, words, etc., so that acoustic features of system events may be synthesized based on the speech unit.
According to an embodiment of the present disclosure, the obtaining the system event and the corresponding acoustic feature may include the operations of: firstly, system events corresponding to the inputtable information of the at least one input device are obtained, then, voice information of the system events corresponding to the inputtable information of the at least one input device is obtained, and then, acoustic feature extraction is carried out on the voice information of the system events. In this way, acoustic features of system events can be acquired using the speech model tool.
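By way of illustration only, the following minimal sketch shows how acoustic features might be extracted from recorded utterances for each system event. MFCC features via librosa stand in for the unspecified acoustic features of this embodiment, and the file names and event identifiers are purely hypothetical.

```python
# Sketch of "obtain system events and corresponding acoustic features" by
# extracting features from recorded utterances of each system event.
# Assumption: MFCCs stand in for the acoustic features; file names and
# event identifiers are illustrative only.
import librosa

# Hypothetical recordings of common spoken expressions for keyboard events.
SAMPLES = {
    "WM_KEYDOWN:VK_PRIOR": ["last_page_01.wav", "previous_page_01.wav"],
    "WM_KEYDOWN:VK_NEXT":  ["page_down_01.wav", "next_page_01.wav"],
}

def extract_acoustic_features(wav_path, n_mfcc=13):
    """Return a fixed-size feature vector (mean MFCCs) for one utterance."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)          # one vector per utterance

def build_event_features(samples=SAMPLES):
    """Build a dict: system event -> list of acoustic feature vectors."""
    return {event: [extract_acoustic_features(p) for p in paths]
            for event, paths in samples.items()}
```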
According to an embodiment of the present disclosure, the method may further include an operation of updating the acoustic feature library independently of an application installed in the electronic device. This eliminates the need to update the corresponding application at the same time when the acoustic feature library needs to be updated.
According to an embodiment of the present disclosure, the method may further include the following operations: when it is determined that the voice information does not include information associated with a system event, performing voice recognition on the voice information so as to obtain text information; determining semantic information of the text information; determining whether the semantic information includes semantic information associated with a system event; and, when the semantic information includes semantic information associated with a system event, the electronic device distributing the system event to an object related to the system event. When the voice information does not include the voice information corresponding to a system event, voice recognition and semantic analysis are performed to obtain semantic information, so that the system event can still be obtained according to the semantic information; this effectively covers the multiple ways of expressing the same semantics in speech and improves the user experience.
According to an embodiment of the disclosure, the semantic information of the system event includes standard semantic information and extended semantic information, and the extended semantic information is obtained by expanding the semantics of the standard semantic information.
Another aspect of the present disclosure provides an interaction apparatus performed by an electronic device, which may include: the electronic device comprises an information receiving module, a first event determining module and a first event distributing module, wherein the information receiving module is used for receiving voice information, the first event determining module is used for determining whether the voice information comprises information related to a system event, the system event is an event corresponding to inputtable information of at least one input device supported by the electronic device, and the first event distributing module is used for distributing the system event to an object related to the system event when the voice information comprises the information related to the system event.
According to an embodiment of the disclosure, the first event determining module is specifically configured to match the voice information in an acoustic feature library, determine whether the voice information includes an acoustic feature corresponding to a system event, where the acoustic feature library is stored in the electronic device, and the acoustic feature library includes a correspondence between the acoustic feature and the system event.
According to an embodiment of the present disclosure, the apparatus may further include: a model library construction module, the model library construction module comprising: the system comprises an acoustic feature acquisition unit and a mapping model generation unit, wherein the acoustic feature acquisition unit is used for acquiring the system event and the corresponding acoustic feature, and the mapping model generation unit is used for generating a mapping model between the system event and the corresponding acoustic feature.
According to an embodiment of the present disclosure, the acoustic feature acquisition unit includes: the system comprises an event obtaining subunit, a voice information obtaining subunit and an acoustic feature obtaining subunit, wherein the event obtaining subunit is used for obtaining a system event corresponding to the inputtable information of the at least one input device, the voice information obtaining subunit is used for obtaining the voice information of the system event corresponding to the inputtable information of the at least one input device, and the acoustic feature obtaining subunit is used for extracting acoustic features of the voice information of the system event.
According to an embodiment of the disclosure, the apparatus may further comprise an updating module for updating the acoustic feature library independently of an application installed in the electronic device.
According to an embodiment of the present disclosure, the apparatus further comprises: a voice recognition module, a semantic information acquisition module, a second event determination module and a second event distribution module. The voice recognition module is used for performing voice recognition on the voice information so as to obtain text information when it is determined that the voice information does not include information associated with a system event, the semantic information acquisition module is used for determining semantic information of the text information, the second event determination module is used for determining whether the semantic information includes semantic information associated with a system event, and the second event distribution module is used for distributing the system event to an object related to the system event when the semantic information includes semantic information associated with the system event.
According to an embodiment of the disclosure, the semantic information of the system event includes standard semantic information and extended semantic information, and the extended semantic information is obtained by expanding the semantics of the standard semantic information.
Another aspect of the present disclosure provides an electronic device comprising one or more processors and a storage device for storing executable instructions that, when executed by the processors, implement the method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed, are configured to implement a method as described above.
Another aspect of the present disclosure provides a computer program comprising computer executable instructions which when executed are for implementing a method as described above.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:
FIG. 1A schematically illustrates an application scenario of an interaction method, an interaction device, an electronic apparatus, and a medium according to an embodiment of the disclosure;
FIG. 1B schematically illustrates an exemplary system architecture to which the interaction method according to embodiments of the present disclosure is applicable;
FIG. 2 schematically illustrates a flow chart of an interaction method performed by an electronic device according to an embodiment of the disclosure;
FIG. 3A schematically illustrates a flow chart of an interaction method performed by an electronic device according to another embodiment of the disclosure;
FIG. 3B schematically illustrates a logic diagram of natural speech manipulation according to an embodiment of the present disclosure;
FIG. 4A schematically illustrates a block diagram of an interaction apparatus performed by an electronic device, according to an embodiment of the disclosure;
FIG. 4B schematically illustrates a block diagram of a speech manipulation module according to an embodiment of the present disclosure; and
Fig. 5 schematically illustrates a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where a convention analogous to "at least one of A, B and C, etc." is used, such a convention should in general be interpreted in the sense in which one skilled in the art would ordinarily understand it (e.g., "a system having at least one of A, B and C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a formulation analogous to "at least one of A, B or C, etc." is used, such a formulation should likewise in general be interpreted in the sense in which one skilled in the art would ordinarily understand it (e.g., "a system having at least one of A, B or C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). The terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of the described features.
Voice interaction is usually realized by the application layer performing voice recognition and analysis to obtain text or a control instruction, which is then converted into application logic so as to achieve the purpose of control or interaction. For example, the user sends the voice information "Jingdong, I want to buy a TV" to the current application, and after receiving the voice, the current application determines the purchase instruction through algorithmic processing and semantic analysis. This voice interaction mode is tightly coupled with the application and is not easy to reuse; a large amount of acoustic feature information such as voice models, as well as a large number of semantic models, may be embedded in the application layer, occupying a large amount of storage space (the application package becomes heavy). Whenever the application functions change, the application needs to be reinstalled or upgraded, and the voice model and semantic model need to be reinstalled or upgraded at the same time, which brings inconvenience to the popularization and use of the application.
A key problem of the prior art is that the new voice interaction mode directly accomplishes tasks, conversations and other purposes by parsing complete instructions (intents) aimed at the specific functions of an application. When the mode of inputting instructions by operating input devices such as a keyboard and a mouse is changed into inputting instructions by voice, users need to develop entirely new usage habits; this takes a long time, harms the user experience, and leaves a large gap with respect to existing systems or applications (such as the Excel program supported on the desktop), which is not conducive to the popularization of the voice interaction mode. In addition, the training cost of the voice model and the semantic model is high, and every newly developed function requires the models to be retrained, which further hinders the popularization of the voice interaction mode.
The embodiment of the disclosure provides an interaction method, an interaction device, electronic equipment and a medium. The method includes a system event identification process and a system event distribution process. In a system event recognition process, determining whether the received voice information comprises information related to a system event, wherein the system event is an event corresponding to inputtable information of at least one input device supported by the electronic equipment. After the system event recognition process is completed, a system event distribution process is entered, and when it is determined that the information associated with the system event is included in the voice information, the electronic device distributes the system event to an object related to the system event.
Fig. 1A schematically illustrates an application scenario of an interaction method, an interaction device, an electronic apparatus, and a medium according to an embodiment of the present disclosure.
As shown in fig. 1A, the user 10 has long used the notebook computer 20 for office work and can proficiently operate office software such as the Excel software and the Word software supported by the Windows operating system (such software is referred to as an application, or app, in the Android operating system and the iOS operating system), and now wishes to work in a voice interaction manner; for example, the Word software is currently open, and the user 10 wishes to input the number 528 on the line where the cursor is located and to input "bed" on the next line. The keyboard can input, in the Word software, all kinds of characters and a practically uncountable number of combinations, such as inputting a, inputting b, inputting ab, inputting abc, inputting aa, inputting aca, inputting nice, inputting normal, inputting scheduling (work scheduling) and the like, where each input is a complete piece of voice information. The prior art would need to train a voice model separately for each complete piece of voice information, and such combinations are inexhaustible. In the prior art, voice models are set for a particular application; because the number of functions of an application is limited, a corresponding voice model can be trained for each function, for example "set the air conditioner temperature (a specific function of the application) to 28°C" or "play children's songs", each being a voice model trained separately for a specific function of the application. However, setting the voice model library inside each application increases the volume of the software package. To solve this problem, the prior art also performs voice recognition on the received voice information to obtain text information and determines the corresponding instruction according to the text information. However, because the voice model library involved in voice recognition is extremely large and demands high computing capability, it basically cannot be deployed on a personal terminal; voice recognition is usually performed on the voice information by a server, and the recognized text information is then sent to the personal terminal, which is not applicable in many scenarios, for example when the personal terminal cannot be networked or networking is inconvenient. In addition, because many words in speech have similar pronunciations, such as "bad" and "bed", errors easily occur when whole utterances are recognized.
In order to solve the above-mentioned problems, embodiments of the present disclosure determine whether information associated with a system event is included in the received voice information, where the system event is an event corresponding to inputtable information of at least one input device supported by the electronic apparatus, and the number of system events corresponding to the inputtable information of the at least one input device is limited. For example, the information that the keyboard can input only comprises the input information set on the keyboard, such as inputting a, inputting b, inputting a carriage return (line feed), inputting F4, inputting A, inputting Caps Lock and the like. Such inputs are small in number and differ greatly in pronunciation, and the system event information corresponding to them can be accurately recognized with a small voice model package, so the scheme can be popularized to the software of various operating systems. As shown in fig. 1A, the user 10 only needs to open the Word software and then say "input 5, 2, 8" (there may be pauses between 5, 2 and 8, or of course "input 5, input 2, input 8"), "input line feed", "input b, e, d".
Fig. 1B schematically illustrates an exemplary system architecture to which the interaction method according to an embodiment of the present disclosure is applicable. It should be noted that fig. 1B illustrates only an example of a system architecture 100 in which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1B, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or transmit information or the like. Various communication client applications, such as shopping class applications, web browser applications, office class applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices supporting at least one input means including, but not limited to, smartphones, tablet computers, laptop and desktop computers, smart televisions, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for application updates, downloads and speech recognition requested by the user with the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (for example, information or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the interaction method provided by the embodiments of the present disclosure may be generally performed by the terminal devices 101, 102, 103.
It should be understood that the number of terminal devices, networks and servers is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flow chart of an interaction method performed by an electronic device according to an embodiment of the disclosure.
As shown in fig. 2, the method includes operations S201 to S205.
In operation S201, voice information is received.
In this embodiment, the voice information may be collected by a sound sensor of the electronic device, such as a microphone, or may be voice information, such as audio information, received from another electronic device.
In operation S203, it is determined whether information associated with a system event, which is an event corresponding to inputtable information of at least one input device supported by the electronic device, is included in the voice information.
Specifically, the determining whether the voice information includes information associated with a system event may specifically include the following operations.
And matching the voice information in an acoustic feature library, and determining whether the voice information comprises acoustic features corresponding to the system events. Wherein the acoustic feature library is stored in the electronic device and comprises correspondence between acoustic features and system events.
For example, the keyboard of the electronic device has a PgUp key and a PgDn key; the voice information corresponding to the PgUp key input information may be "last page", "previous page", etc., and the voice information corresponding to the PgDn key input information may be "page turn", "lower page", etc. If a pronunciation of "last page" is detected in the voice information, the corresponding system event may be determined to include one of: WM_KEYDOWN, WM_CHAR, WM_KEYUP, etc., carrying the key code of the PgUp key. The instructions of the same system event may differ between operating systems, and therefore the instruction of a specific system event needs to be determined according to the system in use.
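By way of illustration only, a minimal sketch of such a correspondence between spoken phrases and Windows key messages follows. The WM_* and VK_* constants are the standard Win32 values; the phrase list itself is an assumption and not part of this disclosure.

```python
# Minimal sketch of a phrase -> system event correspondence for Windows.
# WM_* / VK_* values are the standard Win32 constants; the phrases are
# illustrative examples only.
WM_KEYDOWN, WM_KEYUP = 0x0100, 0x0101
VK_PRIOR, VK_NEXT = 0x21, 0x22        # Page Up, Page Down

PHRASE_TO_EVENT = {
    "last page":     (VK_PRIOR, [WM_KEYDOWN, WM_KEYUP]),
    "previous page": (VK_PRIOR, [WM_KEYDOWN, WM_KEYUP]),
    "page turn":     (VK_NEXT,  [WM_KEYDOWN, WM_KEYUP]),
    "lower page":    (VK_NEXT,  [WM_KEYDOWN, WM_KEYUP]),
}

def lookup_system_event(phrase):
    """Return (virtual-key code, message sequence) or None if no match."""
    return PHRASE_TO_EVENT.get(phrase.strip().lower())
```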
When it is determined that the information associated with the system event is included in the voice information, the electronic device distributes the system event to an object related to the system event in operation S205.
For example, when the Word application is in use and the user sends out voice information such as "turn the page" or "input a", the system of the electronic device may send the instruction of the corresponding system event to the Word software, so as to implement voice interaction between the user and the electronic device.
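The following sketch, offered only as an illustration for Windows, shows how a matched system event could be injected so that it reaches whichever application currently has keyboard focus (for example, Word), without the application itself containing any voice-specific code. It uses the legacy keybd_event call from user32; a production implementation might instead use SendInput or post messages directly, and the virtual-key value shown is an example.

```python
# Sketch of distributing a system event on Windows: the synthesized key
# events are placed in the system input queue and delivered to whichever
# application has keyboard focus (e.g., Word).
import ctypes

KEYEVENTF_KEYUP = 0x0002
user32 = ctypes.windll.user32   # available on Windows only

def dispatch_key(vk_code):
    """Inject a key press followed by a key release."""
    user32.keybd_event(vk_code, 0, 0, 0)                # key down
    user32.keybd_event(vk_code, 0, KEYEVENTF_KEYUP, 0)  # key up

# e.g. after matching "input a" -> virtual-key code of the A key (0x41):
# dispatch_key(0x41)
```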
In another embodiment, the acoustic feature library may be constructed in the following manner.
For example, constructing the acoustic feature library may include the following operations.
First, the system events and corresponding acoustic features are obtained. For example, the names of the system events, and the instructions thereof, corresponding to the inputtable information of various keyboards in different operating systems can be obtained through statistics. In addition, the acoustic features corresponding to the system events can be obtained through speech synthesis.
After the inputtable information of the input device is determined, the common spoken expressions of the inputtable information and its operations are collected, and feature extraction is then carried out on these common spoken expressions, so as to obtain the acoustic features corresponding to the system events.
A mapping model between the system event and the corresponding acoustic feature is then generated.
For example, a mapping model between the acoustic features of "last page" and WM_KEYDOWN, WM_CHAR, WM_KEYUP, etc. The system events (e.g., instructions), the corresponding acoustic features, and the mapping model may then be stored in an acoustic feature library.
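By way of illustration only, the acoustic feature library could be organized roughly as follows; the on-disk format (pickle) and the exact layout are assumptions rather than anything prescribed by this disclosure.

```python
# Sketch of what gets stored in the acoustic feature library: the system
# event (instruction), its acoustic features, and the mapping between them.
import pickle
from dataclasses import dataclass, field

@dataclass
class AcousticFeatureLibrary:
    # system event instruction -> list of reference feature vectors
    entries: dict = field(default_factory=dict)

    def add(self, system_event, feature_vectors):
        self.entries.setdefault(system_event, []).extend(feature_vectors)

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self.entries, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            return cls(entries=pickle.load(f))
```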
The acoustic feature library can be constructed offline, and is suitable for systems with the same system events, and can be optimized and updated in the use process.
In a specific embodiment, the obtaining the system event and corresponding acoustic features may include the following operations.
First, a system event corresponding to the inputtable information of the at least one input device is obtained.
Then, voice information of a system event corresponding to the inputtable information of the at least one input device is obtained.
Then, acoustic feature extraction is performed on the voice information of the system event, for example, acoustic features of the voice information corresponding to the system event are obtained by using a voice model tool. The method for extracting the features is not limited.
For example, the input data is trained using speech model tools and algorithms such as conditional random fields (CRF) and deep learning; Table 1 shows a mapping table from speech expressions to system events for the Windows system.
Table 1. Mapping table from Windows system speech expressions to system events
Wherein different operating systems may differ in name and definition for the same system event (e.g., right click of a mouse). Thus, the speech expression to system event mapping for different operating systems may be different.
Next, a mapping model from speech to system events is generated. The model is independent of the running software and can be updated separately, and a voice control software package is generated; that is, a voice decoding library is applied to the generated model and extended functions are developed, so that a voice control software module is generated.
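As a hedged illustration of such a mapping model, the sketch below trains a simple scikit-learn classifier from feature vectors to system event labels. The disclosure mentions CRF and deep learning; a logistic regression is used here only to keep the example short, and the training data layout is an assumption.

```python
# Sketch of training the speech-to-system-event mapping model from the
# per-event feature vectors gathered earlier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_mapping_model(event_features):
    """event_features: dict of system event -> list of feature vectors."""
    X, y = [], []
    for event, vectors in event_features.items():
        X.extend(vectors)
        y.extend([event] * len(vectors))
    model = LogisticRegression(max_iter=1000)
    model.fit(np.asarray(X), np.asarray(y))
    return model      # can be shipped and updated independently of any app

def match_acoustic_features(model, feature_vector):
    """Return the predicted system event for one utterance's features."""
    return model.predict(feature_vector.reshape(1, -1))[0]
```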
In another particular embodiment, the obtaining the system event and corresponding acoustic signature may include the following operations.
Firstly, text information of a voice expression mode of a system event corresponding to the inputtable information of the at least one input device is obtained.
Then, acoustic features of the text information are generated based on acoustic features of the speech units. Wherein the speech units include, but are not limited to: consonants, vowels, initials, finals, characters, words, phrases, and the like.
Next, the system event, the acoustic features, and the mapping model are stored. Wherein the mapping model is a model between the system event and the corresponding acoustic feature.
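By way of illustration only, the following sketch approximates the acoustic features of a system event's text by concatenating per-unit features; the per-unit feature table, its placeholder contents and the simple concatenation rule are all assumptions.

```python
# Sketch of generating acoustic features for a system event's text from
# per-unit features (the second construction route described above).
import numpy as np

# Hypothetical library of per-unit acoustic features (e.g., one feature
# sequence per word or phoneme), prepared offline; values are placeholders.
UNIT_FEATURES = {
    "page": np.zeros((4, 13)),
    "down": np.zeros((3, 13)),
}

def synthesize_text_features(text, unit_features=UNIT_FEATURES):
    """Concatenate unit features to approximate the utterance's features."""
    units = text.lower().split()
    return np.concatenate([unit_features[u] for u in units], axis=0)

# e.g. reference features for the "page down" system event:
# ref = synthesize_text_features("page down")
```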
It should be noted that, since the acoustic feature library is decoupled from the application installed in the electronic device, for example, not encapsulated in the application, but encapsulated in the system, and the acoustic feature library is called by the system, updating the acoustic feature library independently of the application installed in the electronic device can be achieved.
The interaction method provided by the disclosure can directly convert the input voice information into the instructions that would otherwise be input by operating input devices such as a keyboard and a mouse, and further convert the voice information into system events. Human-computer interaction is realized by distributing the system events to applications such as apps; the applicable applications are very wide, such as the various applications installed on an electronic device with input devices, and the related voice interaction technology that application developers usually need to pay attention to is omitted.
Fig. 3A schematically illustrates a flow chart of an interaction method performed by an electronic device according to another embodiment of the disclosure.
Because the same semantics can be expressed in speech in many ways, for example expression habits differ from place to place, and especially for dialects, it is difficult for the acoustic feature library to include the acoustic features of every dialect for a single system event. The semantic information of the voice information can therefore be obtained through speech recognition, and the system event corresponding to the voice information is then determined based on the semantic information.
As shown in fig. 3A, the method may further include the following operations.
When it is determined that the information associated with the system event is not included in the voice information, voice recognition is performed on the voice information so as to obtain text information in operation S301.
The speech recognition technology may be, for example, an automatic speech recognition (Automatic Speech Recognition, abbreviated as ASR) technology, and text information corresponding to the speech information may be obtained through the ASR technology. This operation may be implemented locally or by a server.
In operation S303, semantic information of the text information is determined.
In operation S305, it is determined whether the semantic information includes semantic information associated with a system event.
In operation S307, when the semantic information includes semantic information associated with a system event, the electronic device distributes the system event to an object related to the system event.
It should be noted that if the semantic information does not include semantic information associated with a system event, the system layer has already determined that no information associated with a system event exists in the voice information; however, if a voice interaction function is set in the application, voice interaction can still be performed through the voice interaction function set in the application, for example the application layer detects whether the voice information includes a voice instruction that the application can recognize.
In another embodiment, the semantic information of the system event includes standard semantic information and extended semantic information, and the extended semantic information is obtained by expanding the semantics of the standard semantic information. For example, different age groups may express the same thing quite differently: a young child tends to pronounce "X" as "cha", and since the keyboard can only input "X", the child's true semantics should be to input "X". In this way, it is convenient to handle voice information that carries the same semantics but is expressed differently in different scenarios, which improves the user experience.
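A minimal sketch of how standard and extended semantic information might be stored and matched follows; the table entries, including the "cha" example above, are illustrative assumptions only.

```python
# Sketch of standard vs. extended semantic information for a system event.
SEMANTIC_TABLE = {
    # system event: standard semantics plus extended (expanded) semantics
    "INPUT_X": {"standard": ["input x"],
                "extended": ["input cha", "input cross", "input times sign"]},
}

def match_semantics(parsed_text):
    """Map parsed semantics to a system event via standard or extended forms."""
    text = parsed_text.strip().lower()
    for event, forms in SEMANTIC_TABLE.items():
        if text in forms["standard"] or text in forms["extended"]:
            return event
    return None
```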
The following description is of one specific embodiment.
Fig. 3B schematically illustrates a logic diagram of natural speech manipulation according to an embodiment of the present disclosure.
As shown in fig. 3B, after the user turns on the voice control function (by means of a switch, wake-up word, etc.), the voice information is matched in the acoustic feature library.
If the matching is successful, the matched acoustic feature is directly converted (encoded) into the corresponding system event.
If the matching fails, the voice information is processed through the ASR technology to obtain the text information of the voice information. Then, semantic analysis is performed on the text information; after semantic analysis, the text information can be matched with the instruction corresponding to a system event (for example, "turn the page" can be matched with the PAGEDOWN system event).
After the system event is obtained, the operating system distributes the system event. The distribution of system events may be performed using prior art techniques and is not limited herein.
In another embodiment, the attribute information of the user, for example the age group of the user, may be determined based on the acoustic features of the user, and the accuracy of the obtained semantic information is then improved based on the attribute information of the user. For example, after the voice information is received, it is determined that the voice information was sent by a child, e.g., the child says to the computer: "input circle". If the computer has only a mouse and a keyboard as input devices, that is, there is no "circle" in the inputtable information of the input means, the semantics of the voice information can with high probability be determined, based on the extended semantic information, as inputting "OO"; and since children habitually use reduplicated pronunciation, the true intention of the child can be determined as inputting "O". Therefore, the real intention of the user can be determined based on the attribute information of the user, the inputtable information of the input device, and the extended semantic information, which effectively improves the interaction experience.
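Putting the above pieces together, the sketch below mirrors the flow of fig. 3B under the stated assumptions: an acoustic-level match is tried first, and only on failure is ASR plus semantic matching used before the operating system distributes the event. The helper names refer to the earlier sketches or are assumed; in particular, match_acoustic_features is assumed to return None below a confidence threshold, and the ASR function is supplied by the caller.

```python
# End-to-end sketch of the natural speech manipulation flow (fig. 3B).
def handle_voice(audio_path, model, asr):
    # 1) acoustic-level match against the acoustic feature library
    features = extract_acoustic_features(audio_path)
    event = match_acoustic_features(model, features)
    if event is None:
        # 2) fall back to speech recognition plus semantic matching
        text = asr(audio_path)
        event = match_semantics(text)
    if event is not None:
        # 3) the operating system distributes the system event
        dispatch_system_event(event)
    return event

def dispatch_system_event(event):
    # Placeholder: translate the event identifier into OS-specific
    # instructions (e.g., the key messages sketched earlier) and post them.
    pass
```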
Fig. 4A schematically illustrates a block diagram of an interaction apparatus performed by an electronic device according to an embodiment of the disclosure.
As shown in fig. 4A, the interaction device 400 may include an information receiving module 410, a first event determining module 430, and a first event distributing module 450.
Wherein, the information receiving module 410 is configured to receive voice information.
The first event determining module 430 is configured to determine whether the voice information includes information associated with a system event, where the system event is an event corresponding to inputtable information of at least one input device supported by the electronic device.
The first event distribution module 450 is configured to cause the electronic device to distribute the system event to an object related to the system event when it is determined that the information associated with the system event is included in the voice information.
Specifically, the first event determining module 430 may be configured to match the voice information in an acoustic feature library, determine whether the voice information includes an acoustic feature corresponding to a system event, where the acoustic feature library is stored in the electronic device, and the acoustic feature library includes a correspondence between the acoustic feature and the system event.
In addition, the apparatus 400 may further include: and a model library construction module.
The model library construction module comprises: an acoustic feature acquisition unit and a mapping model generation unit.
The acoustic feature acquisition unit is used for acquiring the system event and the corresponding acoustic feature.
The mapping model generation unit is configured to generate a mapping model between the system event and the corresponding acoustic feature.
Wherein the acoustic feature acquisition unit may include: an event acquisition subunit, a speech information acquisition subunit, and an acoustic feature acquisition subunit.
The event obtaining subunit is configured to obtain a system event corresponding to the inputtable information of the at least one input device.
The voice information obtaining subunit is configured to obtain voice information of a system event corresponding to the inputtable information of the at least one input device.
The acoustic feature acquisition subunit is used for extracting acoustic features of the voice information of the system event.
Fig. 4B schematically illustrates a block diagram of a speech manipulation module according to an embodiment of the disclosure.
As shown in fig. 4B, the voice manipulation module 4000 is a real-time operation system of voice interaction, including: the system comprises a voice receiving sub-module, a voice model management sub-module, a matching sub-module and an ASR sub-module.
The voice receiving submodule is used for receiving and encoding voice information after the operating system enters a voice control mode.
The voice model management submodule is used for storing and managing acoustic characteristics related to voice control, such as a voice model. The acoustic features may be generated by training and may be updated separately.
The matching submodule is used for matching the input voice information with the acoustic features, and if the matching is successful, the system event is obtained through conversion. If the matching fails, the text information obtained by the voice recognition can be subjected to semantic matching, and if the semantic matching is successful, the text information can be converted into a system event.
The ASR submodule is used for converting voice information into text information.
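As a purely structural illustration, the sub-modules described above could be arranged as follows; the method bodies are omitted and nothing here is mandated by this disclosure.

```python
# Structural sketch of the voice manipulation module's four sub-modules.
class VoiceManipulationModule:
    def receive(self, audio):
        """Voice receiving sub-module: receive and encode voice information."""
        ...

    def load_models(self, path):
        """Voice model management sub-module: store/manage acoustic features."""
        ...

    def match(self, encoded_audio):
        """Matching sub-module: acoustic match, else semantic match via ASR."""
        ...

    def asr(self, encoded_audio):
        """ASR sub-module: convert voice information into text information."""
        ...
```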
In another embodiment, the apparatus 400 may further comprise an update module for updating the acoustic feature library independently of an application installed in the electronic device.
To increase the success rate of the voice interaction, the apparatus 400 further includes: the system comprises a voice recognition module, a semantic information acquisition module, a second event determination module and a second event distribution module.
The voice recognition module is used for carrying out voice recognition on the voice information when the voice information is determined not to include the information associated with the system event, so as to obtain text information.
The semantic information acquisition module is used for determining semantic information of the text information.
The second event determination module is configured to determine whether the semantic information includes semantic information associated with a system event.
The second event distribution module is used for distributing the system event to an object related to the system event when the semantic information comprises semantic information related to the system event.
In order to increase applicable crowd, the semantic information of the system event can comprise standard semantic information and extended semantic information, wherein the semantic of the extended semantic information is obtained by expanding the semantic of the standard semantic information.
Any number of the modules, sub-modules, units and sub-units according to embodiments of the present disclosure, or at least part of the functionality of any number of them, may be implemented in one module. Any one or more of the modules, sub-modules, units and sub-units according to embodiments of the present disclosure may be split into multiple modules for implementation. Any one or more of the modules, sub-modules, units and sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system in a package or an application specific integrated circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of, or a suitable combination of, the three implementation manners of software, hardware and firmware. Alternatively, one or more of the modules, sub-modules, units and sub-units according to embodiments of the present disclosure may be at least partially implemented as computer program modules which, when executed, may perform the corresponding functions.
For example, any number of the information receiving module 410, the first event determining module 430 and the first event distribution module 450 may be combined into one module for implementation, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the information receiving module 410, the first event determining module 430 and the first event distribution module 450 may be implemented at least in part as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system in a package or an application specific integrated circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of, or a suitable combination of, the three implementation manners of software, hardware and firmware. Alternatively, at least one of the information receiving module 410, the first event determining module 430 and the first event distribution module 450 may be at least partially implemented as a computer program module which, when executed, may perform the corresponding functions.
Fig. 5 schematically illustrates a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 5 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 5, an electronic device 500 according to an embodiment of the present disclosure includes a processor 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 501 may also include on-board memory for caching purposes. The processor 501 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flows according to embodiments of the disclosure.
In the RAM 503, various programs and data required for the operation of the system 500 are stored, for example, an acoustic feature library is stored. The processor 501, ROM 502, and RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 502 and/or the RAM 503. Note that the program may be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the system 500 may further include an input/output (I/O) interface 505, the input/output (I/O) interface 505 also being connected to the bus 504. The system 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output portion 507 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as needed so that a computer program read therefrom is mounted into the storage section 508 as needed.
According to embodiments of the present disclosure, the method flow according to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or installed from the removable media 511. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 501. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 502 and/or RAM 503 and/or one or more memories other than ROM 502 and RAM 503 described above.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the present disclosure and/or in the claims may be combined and/or sub-combined in various ways, even if such combinations or sub-combinations are not explicitly recited in the present disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or in the claims may be combined and/or sub-combined in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or sub-combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.
Claims (13)
1. An interaction method performed by an electronic device, comprising:
receiving voice information sent by a user;
determining whether the voice information comprises information associated with a system event, wherein the system event is an event corresponding to inputtable information of at least one input device supported by the electronic device, and the inputtable information comprises inputtable information set on the input device; and
when it is determined that the voice information comprises the information associated with the system event, distributing, by the electronic device, the system event to an object related to the system event, wherein the instruction of the system event is determined according to an operating system installed on the electronic device;
when it is determined that the voice information does not comprise the information associated with the system event, performing voice recognition on the voice information so as to obtain text information;
determining semantic information of the text information based on attribute information of the user and extended semantic information, wherein the attribute information of the user is determined based on acoustic features of the voice information, and the extended semantic information is obtained by expanding the semantics of standard semantic information;
determining whether the semantic information comprises semantic information associated with a system event, wherein the semantic information of the system event comprises the standard semantic information and the extended semantic information; and
when the semantic information comprises the semantic information associated with the system event, distributing, by the electronic device, the system event to the object related to the system event.
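For illustration only, the following Python sketch traces the flow recited in claim 1. It is not part of the claimed subject matter: the helper names (`extract_acoustic_feature`, `match_acoustic_feature_library`, `speech_to_text`, `infer_user_attributes`, `resolve_semantics`, `lookup_semantic_event`) and the device attributes are hypothetical placeholders for functionality the embodiments may realize in other ways.

```python
# Non-limiting sketch of the claim 1 flow; every helper name is hypothetical.

def handle_voice_input(voice_info, device):
    # Step 1: does the voice information itself carry information associated
    # with a system event?  Illustrated here as an acoustic feature match
    # against the device's acoustic feature library (compare claim 2).
    query = extract_acoustic_feature(voice_info)
    event = match_acoustic_feature_library(query, device.acoustic_feature_library)
    if event is not None:
        dispatch_system_event(device, event)
        return

    # Step 2: no direct match, so fall back to speech recognition.
    text = speech_to_text(voice_info)

    # Step 3: semantic information is determined from the recognized text,
    # user attributes inferred from acoustic features, and the extended
    # semantics derived from the standard semantic information.
    user_attrs = infer_user_attributes(query)
    semantics = resolve_semantics(text, user_attrs, device.extended_semantics)

    # Step 4: if the semantics are associated with a system event, dispatch it.
    event = lookup_semantic_event(semantics, device.semantic_event_index)
    if event is not None:
        dispatch_system_event(device, event)


def dispatch_system_event(device, event):
    # The concrete instruction for the event depends on the operating system
    # installed on the device; it is delivered to the object related to the event.
    instruction = device.operating_system.instruction_for(event)
    device.deliver(instruction, target=event.related_object)
```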
2. The method according to claim 1, wherein:
the determining whether the voice information comprises information associated with a system event comprises: matching the voice information against an acoustic feature library, and determining whether the voice information comprises an acoustic feature corresponding to a system event;
wherein the acoustic feature library is stored in the electronic device and comprises correspondences between acoustic features and system events.
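The matching recited in claim 2 could, for example, compare an acoustic feature vector of the incoming voice information against the feature vectors stored in the acoustic feature library. The sketch below uses cosine similarity with an arbitrary threshold; the similarity measure, the 0.85 threshold, and the toy key-event labels are assumptions for illustration, not features of the claim.

```python
import math

def cosine_similarity(a, b):
    # Plain cosine similarity between two feature vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_acoustic_feature_library(query_feature, library, threshold=0.85):
    """library: iterable of (acoustic_feature_vector, system_event) correspondences.
    Returns the best-matching system event, or None if no score exceeds the threshold."""
    best_event, best_score = None, threshold
    for feature, system_event in library:
        score = cosine_similarity(query_feature, feature)
        if score > best_score:
            best_event, best_score = system_event, score
    return best_event

# Illustrative usage with made-up feature vectors and example key-event labels:
library = [([0.9, 0.1, 0.0], "KEYCODE_ENTER"), ([0.1, 0.8, 0.3], "KEYCODE_BACK")]
print(match_acoustic_feature_library([0.88, 0.12, 0.02], library))  # -> KEYCODE_ENTER
```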
3. The method of claim 2, further comprising: constructing the acoustic feature library; wherein the constructing the acoustic feature library comprises:
obtaining the system event and corresponding acoustic features; and
generating a mapping model between the system event and the corresponding acoustic feature.
4. The method according to claim 3, wherein the obtaining the system event and corresponding acoustic features comprises:
acquiring a system event corresponding to the inputtable information of the at least one input device;
acquiring voice information of the system event corresponding to the inputtable information of the at least one input device; and
extracting acoustic features from the voice information of the system event.
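Claims 3 and 4 describe constructing the acoustic feature library offline: enumerate the system events corresponding to the inputtable information of each supported input device, obtain voice information for each such event, extract acoustic features, and generate the mapping model. A minimal sketch under those assumptions, with all helper names hypothetical:

```python
def build_acoustic_feature_library(input_devices, get_voice_samples, extract_acoustic_feature):
    """Construct a mapping model between system events and acoustic features.

    `input_devices` is assumed to yield objects describing the inputtable
    information each device supports (e.g. keyboard keys); `get_voice_samples`
    and `extract_acoustic_feature` are hypothetical helpers supplied by the
    embodiment (recordings or synthesized speech naming each event, and an
    acoustic front end such as an MFCC extractor).
    """
    library = []
    for device in input_devices:
        for inputtable in device.inputtable_information:
            system_event = device.system_event_for(inputtable)   # e.g. a key-down event
            for sample in get_voice_samples(system_event):        # speech naming the event
                feature = extract_acoustic_feature(sample)
                library.append((feature, system_event))           # one correspondence entry
    return library
```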
5. The method of claim 2, further comprising updating the acoustic feature library independently of an application installed in the electronic device.
6. An interaction apparatus executed by an electronic device, comprising:
an information receiving module, configured to receive voice information;
a first event determining module, configured to determine whether the voice information comprises information associated with a system event, wherein the system event is an event corresponding to inputtable information of at least one input device supported by the electronic device, and the inputtable information comprises inputtable information set on the input device; and
a first event distribution module, configured to, when it is determined that the voice information comprises the information associated with a system event, distribute the system event to an object related to the system event, wherein the system event is determined according to a type of an operating system installed on the electronic device;
the apparatus further comprising:
a voice recognition module, configured to perform voice recognition on the voice information to obtain text information when it is determined that the voice information does not comprise the information associated with a system event;
a semantic information acquisition module, configured to determine semantic information of the text information based on attribute information of a user and extended semantic information, wherein the attribute information of the user is determined based on acoustic features of the voice information, and the extended semantic information is obtained by expanding the semantics of standard semantic information;
a second event determining module, configured to determine whether the semantic information comprises semantic information associated with a system event, wherein the semantic information of the system event comprises the standard semantic information and the extended semantic information; and
a second event distribution module, configured to enable the electronic device to distribute the system event to the object related to the system event when the semantic information comprises the semantic information associated with the system event.
7. The apparatus of claim 6, wherein:
the first event determining module is specifically configured to match the voice information against an acoustic feature library and determine whether the voice information comprises an acoustic feature corresponding to a system event, wherein the acoustic feature library is stored in the electronic device and comprises a correspondence between the acoustic feature and the system event.
8. The apparatus of claim 6, further comprising: a model library construction module, the model library construction module comprising:
an acoustic feature acquisition unit, configured to acquire the system event and a corresponding acoustic feature; and
a mapping model generating unit, configured to generate a mapping model between the system event and the corresponding acoustic feature.
9. The apparatus of claim 8, wherein the acoustic feature acquisition unit comprises:
an event obtaining subunit, configured to obtain a system event corresponding to the inputtable information of the at least one input device;
a voice information obtaining subunit, configured to obtain voice information of a system event corresponding to the inputtable information of the at least one input device; and
an acoustic feature acquisition subunit, configured to extract acoustic features from the voice information of the system event.
10. An electronic device, comprising:
one or more processors; and
a storage device for storing executable instructions which, when executed by the one or more processors, implement the method according to any one of claims 1 to 5.
11. The electronic device of claim 10, wherein:
the storage device is further configured to store an acoustic feature library.
12. A computer-readable storage medium having stored thereon executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 5.
13. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910527533.8A CN112102820B (en) | 2019-06-18 | 2019-06-18 | Interaction method, interaction device, electronic equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112102820A CN112102820A (en) | 2020-12-18 |
CN112102820B true CN112102820B (en) | 2024-10-18 |
Family
ID=73749362
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910527533.8A Active CN112102820B (en) | 2019-06-18 | 2019-06-18 | Interaction method, interaction device, electronic equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112102820B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103186232A (en) * | 2011-12-30 | 2013-07-03 | 上海博泰悦臻电子设备制造有限公司 | Voice keyboard device |
CN103778915A (en) * | 2012-10-17 | 2014-05-07 | 三星电子(中国)研发中心 | Speech recognition method and mobile terminal |
CN109656512A (en) * | 2018-12-20 | 2019-04-19 | Oppo广东移动通信有限公司 | Exchange method, device, storage medium and terminal based on voice assistant |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2402507A (en) * | 2003-06-03 | 2004-12-08 | Canon Kk | A user input interpreter and a method of interpreting user input |
JP2007148118A (en) * | 2005-11-29 | 2007-06-14 | Infocom Corp | Voice interactive system |
US8326637B2 (en) * | 2009-02-20 | 2012-12-04 | Voicebox Technologies, Inc. | System and method for processing multi-modal device interactions in a natural language voice services environment |
CN102968409B (en) * | 2012-11-23 | 2015-09-09 | 海信集团有限公司 | Intelligent human-machine interaction semantic analysis and interactive system |
KR20140089863A (en) * | 2013-01-07 | 2014-07-16 | 삼성전자주식회사 | Display apparatus, Method for controlling display apparatus and Method for controlling display apparatus in Voice recognition system thereof |
KR101474856B1 (en) * | 2013-09-24 | 2014-12-30 | 주식회사 디오텍 | Apparatus and method for generateg an event by voice recognition |
CN104090652B (en) * | 2014-06-13 | 2017-07-21 | 北京搜狗科技发展有限公司 | A kind of pronunciation inputting method and device |
CN104715752B (en) * | 2015-04-09 | 2019-01-08 | 刘文军 | Audio recognition method, apparatus and system |
CN107301862A (en) * | 2016-04-01 | 2017-10-27 | 北京搜狗科技发展有限公司 | A kind of audio recognition method, identification model method for building up, device and electronic equipment |
CN105912629B (en) * | 2016-04-07 | 2019-08-13 | 上海智臻智能网络科技股份有限公司 | A kind of intelligent answer method and device |
CN106027485A (en) * | 2016-04-28 | 2016-10-12 | 乐视控股(北京)有限公司 | Rich media display method and system based on voice interaction |
US10043516B2 (en) * | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
CN108231063A (en) * | 2016-12-13 | 2018-06-29 | 中国移动通信有限公司研究院 | A kind of recognition methods of phonetic control command and device |
CN106898349A (en) * | 2017-01-11 | 2017-06-27 | 梅其珍 | A kind of Voice command computer method and intelligent sound assistant system |
US11004444B2 (en) * | 2017-09-08 | 2021-05-11 | Amazon Technologies, Inc. | Systems and methods for enhancing user experience by communicating transient errors |
CN108320742B (en) * | 2018-01-31 | 2021-09-14 | 广东美的制冷设备有限公司 | Voice interaction method, intelligent device and storage medium |
CN109256116A (en) * | 2018-09-27 | 2019-01-22 | 深圳市语芯维电子有限公司 | Pass through the method for speech recognition keypad function, system, equipment and storage medium |
- 2019-06-18: CN application CN201910527533.8A filed; granted as CN112102820B (status: Active)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11682380B2 (en) | Systems and methods for crowdsourced actions and commands | |
US20230100423A1 (en) | Crowdsourced on-boarding of digital assistant operations | |
CN108877791B (en) | Voice interaction method, device, server, terminal and medium based on view | |
US10614803B2 (en) | Wake-on-voice method, terminal and storage medium | |
CN110069608B (en) | Voice interaction method, device, equipment and computer storage medium | |
CN110223695B (en) | Task creation method and mobile terminal | |
CN107924483B (en) | Generation and application of generic hypothesis ranking model | |
CN1790326B (en) | System for synchronizing natural language input element and graphical user interface | |
KR102170088B1 (en) | Method and system for auto response based on artificial intelligence | |
KR102146524B1 (en) | Method, system and computer program for generating speech recognition learning data | |
JP2015522892A (en) | Multimedia information retrieval method and electronic device | |
KR20200080400A (en) | Method for providing sententce based on persona and electronic device for supporting the same | |
US11403462B2 (en) | Streamlining dialog processing using integrated shared resources | |
JP7063937B2 (en) | Methods, devices, electronic devices, computer-readable storage media, and computer programs for voice interaction. | |
CN114375449A (en) | Techniques for dialog processing using contextual data | |
US11810553B2 (en) | Using backpropagation to train a dialog system | |
CN104850575B (en) | Method and system for integrating speech into a system | |
CN112487790A (en) | Improved semantic parser including coarse semantic parser and fine semantic parser | |
US11163377B2 (en) | Remote generation of executable code for a client application based on natural language commands captured at a client device | |
CN106202087A (en) | A kind of information recommendation method and device | |
WO2020052060A1 (en) | Method and apparatus for generating correction statement | |
JP2022121386A (en) | Speaker dialization correction method and system utilizing text-based speaker change detection | |
JP7225380B2 (en) | Audio packet recording function guide method, apparatus, device, program and computer storage medium | |
CN112102820B (en) | Interaction method, interaction device, electronic equipment and medium | |
CN112489632A (en) | Implementing correction models to reduce propagation of automatic speech recognition errors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20210525. Address after: 100176 room 1004, 10th floor, building 1, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing. Applicant after: Beijing Huijun Technology Co.,Ltd. Address before: 100086 8th Floor, 76 Zhichun Road, Haidian District, Beijing. Applicant before: BEIJING JINGDONG SHANGKE INFORMATION TECHNOLOGY Co.,Ltd.; BEIJING JINGDONG CENTURY TRADING Co.,Ltd. |
GR01 | Patent grant | |