WO2019047878A1 - Method for voice control of a terminal, terminal, server, and storage medium - Google Patents
- Publication number: WO2019047878A1 (application PCT/CN2018/104264)
- Authority
- WO
- WIPO (PCT)
Classifications
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
- G10L15/26—Speech to text systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
- H04N21/42208—Display device provided on the remote control
- H04N21/42209—Display device provided on the remote control for displaying non-command information, e.g. electronic program guide [EPG], e-mail, messages or a second television channel
- H04N21/4222—Remote control device emulator integrated into a non-television apparatus, e.g. a PDA, media center or smart toy
- H04N21/42222—Additional components integrated in the remote control device, e.g. timer, speaker, sensors for detecting position, direction or movement of the remote control, microphone or battery charging device
- H04N21/8173—End-user applications, e.g. Web browser, game
Definitions
- The present application relates to the field of Internet technologies, and in particular to a method, a terminal, a server, and a storage medium for voice control of a terminal.
- Television equipment, such as smart TVs and TV boxes, plays an increasingly important role in people's daily leisure and entertainment.
- Television equipment here refers to an open platform equipped with an operating system on which applications can be installed: while enjoying ordinary TV content, users can install, upgrade, and uninstall various application software to extend the device's functions.
- Examples of the present application provide a method, a terminal, a server, and a storage medium for voice control of a terminal.
- The present application provides a method for voice control of a terminal, the method being performed by a first terminal, and including:
- in response to an operation of a voice recording control of a first client on the first terminal, recording voice to obtain first audio data, and sending a scene information query instruction to a second client on a second terminal; wherein the scene information includes at least one piece of operable object information in at least one presentation interface of the second client;
- upon receiving the scene information returned by the second client, sending the scene information and the first audio data to a server, the server being a background server of the second client;
- upon receiving a control instruction returned by the server, sending the control instruction to the second client so that it performs a corresponding action; wherein the control instruction carries the operable object information to be executed.
- The present application provides a method for voice control of a terminal, the method being performed by a server, and including: determining text converted from first audio data upon receiving scene information and the first audio data sent by a first client on a first terminal; segmenting the text to obtain a word segmentation result; forming, according to the word segmentation result and the scene information, a control instruction carrying operable object information to be executed; and sending the control instruction to the second client through the first client.
- An example of the present application provides a first terminal, where the first terminal includes:
- at least one memory;
- at least one processor;
- wherein the at least one memory stores at least one instruction module configured to be executed by the at least one processor, and the at least one instruction module includes:
- a response module configured to, in response to an operation of a voice recording control of a first client on the first terminal, record voice to obtain first audio data and send a scene information query instruction to a second client on a second terminal; wherein the scene information includes at least one piece of operable object information in at least one presentation interface of the second client;
- a first sending module configured to, upon receiving the scene information returned by the second client, send the scene information and the first audio data to a first server, the first server being a background server of the second client;
- a second sending module configured to, upon receiving a control instruction returned by the first server, send the control instruction to the second client so that it performs a corresponding action; wherein the control instruction carries the operable object information to be executed.
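The three modules above can be sketched, purely illustratively, as a small client class. All names are hypothetical, and the callbacks stand in for the microphone and the two network links that the patent text leaves unspecified:

```python
class FirstClient:
    """Illustrative sketch of the first client's three instruction modules."""

    def __init__(self, record_voice, send_to_second_client, send_to_first_server):
        # Hypothetical callbacks for the microphone, the LAN link to the
        # second client, and the Internet link to the first server.
        self.record_voice = record_voice
        self.send_to_second_client = send_to_second_client
        self.send_to_first_server = send_to_first_server
        self.first_audio_data = None

    def on_voice_control_pressed(self):
        """Response module: record audio, then query the second client."""
        self.first_audio_data = self.record_voice()
        self.send_to_second_client({"type": "scene_info_query"})

    def on_scene_info(self, scene_info):
        """First sending module: forward scene info plus audio to the server."""
        self.send_to_first_server({"scene_info": scene_info,
                                   "audio": self.first_audio_data})

    def on_control_instruction(self, instruction):
        """Second sending module: relay the server's control instruction."""
        self.send_to_second_client(instruction)
```

A caller would wire real transports into the constructor; here the class only fixes the order of messages the three modules exchange.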
- An example of the present application provides a server, where the server includes:
- at least one memory;
- at least one processor;
- wherein the at least one memory stores at least one instruction module configured to be executed by the at least one processor, and the at least one instruction module includes:
- a text determining module configured to determine text converted from first audio data upon receiving scene information and the first audio data sent by a first client on a first terminal; wherein the scene information includes at least one piece of operable object information in at least one presentation interface of a second client on a second terminal;
- a text segmentation module configured to segment the text to obtain a word segmentation result;
- an instruction forming module configured to form, according to the word segmentation result and the scene information, a control instruction carrying operable object information to be executed;
- a third sending module configured to send the control instruction to the second client through the first client, so that the second client performs an action corresponding to the control instruction.
- The present application provides a method for voice control of a terminal, the method including:
- the first terminal, in response to an operation of a voice recording control of a first client on the first terminal, records voice to obtain first audio data and sends a scene information query instruction to a second client on a second terminal, the scene information including at least one piece of operable object information in at least one presentation interface of the second client; upon receiving the scene information returned by the second client, the first terminal sends the scene information and the first audio data to a server, the server being a background server of the second client;
- the server, upon receiving the scene information and the first audio data sent by the first client on the first terminal, determines text converted from the first audio data, segments the text to obtain a word segmentation result, forms, according to the word segmentation result and the scene information, a control instruction carrying operable object information to be executed, and sends the control instruction to the first client on the first terminal;
- the first terminal, upon receiving the control instruction returned by the server, sends the control instruction to the second client so that it performs a corresponding action, the control instruction carrying the operable object information to be executed.
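The server-side steps (speech-to-text aside) can be sketched as follows. The whitespace split and substring match are illustrative placeholders only; the patent does not fix a segmentation algorithm or matching rule:

```python
def segment(text):
    # Placeholder segmentation: a real system would use a proper
    # (e.g. Chinese) word segmenter, not a whitespace split.
    return text.split()

def form_control_instruction(text, scene_info):
    """Match segmented words against operable-object names from scene info."""
    words = segment(text)
    for obj in scene_info:
        # A word matching an operable object name yields a control instruction.
        if any(w in obj or obj in w for w in words):
            return {"type": "control", "object": obj}
    return None  # no operable object matched the utterance

scene = ["history", "favorites", "Journey to the West"]
inst = form_control_instruction("open the history", scene)
```

Here `inst` carries the matched operable object `"history"`, which the first client would then relay to the second client.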
- An example of the present application provides a first terminal, where the first terminal includes:
- at least one memory;
- at least one processor;
- wherein the at least one memory stores a computer program that, when executed by the at least one processor, implements the above method.
- An example of the present application provides a first server, where the first server includes:
- at least one memory;
- at least one processor;
- wherein the at least one memory stores a computer program that, when executed by the at least one processor, implements the above method.
- the present application example provides a non-transitory computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the steps of the above method.
- In the examples of the present application, after the user triggers the voice recording control of the first client, the first client records the voice and sends a scene information query instruction to the second client on the second terminal. After receiving the scene information returned by the second client, the first client sends the scene information and the audio data input by the user to the background server of the second client. The server determines the text of the first audio data, segments it, matches the word segmentation result against the scene information, generates a control instruction according to the matching result, and sends the control instruction to the first client.
- When the first client receives the control instruction, it sends the instruction to the second client, which then performs the corresponding action.
- Throughout the whole process, the user only needs to input voice through the voice recording control on the first client, which makes operating the second terminal far more convenient.
- FIG. 1 is a system architecture diagram related to an example of the present application
- FIG. 2 is a schematic flowchart of a method for voice-operating a terminal in an example of the present application
- FIG. 3a is a schematic diagram of an interface displayed by a first client in an example of the present application.
- FIG. 3b is a schematic diagram of an interface of a smart TV display in an example of the present application.
- FIG. 4 is a system architecture diagram related to an example of the present application.
- FIG. 6a is a schematic diagram of an interface displayed by a smart TV in an example of the present application.
- FIG. 6b is a schematic diagram of an interface displayed by a first client in an example of the present application.
- FIG. 7 is a structural block diagram of a first client in an example of the present application.
- FIG. 8 is a structural block diagram of a first server in an example of the present application.
- FIG. 9 is a schematic diagram of overall interaction of a method for voice-operating a terminal in an example of the present application.
- FIG. 10 is a structural block diagram of a computer device in an example of the present application.
- The present application proposes a method for voice control of a terminal; the system architecture to which the method applies is shown in FIG. 1.
- the system architecture includes: a first terminal 101, a second terminal 102, and a first server 103.
- The first terminal 101 and the second terminal 102 can be connected through a local area network 104, and the first terminal 101 and the first server 103 can be connected through the Internet 105, where:
- The first terminal 101 may be a mobile terminal, such as a smartphone or a tablet computer, on which client software of various applications is installed; through the first terminal, the user can log in to and use these clients, for example a voice assistant.
- The second terminal 102 may be a television device, such as a smart TV running Android or another system, or a TV box connected to a conventional television.
- The TV box runs Android or another system, and the conventional television effectively serves as the display of the TV box.
- A variety of applications can be installed on the second terminal, and the user can control them through a remote controller.
- For example, the user can open the Tencent Video client installed on a smart TV through the remote controller, find the video he wants to watch, and then play, fast-forward, or rewind it.
- Likewise, the user can open the client of a music application (for example, QQ Music) installed on a TV box through the remote controller, and then play or favorite a local or online song in the interface shown on the connected conventional television.
- The first server 103 is the background server of the client running on the second terminal: if the second terminal is running a video client, the first server is the background server corresponding to that video client; if it is running a music client, the first server is the background server of that music software.
- The first server may be a single server or a cluster formed by multiple servers.
- An example of the present application provides a method for voice control of a terminal, which may be performed by the first terminal 101, and specifically by a first client on the first terminal 101, as shown in FIG. 2. The method includes:
- The scene information includes at least one piece of operable object information in at least one presentation interface of the second client.
- Before the method runs, the first client on the first terminal 101 needs to be connected to the second terminal 102, specifically over the local area network.
- For example, the client to which the mobile phone interface shown in FIG. 3a belongs is connected to the smart TV shown in FIG. 3b: the client in FIG. 3a is a voice assistant, and the smart TV in FIG. 3b is the Huawei TV in the user's living room.
- At this time, the variety-show interface of a video client is displayed on the smart TV; in this interface, the user can see the names, posters, and update dates of variety shows such as "Tomorrow's Son" and "Running".
- When triggered, the voice recording control 301 of the first client sends a scene information query instruction to the Huawei TV in the living room.
- The voice recording control 301 can take various forms, for example a virtual button on an interface.
- While the control is triggered, the first client detects the surrounding sound and records it; when the recording ends, the recorded sound forms the first audio data.
- That is, the voice recording control in the examples of the present application not only records voice but also sends a scene information query instruction to the second terminal 102 connected to the first client — in this example, the Huawei TV in the living room.
- The first audio data recorded by the user through the first client is a voice instruction for operating the second terminal 102.
- While recording, prompt information can also be displayed. For example, the user may be prompted: "You can ask me like this: play Sansei III, open the viewing history, search for Yang Mi's TV series", and the user can issue a voice operation instruction with reference to the prompt.
- The second client is a client installed on the second terminal 102. Since multiple clients may be installed on the second terminal 102, one or more of them need to be selected as the second client. One way is to use the client to which the interface currently displayed on the television (the conventional TV connected to a TV box, or the smart TV) belongs.
- For example, if the Huawei TV in the user's living room is displaying an interface of the Tencent Video client, the scene information query instruction sent by the first client is delivered to the Tencent Video client on the Huawei TV.
- After receiving the query instruction, the Tencent Video client returns its scene information to the first client.
- The so-called scene information includes at least one piece of operable object information in at least one interface of the second client.
- For example, the Tencent Video client has a variety-show interface, a TV-series interface, a movie interface, a documentary interface, and so on; the variety-show interface presents several recently popular variety shows in the form of posters, program names, and the like.
- There are also viewing records, favorites, search, feedback, settings, and so on; these can be global options of the Tencent Video client.
- When the user clicks the poster of a variety show in the variety-show interface, the play interface of that show opens; when the user clicks favorites, the favorites interface opens, displaying the videos the user has collected.
- All of this related information can serve as operable objects of the video client. That is, if the Tencent Video client receives a scene information query instruction, the scene information it returns may include the names of multiple variety shows, TV series, movies, and documentaries, and may also include watch history, favorites, search, feedback, settings, and more.
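As a concrete illustration, the scene information a video client returns might be serialized like this. The field names and JSON encoding are hypothetical; the patent does not fix a wire format:

```python
import json

# Hypothetical scene information for a video client's variety-show interface.
scene_info = {
    "client": "video",
    "interfaces": [
        {"name": "variety", "operable_objects": ["Tomorrow's Son", "Running"]},
    ],
    "global_options": ["watch history", "favorites", "search",
                       "feedback", "settings"],
}

payload = json.dumps(scene_info)   # what the second client could return
restored = json.loads(payload)     # what the first client would forward
```

Both the per-interface object names and the global options are candidates for matching against the user's segmented voice command.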
- the above scenario information is described by taking a video client as an example.
- the second client is not limited to the video client, but may be other clients, such as a music client, a news client, and the like.
- A music client, for instance, has a rankings interface, a playlist interface, an MV interface, and a local-music interface.
- The rankings interface lists many songs ranked by popularity; the rankings, playlist, and MV interfaces all display online songs, while the local interface displays local songs.
- The music client also includes global options such as settings, questions and suggestions, and check for updates. Online songs, local songs, and the global options can all serve as operable objects of the music client; that is, if the music client receives a scene information query instruction, the scene information it returns may include multiple online or local song names, and may also include settings, questions and suggestions, check for updates, and the like.
- The first server 103 is the background server of the second client: if the second client is a video client, the first server 103 is the server of that video client; if the second client is a news client, the first server 103 is the server of that news client.
- The so-called operable object information to be executed may include the name of the operable object.
- For example, if the voice input by the user on the first client is "open history", the operable object information to be executed includes "history record".
- When the second client receives the control instruction carrying this information, it opens the history — that is, an interface displaying the history, from which the user can see recently viewed media content.
- As another example, if the operable object information to be executed includes "Journey to the West", then when the second client receives the control instruction carrying it, the default is to perform a play action on Journey to the West.
- The operable object information to be executed may further include the action to be performed on the operable object, and other information.
- For example, if the user voice-inputs "download the Journey to the West theme song" in the first client, the operable object information to be executed that the first server 103 generates in the control instruction — based on the audio data and the scene information of the second client — includes not only "Journey to the West" but also "theme song" and "download"; when the second client receives the control instruction, it does not play the Journey to the West theme song but downloads it.
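A control instruction can therefore carry both an object name and an action. The tiny rule table below is only an illustration of how a server might split already-segmented words into those two fields; the action keywords and the default are assumptions, not part of the patent text:

```python
# Hypothetical action keywords; a real server would derive these from
# its own segmentation and matching logic.
ACTIONS = ("download", "play", "open", "search")

def build_instruction(words, scene_objects):
    """Split segmented words into an operable object and an action."""
    # First action keyword found wins; absent one, fall back to "play",
    # mirroring the default play behavior described above.
    action = next((w for w in words if w in ACTIONS), "play")
    obj = next((o for o in scene_objects
                if any(o in w or w in o for w in words)), None)
    return {"object": obj, "action": action}

inst = build_instruction(
    ["download", "Journey to the West", "theme song"],
    ["Journey to the West", "history"])
```

With this sketch, "download the Journey to the West theme song" yields a download action instead of the default play action.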
- The process by which the first server 103 forms the control instruction according to the scene information and the first audio data may be implemented in various ways, which is not limited in the examples of this application.
- For example, a certain interface of a video client is displayed on the smart TV connected to the first client on the user's first terminal 101, and the user presses the voice recording control on the first client and says "I want to watch Journey to the West".
- The video client displayed on the smart TV returns scene information to the first client, including various TV series titles, movie titles, variety show names, and some global options.
- After receiving the scene information, the first client sends it together with the "I want to watch Journey to the West" audio data to the background server of the video client, i.e., the first server 103.
- After receiving the scene information and the audio data, the first server 103 learns from the audio data that the user wants to watch Journey to the West, combines this with the scene information to generate a control instruction, and returns the instruction to the first client; the first client then sends the instruction to the second client, so that the second client opens an interface displaying the media resources related to "Journey to the West".
- For example, the media resources related to "Journey to the West" found by the second client according to the control instruction include the TV series "Journey to the West" and the TV series "New Journey to the West".
- The second client can display information about these media resources on the smart TV, or on the conventional TV connected to the TV box, for the user to choose from; after the user selects one of them by voice or remote controller, the play action is performed.
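The candidate-listing step described above can be sketched as a simple title search; the substring rule is illustrative only, standing in for however the second client actually looks up media resources:

```python
def find_candidates(query, resources):
    """Return every resource whose title contains the query string."""
    return [r for r in resources if query in r]

# "Journey to the West" matches both the original and the remake,
# so both are shown on the TV for the user to pick from.
candidates = find_candidates(
    "Journey to the West",
    ["Journey to the West", "New Journey to the West", "Running"])
```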
- Whether or not the second client performs the corresponding action successfully, the user may also be given a corresponding prompt.
- In summary, after the user triggers the voice recording control of the first client, the first client records the voice and sends a scene information query instruction to the second client on the second terminal 102.
- After receiving the scene information returned by the second client, the first client sends the scene information and the audio data input by the user to the background server of the second client; the server forms a control instruction from the received information and sends it, through the first client, to the second client, causing the second client to perform the corresponding action.
- The user only needs to input voice through the first client of the first terminal 101 to control the second terminal 102. For a television device used as the second terminal 102, this removes the step-by-step, cumbersome operation of a remote controller and thus makes operating the second terminal 102 far more convenient.
- A proxy module can be installed on the second terminal 102; for example, a television proxy module is installed on the smart TV or TV box.
- The so-called television proxy module is actually an application, though not one visible to the user. It acts as a bridge between the other applications installed on the smart TV or TV box and the outside world: those applications interact with the outside world through the proxy module, which makes it easier to manage their external communication.
- Applications in the second terminal 102 such as a video client, a music client, or a news client can participate in voice control; an application that cannot participate in voice control cannot be made to perform actions by the user's voice.
- When the proxy module receives a command sent from the outside, it broadcasts the command to the corresponding application — for example, by specifying the package name in the broadcast, or by sending the command only to the foreground application so that only it can receive the command. The so-called foreground application is the client to which the current presentation interface of the second terminal 102 belongs.
- In S201, the process of the first client sending the scenario information query instruction to the second client on the second terminal 102 may be: the first client sends the scenario information query instruction to the proxy module in the second terminal 102, and the proxy module then sends it on to the second client. That is, after the user triggers the voice recording control of the first client, the first client sends a scene information query instruction to the proxy module in the second terminal 102; upon receiving the instruction, the proxy module forwards the query instruction to the second client.
- Similarly, the process of the first client transmitting the control command to the second client may be: the first client sends the control command to the proxy module in the second terminal 102, and the proxy module forwards it to the second client. That is, after receiving the control command sent by the first server 103, the first client sends it to the proxy module, and the proxy module delivers it to the second client. It can be seen that interaction between the clients in the second terminal 102 and the external first terminal 101 passes through the proxy module, which implements centralized management of communication between the clients in the second terminal 102 and the outside world.
- The proxy module in the second terminal 102 can also display prompt information on the television interface. For example, when the user inputs voice in the first client, the first client sends a scene information query instruction to the proxy module; on receiving this instruction, the proxy module learns that the user is recording voice, and can therefore display a reminder on the current television interface that voice is being input.
- There are multiple ways to send the scenario information and the first audio data to the first server 103 in step S202. One of them is to adopt a streaming slice sending mode, transmitting the first audio data to the first server 103 piece by piece to improve transmission efficiency.
- For example, one slice may be transmitted every 300 ms. That is, the first audio data is divided into a plurality of fragments, which are sent to the first server 103 one by one; the scene information may be carried in any one of the fragments, for example in the last one.
- When the first server 103 receives the fragment carrying the scene information, reception of the first audio data is considered complete.
- Of course, the first audio data and the scene information may also be sent to the first server 103 in other manners.
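The streaming-slice mode above can be sketched as follows. The fragment structure, field names, and the 16 kHz/16-bit mono rate implied by `bytes_per_ms=32` are illustrative assumptions; the patent only specifies slicing (e.g., ~300 ms per slice) and carrying the scene information in one fragment, such as the last:

```python
def make_slices(audio_bytes, scene_info, slice_ms=300, bytes_per_ms=32):
    """Split audio into ~300 ms fragments and attach scene info to the last.

    Carrying the scene information in the last slice lets the server
    treat its arrival as the end of the first audio data.
    """
    step = slice_ms * bytes_per_ms
    chunks = [audio_bytes[i:i + step]
              for i in range(0, len(audio_bytes), step)] or [b""]
    slices = [{"seq": i, "data": c, "scene_info": None}
              for i, c in enumerate(chunks)]
    slices[-1]["scene_info"] = scene_info
    return slices

# ~0.78 s of silence at the assumed rate -> three fragments.
slices = make_slices(b"\x00" * 25000, {"videos": ["Journey to the West"]})
```

Each fragment would then be posted to the first server 103 as it is produced, rather than waiting for the full recording.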
- There are multiple ways in which the first server 103 forms a control instruction. For example, the first server 103 converts the first audio data into text, performs word segmentation on the text, matches the word segmentation result against the scene information, and forms a control instruction according to the matching result.
- The first server 103 can also send the first audio data to another server having audio recognition capabilities, for example, the background server of WeChat or of QQ; such servers with audio recognition capabilities are referred to here as the second server 106.
- In this case, the second server 106 is added to the system architecture of this application example.
- Referring to FIG. 4, the first server 103 and the second server 106 can be deployed separately, with data transmitted between them over the network; of course, the first server 103 and the second server 106 can also be integrated into one server, in which case the integrated server has both the functionality of the first server 103 and that of the second server 106.
- When the second server 106 receives the first audio data, it converts the first audio data into text and returns the text to the first server 103, so that the first server 103 can segment the received text, match the word segmentation result against the scene information, and form a control instruction according to the matching result.
- In this way, a first server 103 that has no voice processing capability can transmit the first audio data to a second server 106 that has voice processing capability, and the second server 106 converts the first audio data into text and returns it to the first server 103.
- For example, suppose the current display interface of the television is an interface of a video client. The first client sends the recorded voice and the scene information returned by the video client to the first server 103. Since the first server 103 cannot itself convert the voice input by the user into text, it sends the voice to the second server 106, in this example the background server of WeChat, which converts the voice into text and returns the text to the first server 103. The first server 103 then applies a semantics-based word segmentation method to the text "I want to watch Journey to the West", obtaining the segmentation result "I", "want to watch", "Journey to the West". It matches this result against the scene information and, finding a video related to "Journey to the West" in the scene information, forms a control instruction carrying the video information related to "Journey to the West".
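The matching step in this example can be sketched in a few lines. This is a deliberately simplified stand-in: real word segmentation would be semantics-based (the patent does not fix an algorithm), and the instruction format is an illustrative assumption:

```python
def form_control_instruction(segments, scene_info):
    """Match word-segmentation results against scene information.

    Returns a control instruction carrying the operable object
    information to be executed, or None if nothing matches.
    """
    for token in segments:
        if token in scene_info.get("operable_objects", []):
            # A matching operable object was found in the scene info.
            return {"action": "play", "object": token}
    return None

scene = {"operable_objects": ["Journey to the West", "watch history", "favorites"]}
instr = form_control_instruction(["I", "want to watch", "Journey to the West"], scene)
# instr -> {"action": "play", "object": "Journey to the West"}
```

When no token matches, the server would fall back to searching its own media resources, as described later in the text.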
- The first server 103 can also convert the text into standard voice, which may be referred to as second audio data, and then send the second audio data to the first client.
- Alternatively, the first server 103 may send a voice synthesis request to the second server 106; the second server 106 converts the text into standard voice, i.e., second audio data, and returns the second audio data to the first server 103, so that the first server 103 can send the second audio data to the first client.
- Here the second server 106 refers to a server having voice processing capabilities, where the so-called voice processing capability includes converting audio data into text and, of course, converting text into standard audio data.
- This approach enables the first server 103 to send a voice synthesis request to the second server 106, which has voice processing capability, and thereby obtain the second audio data. The requirements placed on the first server 103 are therefore not high: the first server 103 is not required to have voice processing capability. A first server 103 that does have voice processing capability can convert the text into second audio data by itself, while a first server 103 without that capability can send a speech synthesis request to a second server 106 that has it, and likewise obtain the second audio data.
- After receiving the second audio data, the first client may play it, or may send the second audio data to the second terminal 102.
- For example, the second audio data is sent to the proxy module in the second terminal 102, which then plays it.
- The second audio data is audio converted from the text, and the text is in turn converted from the first audio data; in this way, the user can hear the standard voice corresponding to the voice he or she input.
- An example of the present application further provides a method for voice-controlling a terminal, which may be performed by the first server 103. As shown in FIG. 5, the method includes:
- S501: upon receiving the scenario information and the first audio data sent by the first client on the first terminal 101, determining the text converted from the first audio data, where the scenario information includes at least one operable object information of at least one presentation interface of the second client on the second terminal 102;
- There are multiple ways for the first server 103 to determine the text converted from the first audio data: the first server 103 may convert the first audio data into text itself, or, after receiving the scene information and the first audio data sent by the first client, it may send the first audio data to the second server 106, so that the second server 106 converts the first audio data into text and returns the text to the first server 103; that is, the first server 103 delegates the conversion to a second server 106 having voice processing capabilities. Either way, the first server 103 can obtain the text converted from the first audio data.
- the word segmentation here can be, but is not limited to, a word segmentation method based on semantic analysis.
- In this method for voice-controlling a terminal, when the first server 103 receives the scene information and the first audio data sent by the first client, it first acquires the text corresponding to the first audio data, then segments the text, and then sends a control instruction, formed from the word segmentation result and the scene information, to the first client, so that the first client forwards the control command to the second client and the second client performs the corresponding action, thereby implementing voice control of the second client.
- The operable object information to be executed is the operable object information that matches the word segmentation result.
- For example, the word segmentation result is "I", "want to watch", "Journey to the West", and the scene information includes the video name "Journey to the West"; the scene information is then considered to contain operable object information matching the segmentation result, and the formed control command carries the video name "Journey to the West".
- The operable object information included in the scenario information may be operable object information corresponding to media resources stored in the first server 103, or operable object information of third-party media resources.
- The so-called media resources can be videos, music, news content (including text, pictures, etc.), and other media resources.
- In this case, the process of forming the control instruction in step S503 may include: forming a control instruction carrying the operable object information to be executed, where the operable object information to be executed is the operable object information corresponding to the media resource that matches the word segmentation result.
- The above search is performed in the first server 103, checking whether a media resource matching the word segmentation result is stored there. For example, if there is no operable object information matching "Journey to the West" in the scene information, a search is made in the first server 103; if a video resource matching "Journey to the West" is found, a control command is formed that includes the video name "Journey to the West". When the second client receives the control command, it performs a play operation on the operable object, the video "Journey to the West"; that is, the smart TV, or the TV connected to the TV box, enters the playback interface of Journey to the West.
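The fallback just described — try the scene information first, then search the first server's own media resources — can be sketched as follows. The media library, instruction fields, and substring matching are illustrative assumptions standing in for whatever matching the server actually performs:

```python
def form_instruction_with_fallback(segments, scene_info, media_library):
    """Form a control instruction, falling back to a server-side search.

    First try to match the segmentation result against operable objects
    in the scene information; if none match, search the first server's
    media resources for one matching the segmentation result.
    """
    for token in segments:
        if token in scene_info.get("operable_objects", []):
            return {"action": "play", "object": token}
    # No match in the scene information: search media resources on the server.
    for token in segments:
        for resource in media_library:
            if token and token in resource:
                return {"action": "play", "object": resource}
    # Nothing found; the search result would be fed back for display.
    return {"error": "no matching media resource"}

library = ["Journey to the West (1986 TV series)", "New Journey to the West"]
scene = {"operable_objects": ["watch history", "favorites"]}
instr = form_instruction_with_fallback(
    ["I", "want to watch", "Journey to the West"], scene, library)
```

The error branch corresponds to the case below where the server feeds the empty search result back through the first client for the second client to display.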
- If no media resource matching the word segmentation result is found, the first server 103 may feed back the search result to the second client through the first client.
- For example, the first server 103 feeds back to the first client the result that no video related to Journey to the West was found; when the first client receives this information, it sends it to the second client, and, as shown in FIG. 6a, the second client displays the message "No video related to 'Journey to the West' was found!" on the TV.
- The text may also be sent to the first client so that the first client can display it, as shown in FIG. 6b: "What you said is: I want to watch Journey to the West."
- The first client may send the first audio data to the first server 103 in streaming slice mode, and if the first server 103 does not have voice processing capabilities, it may forward the first audio data to the second server 106. Specifically, whenever the first server 103 receives a fragment of the first audio data, it sends the fragment to the second server 106, so that the second server 106 converts the fragment into a corresponding text segment and returns the text segment to the first server 103; the combination of the text segments corresponding to all fragments constitutes the text.
- Transmitting the fragments to the second server 106 in streaming slice mode, so that the second server 106 performs the text conversion, can improve the efficiency of transmission and conversion.
- The first server 103 may obtain the standard voice corresponding to the text; if the first server 103 does not have voice processing capability, it may send a voice synthesis request to the second server 106 so that the second server 106 converts the text into second audio data, and, upon receiving the second audio data returned by the second server 106, transmit the second audio data to the first client.
- The first client may play the second audio data, or send the second audio data to the second terminal 102, so that the second terminal 102 plays the second audio data.
- the example of the present application further provides a first terminal, where the first terminal includes:
- at least one memory;
- at least one processor;
- wherein the at least one memory stores at least one instruction module configured to be executed by the at least one processor, and the at least one instruction module is an instruction module in the first client; as shown in FIG. 7, the at least one instruction module includes:
- the response module 701 is configured to: in response to the operation of the voice recording control of the first client on the first terminal, record the voice to obtain the first audio data, and send the scene information query instruction to the second client on the second terminal;
- the scenario information includes at least one operable object information in the at least one presentation interface of the second client;
- a first sending module 702, configured to, when receiving the scenario information returned by the second client, send the scenario information and the first audio data to the first server, so that the first server forms, according to the first audio data and the scene information, a control instruction carrying the operable object information to be executed; the first server is a background server of the second client;
- a second sending module 703 configured to send the control command to the second client when receiving the control instruction returned by the first server, so that the second client performs according to the control instruction The corresponding action.
- The first sending module 702 may send the first audio data to the first server fragment by fragment using a streaming slice sending mode; the scene information is carried in one of the fragments, for example in the last slice of the first audio data.
- the first client can also include:
- a playing module configured to play the second audio data when receiving the second audio data sent by the first server; wherein the second audio data is converted by text, and the text is The first audio data is converted.
- The second sending module 703 is further configured to, when receiving the second audio data sent by the first server, send the second audio data to the proxy module of the second terminal, so that the proxy module plays the second audio data; wherein the second audio data is converted from text, and the text is converted from the first audio data.
- the response module 701 can be configured to send the scene information query instruction to a proxy module in the second terminal to cause the proxy module to send the scene information query instruction to the second client end.
- the second sending module 703 can be configured to send the control command to a proxy module in the second terminal to cause the proxy module to send the control command to the second client.
- The first client provided by this example of the present application is the functional architecture module of the above method for voice-controlling a terminal; for the interpretation of related content, examples, beneficial effects, and the like, reference may be made to the related content of that method, which will not be repeated here.
- the application example further provides a first server, where the server includes:
- at least one memory;
- at least one processor;
- the at least one memory stores at least one instruction module configured to be executed by the at least one processor; as shown in FIG. 8, the at least one instruction module comprises:
- a text determining module 801 configured to determine text converted by the first audio data when receiving the scene information and the first audio data sent by the first client on the first terminal; wherein the scene information Include at least one operable object information in at least one presentation interface of the second client on the second terminal;
- a text segmentation module 802 configured to segment the text to obtain a word segmentation result
- an instruction forming module 803, configured to form, according to the word segmentation result and the scene information, a control instruction carrying the operable object information to be executed;
- the third sending module 804 is configured to send the control instruction to the second client by using the first client, so that the second client performs a corresponding action according to the control instruction.
- The instruction forming module 803 can be configured to match the word segmentation result with the scene information and, if the scene information contains operable object information matching the word segmentation result, form a control instruction carrying the operable object information to be executed, where the operable object information to be executed is the operable object information that matches the word segmentation result.
- The instruction forming module 803 can also be configured to match the word segmentation result with the scene information and, if there is no operable object information matching the word segmentation result in the scene information, search for a media resource that matches the word segmentation result; if such a media resource is found, form a control instruction carrying the operable object information to be executed, where the operable object information to be executed is the operable object information corresponding to the media resource that matches the word segmentation result.
- The instruction forming module 803 is further configured to, when no media resource matching the word segmentation result is found, feed back the search result to the second client through the first client, so that the second client displays the search result.
- The text determining module 801 can be configured to, when receiving the scene information and the first audio data sent by the first client, send the first audio data to the second server, so that the second server converts the first audio data into text and returns the text to the first server.
- The text determining module 801 may specifically be configured to, each time a fragment of the first audio data is received, send the fragment to the second server, so that the second server converts the fragment into a corresponding text segment and returns the text segment to the first server; the combination of the text segments corresponding to all fragments constitutes the text.
- the first server 800 can also include:
- a requesting module configured to send a voice synthesis request to the second server, to enable the second server to convert the text into second audio data; and receive the second audio data returned by the second server And transmitting the second audio data to the first client.
- the third sending module 804 is further configured to: send the text to the first client, so that the first client displays the text.
- The first server provided by this example of the present application is the functional architecture module of the above method for voice-controlling a terminal; for the related content, examples, beneficial effects, and the like, reference may be made to the related content of that method, which will not be repeated here.
- the application example further provides an overall process of a method for voice-operating the terminal:
- After receiving the scenario information, the television proxy module sends the scenario information to the first client.
- the first client sends the recorded first audio data to the first server one by one by using a streaming mode, and carries the scenario information in the last fragment.
- The first server segments the text composed of the text segments, matches the word segmentation result against the scene information, and forms a control instruction according to the matching result;
- The first server sends a TTS request, that is, a voice synthesis request, to the second server; the second server processes the TTS request, converts the text into the second audio data, and returns it to the first server;
- the first server sends the text, the control instruction, and the second audio data to the first client.
- The first client displays the received text in its interface, and plays the second audio data or sends it to the television proxy module so that the television proxy module plays it.
- The first client sends the received control command to the television proxy module, which forwards it to the second client; the second client performs the corresponding action, completing the process of controlling the second client by voice.
- In the entire process, the user only needs to input voice through the voice recording control, without the cumbersome operations required by a remote controller, which provides great convenience for the user.
- the actual hardware execution body of the method executed by the first client is the first terminal.
- the present application also provides a non-transitory computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the steps of any of the above methods.
- FIG. 10 shows the composition structure of the computer device where the first client or the first server is located.
- The computing device includes one or more processors (CPUs) 1002, a communication module 1004, a memory 1006, a user interface 1010, and a communication bus 1008 for interconnecting these components, wherein:
- the processor 1002 can receive and transmit data through the communication module 1004 to effect network communication and/or local communication.
- User interface 1010 includes one or more output devices 1012 that include one or more speakers and/or one or more visual displays.
- User interface 1010 also includes one or more input devices 1014, including, for example, a keyboard, a mouse, a voice command input unit or microphone, a touch screen display, a touch-sensitive tablet, a gesture-capture camera, or other input buttons or controls.
- The memory 1006 may be a high-speed random access memory, such as DRAM, SRAM, DDR RAM, or another random-access solid-state storage device; or a non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
- The memory 1006 stores a set of instructions executable by the processor 1002, including:
- an operating system 1016, including programs for handling various basic system services and for performing hardware-related tasks;
- applications 1018, including various applications for voice-controlling a terminal; such applications can implement the processing flows in each of the above examples, and may include, for example, the first client or some or all of the modules in the first server.
- At least one module of the first client or the first server can store machine-executable instructions. By running the machine-executable instructions in at least one of the units in the memory 1006, the processor 1002 can implement the functions of at least one of the above modules.
- The hardware modules in the embodiments may be implemented in hardware or on a hardware platform plus software.
- The above software includes machine-readable instructions stored in a non-volatile storage medium, so the embodiments can also be embodied as software products.
- The hardware may be implemented by specialized hardware or by hardware that executes machine-readable instructions.
- For example, the hardware can be a specially designed permanent circuit or logic device (such as a dedicated processor, for example an FPGA or an ASIC) for performing particular operations.
- The hardware may also include programmable logic devices or circuits temporarily configured by software (such as general-purpose processors or other programmable processors) for performing particular operations.
- Each example of the present application can be implemented by a data processing program executed by a data processing device such as a computer. Evidently, such a data processing program constitutes the present application.
- A data processing program usually stored in a storage medium is executed either by reading the program directly out of the storage medium or by installing or copying the program onto a storage device (such as a hard disk and/or memory) of the data processing device. Therefore, such a storage medium also constitutes the present application. The present application thus also provides a non-volatile storage medium storing a data processing program that can be used to execute any one of the above method examples of the present application.
- the machine readable instructions corresponding to the modules of FIG. 10 may cause an operating system or the like operating on a computer to perform some or all of the operations described herein.
- the non-transitory computer readable storage medium may be inserted into a memory provided in an expansion board within the computer or written to a memory provided in an expansion unit connected to the computer.
- The CPU or the like mounted on the expansion board or expansion unit can then perform part or all of the actual operations according to the instructions.
Abstract
The present application provides a method for voice-controlling a terminal, a terminal, a server, and a storage medium. The method includes: in response to an operation on a voice recording control of a first client on a first terminal, recording voice to obtain first audio data, and sending a scene information query instruction to a second client on a second terminal; upon receiving the scene information returned by the second client, sending the scene information and the first audio data to a server, where the server is the background server of the second client; and upon receiving a control instruction returned by the server, sending the control instruction to the second client, so that the second client performs a corresponding action according to the control instruction.
Description
This application claims priority to Chinese patent application No. 201710804781.3, filed with the China Patent Office on September 8, 2017 and entitled "Method, Client, and Server for Voice-Controlling a Terminal", the entire contents of which are incorporated herein by reference.
The present application relates to the field of Internet technologies, and in particular to a method for voice-controlling a terminal, a terminal, a server, and a storage medium.
Background
With the improvement of people's living standards, television devices (for example, smart TVs and TV boxes) have become increasingly common in households and play an ever more important role in people's daily leisure and entertainment. A television device refers to a new type of television product that has an open platform, runs an operating system, and can install applications; thus, while watching ordinary television content, users can also install and uninstall various kinds of application software on their own, extending and upgrading the device's functions.
Summary
Examples of the present application provide a method for voice-controlling a terminal, a terminal, a server, and a storage medium.
An example of the present application provides a method for voice-controlling a terminal, the method being performed by a first terminal and comprising:
in response to an operation on a voice recording control of a first client on the first terminal, recording voice to obtain first audio data, and sending a scene information query instruction to a second client on a second terminal, wherein the scene information includes at least one operable object information in at least one presentation interface of the second client;
upon receiving the scene information returned by the second client, sending the scene information and the first audio data to a server, wherein the server is the background server of the second client;
upon receiving a control instruction returned by the server, sending the control instruction to the second client to perform a corresponding action, wherein the control instruction carries operable object information to be executed.
An example of the present application provides a method for voice-controlling a terminal, the method being performed by a server and comprising:
upon receiving scene information and first audio data sent by a first client on a first terminal, determining the text converted from the first audio data, wherein the scene information includes at least one operable object information in at least one presentation interface of a second client on a second terminal;
segmenting the text to obtain a word segmentation result;
forming, according to the word segmentation result and the scene information, a control instruction carrying operable object information to be executed;
sending the control instruction to the second client through the first client, so that the second client performs the action corresponding to the control instruction.
An example of the present application provides a first terminal, the first terminal comprising:
at least one memory;
at least one processor;
wherein the at least one memory stores at least one instruction module configured to be executed by the at least one processor, and the at least one instruction module comprises:
a response module, configured to, in response to an operation on a voice recording control of a first client on the first terminal, record voice to obtain first audio data, and send a scene information query instruction to a second client on a second terminal, wherein the scene information includes at least one operable object information in at least one presentation interface of the second client;
a first sending module, configured to, upon receiving the scene information returned by the second client, send the scene information and the first audio data to a first server, wherein the first server is the background server of the second client;
a second sending module, configured to, upon receiving a control instruction returned by the first server, send the control instruction to the second client to perform a corresponding action, wherein the control instruction carries operable object information to be executed.
An example of the present application provides a server, the server comprising:
at least one memory;
at least one processor;
wherein the at least one memory stores at least one instruction module configured to be executed by the at least one processor, and the at least one instruction module comprises:
a text determining module, configured to, upon receiving scene information and first audio data sent by a first client on a first terminal, determine the text converted from the first audio data, wherein the scene information includes at least one operable object information in at least one presentation interface of a second client on a second terminal;
a text segmentation module, configured to segment the text to obtain a word segmentation result;
an instruction forming module, configured to form, according to the word segmentation result and the scene information, a control instruction carrying operable object information to be executed;
a third sending module, configured to send the control instruction to the second client through the first client, so that the second client performs the action corresponding to the control instruction.
The present application provides a method for voice-controlling a terminal, the method comprising:
a first terminal, in response to an operation on a voice recording control of a first client on the first terminal, recording voice to obtain first audio data, and sending a scene information query instruction to a second client on a second terminal, wherein the scene information includes at least one operable object information in at least one presentation interface of the second client; and, upon receiving the scene information returned by the second client, sending the scene information and the first audio data to a server, wherein the server is the background server of the second client;
the server, upon receiving the scene information and the first audio data sent by the first client on the first terminal, determining the text converted from the first audio data, wherein the scene information includes at least one operable object information in at least one presentation interface of the second client on the second terminal; segmenting the text to obtain a word segmentation result; forming, according to the word segmentation result and the scene information, a control instruction carrying operable object information to be executed; and sending the control instruction to the first client on the first terminal;
the first terminal, upon receiving the control instruction returned by the server, sending the control instruction to the second client to perform a corresponding action, wherein the control instruction carries the operable object information to be executed.
An example of the present application provides a first terminal, the first terminal comprising:
at least one memory;
at least one processor;
wherein the at least one memory stores a computer program that, when executed by the at least one processor, implements the above method.
An example of the present application provides a first server, the first server comprising:
at least one memory;
at least one processor;
wherein the at least one memory stores a computer program that, when executed by the at least one processor, implements the above method.
An example of the present application provides a non-volatile computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the above method are implemented.
Based on the above technical solutions provided by the examples of the present application, after the user triggers the voice recording control of the first client, the first client records the voice and sends a scene information query instruction to the second client on the second terminal. After the first client receives the scene information returned by the second client, it sends the scene information and the audio data input by the user to the background server of the second client; the server determines the text of the first audio data, segments it, matches the word segmentation result against the scene information, and forms a control instruction according to the matching result, which it sends to the first client. When the first client receives the control instruction, it sends the instruction to the second client, so that the second client performs the corresponding action. In the entire process, the user only needs to input voice through the voice recording control on the first client, which provides great convenience for the user's operation of the second terminal.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a system architecture diagram involved in an example of the present application;
FIG. 2 is a schematic flowchart of a method for voice-controlling a terminal in an example of the present application;
FIG. 3a is a schematic diagram of an interface displayed by the first client in an example of the present application;
FIG. 3b is a schematic diagram of an interface displayed by a smart TV in an example of the present application;
FIG. 4 is a system architecture diagram involved in an example of the present application;
FIG. 5 is a schematic flowchart of a method for voice-controlling a terminal in an example of the present application;
FIG. 6a is a schematic diagram of an interface displayed by a smart TV in an example of the present application;
FIG. 6b is a schematic diagram of an interface displayed by the first client in an example of the present application;
FIG. 7 is a structural block diagram of the first client in an example of the present application;
FIG. 8 is a structural block diagram of the first server in an example of the present application;
FIG. 9 is a schematic diagram of the overall interaction of a method for voice-controlling a terminal in an example of the present application;
FIG. 10 is a structural block diagram of a computer device in an example of the present application.
Detailed Description
The present application proposes a method for voice-controlling a terminal; the system architecture to which the method applies is shown in FIG. 1. The system architecture includes a first terminal 101, a second terminal 102, and a first server 103. The first terminal 101 and the second terminal 102 can be connected through a local area network 104, and the first terminal 101 and the first server 103 can be connected through the Internet 105, wherein:
The first terminal 101 may be a mobile terminal, such as a smartphone or a tablet, on which client software of various applications is installed; the user can log in to and use these application clients through the first terminal, for example, a voice assistant client.
The second terminal 102 may be a television device, such as a smart TV running Android or another system, or a TV box connected to a traditional television, the TV box running Android or another system and the traditional television serving as the TV box's display. Various applications can be installed on the second terminal, and the user can operate them with a remote controller. For example, the user can use the remote controller to open the Tencent Video client installed on a smart TV, find the video to watch, and then play it, fast-forward, or rewind. As another example, the user can use the remote controller to open the client of a music application (for example, QQ Music) installed on a TV box, and then play, favorite, or otherwise operate on a local or online song in the interface displayed on the traditional television.
The first server 103 refers to the background server of a client installed on the second terminal, for example, the background server corresponding to a video client on the second terminal, or the background server of a music application. If the client being operated on the second terminal is a video client, the corresponding first server is the background server of that video client; if it is the client of a music application, the corresponding first server is the background server of that music application. In other words, the first server is the background server corresponding to the client being operated on the second terminal. The first server may specifically be a single server or a server cluster formed by multiple servers.
Based on the system architecture shown in FIG. 1, an example of the present application provides a method for voice-controlling a terminal. The method may be performed by the first terminal 101, and specifically by a first client on the first terminal 101. As shown in FIG. 2, the method includes:
S201: in response to an operation on a voice recording control of the first client on the first terminal 101, recording voice to obtain first audio data, and sending a scene information query instruction to a second client on the second terminal 102, where the scene information includes at least one operable object information in at least one presentation interface of the second client;
It can be understood that, to enable information exchange between the first client on the first terminal 101 and the second terminal 102, the first client on the first terminal 101 needs to be connected to the second terminal 102, for example through a local area network. For example, the client owning the phone interface shown in FIG. 3a is connected to the smart TV shown in FIG. 3b; the client in FIG. 3a is a voice assistant client, and the smart TV in FIG. 3b is the Xiaomi TV in the user's living room. The smart TV is displaying the variety-show presentation interface of a video client, in which one can see information such as the names, posters, and update dates of variety shows like "明日之子" and "奔跑吧". Once triggered, the voice recording control 301 of the first client sends a scene information query instruction to the Xiaomi TV in the living room.
The voice recording control 301 can take multiple forms, for example a virtual button on an interface: when the user long-presses the button, the first client detects surrounding sound and records it; when the user releases the button, recording ends, and the sound recorded up to that point forms the first audio data. Of course, the voice recording control in the examples of the present application does not only perform voice recording; it also sends a scene information query instruction to the second terminal 102 connected to the first client. For example, when the user presses the voice recording control, the first client sends a scene information query instruction to the Xiaomi TV in the living room. In effect, the first audio data recorded by the user through the first client is a voice instruction for operating the second terminal 102. The presentation interface of the first client can also display prompt information for voice recording; for example, as shown in FIG. 3a, the user is prompted: "You can ask me like this: play Three Lives Three Worlds, open watch history, search for Yang Mi's TV series", and the user can issue voice operation instructions with reference to the prompt.
第二客户端是安装在第二终端102上的客户端,由于在第二终端102上可能安装有多个客户端,因此需要选择其中一个或多个客户端作为第二客户端,其中一种方式为:将电视机(电视盒子连接的传统电视或智能电视)当前展示的界面对应的客户端作为第二客户端。例如,用户客厅的小米电视展示的是腾讯视频客户端的某个界面,第一客户端发送的场景信息查询指令便会被发送至小米电视中的腾讯视频客户端,当腾讯视频客户端接收到该查询指令后,会把腾讯视频客户端的场景信息返回至第一客户端。
所谓的场景信息,包括第二客户端的至少一个界面中至少一个可操作对象信息。举例来说,对于腾讯视频客户端来说,有综艺节目的展示界面、电视剧的展示界面、电影的展示界面、纪录片的展示界面等,在综艺节目的展示界面中有多个近期热播的综艺节目,这些综艺节目以海报、节目名称等方式展示出来。同样的,电视剧的展示界面中有多部近期热播的电视剧,这些电视剧也是以海报、电视剧名称的方式展示出来。当然,还有观看记录、收藏、搜索、意见反馈、设置等,这些可以作为腾讯视频客户端的全局选项。当用户点击综艺节目展示界面中某综艺节目的海报时,便会进入该综艺节目的播放界面;当用户点击收藏时,便会进入收藏的界面,该界面中展示有用户收藏的多个视频的相关信息。因此不论是电视剧展示界面中的电视剧、综艺节目展示界面中的综艺节目等,还是观看记录、收藏、搜索、意见反馈、设置这些全局选项,均可以作为视频客户端的可操作对象,也就是说,如果该腾讯视频客户端接收到场景信息查询指令,腾讯视频客户端返回的场景信息可以包括多个综艺节目的名称、多个电视剧的名称、多部电影的名称、多个纪录片的名称,还可以包括观看记录、收藏、搜索、意见反馈、设置等。
以上场景信息是以一个视频客户端为例进行说明,由于第二客户端不仅限于视频客户端,还可以是其他的客户端,例如某音乐客户端、新闻客户端等。对于音乐客户端来说,有排行的展示界面、歌单的展示界面、MV的展示界面、本地的展示界面,在排行的展示界面中有多首按照热度排名的歌曲,在歌单的展示界面中有多首按照歌曲类型分类的歌曲,在MV的展示界面有多首有MV的歌曲,在本地的展示界面中有多首已下载到本地的歌曲。其中,排行的展示界面、歌单的展示界面、MV的展示界面中均展示的是在线歌曲,本地的展示界面展示的是本地歌曲。当然,也包括设置、问题与建议、检查更新等全局选项,不论是在线歌曲、是本地歌曲,还是全局选项,均可以作为该音乐客户端的可操作对象,也就是说,如果该音乐客户端接收到场景信息查询指令,返回的场景信息可以包括多首在线或本地的歌曲名称,也可以包括设置、问题与建议、检查更新等。
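上述场景信息可以用一个简单的数据结构加以示意。下面是一个假设性的Python草图,其中字段名(interfaces、global_options等)以及collect_operable_objects这一函数名均为说明用的假设,并非本申请限定的格式:

```python
# 假设性示例:用字典示意一个视频客户端返回的场景信息
scene_info = {
    "client": "视频客户端",
    "interfaces": {
        "综艺": ["明日之子", "奔跑吧"],
        "电视剧": ["西游记", "三生三世"],
    },
    # 全局选项同样作为可操作对象
    "global_options": ["观看记录", "收藏", "搜索", "意见反馈", "设置"],
}


def collect_operable_objects(scene):
    """汇总场景信息中全部可操作对象信息,供后续与分词结果匹配。"""
    objects = []
    for names in scene["interfaces"].values():
        objects.extend(names)
    objects.extend(scene["global_options"])
    return objects
```

这样,各展示界面中的内容与全局选项就被统一收敛为一个可匹配的可操作对象集合。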
S202、在接收到所述第二客户端返回的场景信息时,将所述场景信息和第一音频数据发送至第一服务器103,以使所述第一服务器103根据所述第一音频数据和所述场景信息形成携带有待执行的可操作对象信息的控制指令;所述第一服务器103为所述第二客户端的后台服务器;
第一服务器103,为第二客户端的后台服务器,例如,假设第二客户端为某一视频客户端,则第一服务器103为该视频客户端的服务器;假设第二客户端为某一新闻客户端,则第一服务器103为该新闻客户端的服务器。
所谓的待执行的可操作对象信息,可包括可操作对象的名称,例如,用户在第一客户端上输入的语音为“打开历史记录”,则待执行的可操作对象信息中包括“历史记录”,当第二客户端接收到携带有待执行的可操作对象信息的控制指令时,便会打开历史记录,即展示历史记录的界面,从该界面中用户可以看到近期观看的媒体内容。再例如,用户在第一客户端上输入的语音为“我想看西游记”,则待执行的可操作对象信息中包括“西游记”,当第二客户端接收到携带有该待执行的可操作对象信息的控制指令时,便会默认为是要对西游记执行播放动作。当然,由于对第二客户端的操作不仅仅限于打开、播放这种操作,还有下载、前进、后退等操作,因此待执行的可操作对象信息中还可以包括对可操作对象执行的动作等信息。例如,用户在第一客户端语音输入“下载西游记主题曲”,第一服务器103基于该音频数据以及第二客户端的场景信息,生成的控制指令中的待执行的可操作对象信息不仅包括“西游记主题曲”,还包括“下载”,当第二客户端接收到该控制指令时,不会执行播放西游记主题曲的动作,而是执行下载西游记主题曲的动作。
这里,对第一服务器103根据场景信息和第一音频数据形成控制指令的过程可以采用多种方式实现,对此本申请实例不做限定。
S203、在接收到所述第一服务器103返回的控制指令时,将所述控制指令发送至所述第二客户端,以使所述第二客户端根据所述控制指令执行相应的动作。
举例来说,假设与用户第一终端101的第一客户端连接的智能电视上展示一个视频客户端的某一个界面,当用户按住第一客户端上的语音录制控件说“我想看西游记”时,第一客户端向第二终端102中的第二客户端发送场景信息查询指令,此时智能电视展示的视频客户端会将包括各种电视剧片名、电影片名、综艺节目名称、一些全局选项等的场景信息返回给第一客户端,当第一客户端接收到场景信息后,会将场景信息以及“我想看西游记”的音频数据发送给视频客户端的后台服务器即第一服务器103,第一服务器103接收到场景信息和音频数据后根据音频数据了解到用户想要看西游记,然后结合场景信息,生成控制指令,再把控制指令返回给第一客户端,进而第一客户端将控制指令发送给第二客户端,以使第二客户端执行打开《西游记》的媒体资源的播放界面的动作。当然,如果与“西游记”相关的媒体资源不止一个,例如,第二客户端根据控制指令查找到的与“西游记”相关的媒体资源有电视剧《西游记》、电视剧《新西游记》、动画片《西游记》、电影《西游记之大闹天宫》、《西游记之大圣归来》等等,而且与西游记相关的电视剧还有多集,此时第二客户端可以将这些媒体资源的相关信息展示在智能电视上或者与电视盒子连接的传统电视机上,以供用户选择,在用户通过语音或者遥控器等方式选择其中某媒体资源后,再执行播放动作。第二客户端执行相应的动作后,不论成功与否,还可以给用户相应的提示。
基于上述描述可知,本申请实例提供的语音操控终端的方法,用户触发第一客户端的语音录制控件,第一客户端便录制语音并向第二终端102的第二客户端发送场景信息查询指令,当第一客户端接收到第二客户端返回的场景信息后,将场景信息以及用户输入的音频数据发送至第二客户端的后台服务器,该服务器根据接收到的信息形成一个控制指令,然后通过第一客户端将该指令发给第二客户端,使第二客户端执行相应的动作。整个过程,用户只需要通过第一终端101的第一客户端输入语音,进而控制第二终端102执行相应的动作,例如,以电视设备作为第二终端102,则不需要通过遥控器一步步地操作,即减少了遥控器的繁琐操作,因此为用户对第二终端102的操作提供了极大的便捷。
在一些实例中,可以在第二终端102上安装一个代理模块,例如,在智能电视或电视盒子上安装一个电视代理模块,所谓的电视代理模块实际上也是一个应用程序,而且是一个对用户不可见的应用程序,可作为智能电视或电视盒子上安装的其他应用程序与外界交互的桥梁,即在智能电视或电视盒子上安装的其他应用程序通过代理模块与外界进行交互,这样便于对智能电视或电视盒子中的应用程序与外界的交互进行管理。在实际应用中,可先将第二终端102中的可参与语音控制的应用程序(例如,视频客户端、音乐客户端、新闻客户端等)在代理模块中进行注册,如果某应用程序没有在代理模块中注册,该应用程序则不能参与语音控制,也就是说,用户不能通过语音控制该应用程序执行动作。当代理模块在接收到外界发送来的指令时,会通过广播的方式将指令发送给相应的应用程序,例如,在广播时指定包名,只将指令发送给前台应用程序,这样只有前台应用程序能够接收到指令,所谓的前台应用程序就是在第二终端102的当前展示界面所属的客户端。
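上述代理模块的注册与前台分发逻辑可以用下面的假设性Python草图示意。其中TVAgent、register、dispatch等名称均为说明用的假设,实际实现通常基于android系统的广播机制并通过包名定向发送:

```python
class TVAgent:
    """假设性的电视代理模块草图:仅示意注册与前台应用分发逻辑。"""

    def __init__(self):
        self.registered = set()   # 已在代理模块注册、可参与语音控制的应用
        self.foreground = None    # 当前展示界面所属的前台应用

    def register(self, app_name):
        """应用程序需先注册,否则不能参与语音控制。"""
        self.registered.add(app_name)

    def dispatch(self, instruction):
        """只把指令发给已注册且处于前台的应用,模拟指定包名的广播。"""
        if self.foreground in self.registered:
            return (self.foreground, instruction)
        return None  # 前台应用未注册,指令不被分发
```

这样,第二终端中各客户端与外界的交互都收敛到代理模块一处,便于统一管理。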
在第二终端102中设置代理模块的基础上,S201中第一客户端向第二终端102上的第二客户端发送场景信息查询指令过程可以为:第一客户端将所述场景信息查询指令发送至所述第二终端102中的代理模块,以使所述代理模块将所述场景信息查询指令发送至所述第二客户端。也就是说,在用户触发第一客户端的语音录制控件后,第一客户端向第二终端102中的代理模块发送场景信息查询指令,代理模块在接收到场景信息查询指令之后,会把场景信息查询指令发送给第二客户端。当然,当第二客户端接收到代理模块发送来的场景信息查询指令后,会将场景信息返回给代理模块,当代理模块在接收到场景信息后,会把场景信息发送给第一客户端。同样的,在步骤S203中,第一客户端将控制指令发送给第二客户端的过程可以为:将所述控制指令发送至所述第二终端102中的代理模块,以使所述代理模块将所述控制指令发送至所述第二 客户端。也就是说,第一客户端在接收到第一服务器103发送来的控制指令后,会将控制指令发送给代理模块,代理模块在接收到控制指令后会把控制指令发送给第二客户端。可见,第二终端102中的客户端与外界第一终端101之间的交互均通过代理模块,以实现对第二终端102中的客户端与外界通信的管理。
此外,还可以利用第二终端102中的代理模块在电视界面展示一些提示信息,例如用户在第一客户端中输入语音时,此时第一客户端会向代理模块发送场景信息查询指令,当代理模块接收到该指令时得知用户正在录制语音,因此代理模块可以在电视机的当前界面中展示语音正在输入的提醒信息等。
在一些实例中,在步骤S202中将所述场景信息和第一音频数据发送至第一服务器103的方式有多种,其中一种方式为:采用流式切片的发送模式将所述第一音频数据逐片发送至所述第一服务器103,以提高传输效率。例如,每一个分片用300ms的时间传输。也就是说,将第一音频数据分为多个分片,将这些分片逐片地发送给第一服务器103,所述场景信息可携带在其中任意一个分片中,例如,场景信息携带在最后一个分片中。当第一服务器103接收到携带有场景信息的分片时,即可认为第一音频数据接收完成。当然,也可以采用其他方式将第一音频数据和场景信息发送给第一服务器103。
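上述流式切片的发送模式可以用下面的假设性Python草图示意:把第一音频数据切成固定大小的分片,并将场景信息附在最后一个分片中。其中make_slices这一函数名与slice_size的取值均为说明用的假设:

```python
def make_slices(audio_bytes, scene_info, slice_size=9600):
    """将第一音频数据切分为分片;场景信息携带在最后一个分片中。

    slice_size=9600 为假设值:16 kHz、16 bit 单声道下约对应 300 ms 音频。
    """
    slices = [
        {"data": audio_bytes[i:i + slice_size], "scene": None}
        for i in range(0, len(audio_bytes), slice_size)
    ]
    if slices:
        # 服务器收到携带场景信息的分片,即可认为第一音频数据接收完成
        slices[-1]["scene"] = scene_info
    return slices
```

逐片发送使服务器可以边接收边处理,而无需等待整段音频录制并上传完毕。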
在一些实例中,当第一客户端将场景信息和第一音频数据发送给第一服务器103之后,第一服务器103形成控制指令的方式有多种,例如第一服务器103将第一音频数据转换为文本,然后对文本进行分词,再将分词结果与场景信息进行匹配,根据匹配结果形成控制指令。当然,第一服务器103也可以将第一音频数据发送给具有音频识别能力的其他服务器,例如,微信的后台服务器、qq的后台服务器,将具有音频识别能力的其他服务器称为第二服务器106,此时如图4所示,本申请实例应用的系统架构中增加了第二服务器106,可以参照图4将第一服务器103和第二服务器106分开设置,此时第一服务器103和第二服务器106之间通过网络传输数据;当然,第一服务器103和第二服务器106也可以集成在一起得到集成服务器,此时集成服务器既具有第一服务器103的功能也具有第二服务器106的功能。当第二服务器106接收到第一音频数据后,将第一音频数据转换为文本,然后将文本返回给第一服务器103,这样第一服务器103对接收到的文本进行分词,然后将分词结果与场景信息进行匹配,根据匹配结果形成控制指令。也就是说,不具有 语音处理能力的第一服务器103可以将第一音频数据发送给具有语音处理能力的第二服务器106,由第二服务器106将第一音频数据转换为文本然后返回给第一服务器103。举例来说,假设电视机的当前展示界面为一个视频客户端的一个界面,当用户按住第一客户端的语音录制控件输入的语音为“我想看西游记”,第一客户端将该语音和视频客户端返回的场景信息发送给第一服务器103,而第一服务器103不具有把用户输入的语音转换为文本的能力,便会在接收到该语音和场景信息后,把这段语音发送给微信的后台服务器即第二服务器106,微信的后台服务器将语音转换为文本,并将文本返回给第一服务器103,第一服务器103接收到文本后,利用基于语义的分词方法对文本“我想看西游记”进行分词,得到分词结果:“我”、“想看”、“西游记”,然后将该分词结果与场景信息进行匹配,发现场景信息中存在西游记的相关视频,便会形成携带有“西游记”相关视频信息的控制指令。
当然,第一服务器103还可以将文本转换为标准语音,该标准语音可称为第二音频数据,然后将第二音频数据发送至第一客户端。第一服务器103也可以向第二服务器106发送语音合成请求,当第二服务器106接收到该请求后,会把文本转换为标准语音即第二音频数据,进而将第二音频数据返回给第一服务器103,这样第一服务器103便可以将第二音频数据发送给第一客户端。其中,第二服务器106是指具有语音处理能力的服务器,所谓的语音处理能力包括把音频数据转换为文本,当然还可以包括把文本转换为标准音频数据。这种使第一服务器103向具有语音处理能力的第二服务器106发送语音合成请求,进而获得第二音频数据的方式,对于第一服务器103的要求不高,不需要第一服务器103具有语音处理的能力,因此对于一个具有语音处理能力的第一服务器103来说,可以自己将文本转换为第二音频数据,对于没有语音处理能力的第一服务器103来说,可以向具有语音处理能力的第二服务器106发送语音合成请求,这样也能得到第二音频数据。
当第一客户端在接收到所述第一服务器103发送来的第二音频数据时,可以播放所述第二音频数据,也可以将所述第二音频数据发送至所述第二终端102,例如发送至第二终端102中的代理模块,以使代理模块播放所述第二音频数据。所述第二音频数据为文本转换而成的音频数据,所述文本由所述第一音频数据转换而成。这样,用户便可以听到自己输入的语音对应的标准语音。
基于图4示出的系统架构,本申请实例还提供一种语音操控终端的方法,该方法可以由第一服务器103执行,如图5所示,该方法包括:
S501、在接收到第一终端101上的第一客户端发送来的场景信息和第一音频数据时,确定所述第一音频数据转换而成的文本;其中,所述场景信息包括第二终端102上第二客户端的至少一个展示界面中的至少一个可操作对象信息;
可理解的是,步骤S501中第一服务器103确定第一音频数据转换而成的文本的方式,可以是第一服务器103将第一音频数据转换为文本,也可以是第一服务器103在接收到第一客户端发送来的场景信息和第一音频数据时,将所述第一音频数据发送至第二服务器106,以使所述第二服务器106将所述第一音频数据转换为文本,并将所述文本返回至第一服务器103,也就是说,第一服务器103将第一音频数据发送给具有语音处理能力的第二服务器106。不论哪种方式,只要第一服务器103能够获得第一音频数据转换而成的文本即可。
S502、对所述文本进行分词,得到分词结果;
例如,对于文本“我想看西游记”,分词后得到的分词结果为“我”、“想看”、“西游记”。这里的分词可以但不限于采用基于语义分析的分词方法。
S503、根据所述分词结果和所述场景信息,形成携带有待执行的可操作对象信息的控制指令;
S504、将所述控制指令通过所述第一客户端发送至第二客户端,以使所述第二客户端根据所述控制指令执行相应的动作。
本申请实例提供的语音操控终端的方法,当第一服务器103接收到第一客户端发送来的场景信息和第一音频数据时,首先获取第一音频数据对应的文本,然后对文本分词,再将基于分词结果和场景信息形成的控制指令发送给第一客户端,进而使第一客户端将控制指令发送给第二客户端,使第二客户端执行相应的动作,从而实现语音控制第二客户端的目的。
可理解的是,在本申请实例提供的由第一服务器执行的语音操控终端的方法中有关内容的解释、举例、有益效果等部分可以参考上一实例中由第一客户端执行的语音操控终端的方法中的相应内容,此处不再赘述。
在一些实例中,上述S503中形成控制指令的方式有多种,其中一种为:
将所述分词结果与所述场景信息进行匹配,若所述场景信息中存在与所述分词结果相匹配的可操作对象信息,则形成携带有待执行的可操作对象信息的控制指令;所述待执行的可操作对象信息为与所述分词结果相匹配的可操作对象信息。
举例来说,分词结果为“我”、“想看”、“西游记”,而场景信息中包含视频名称“西游记”,则认为场景信息中存在与分词结果相匹配的可操作对象信息,形成的控制指令中则携带有“西游记”这一个视频名称。这样,当第二客户端接收到这一个控制指令时,便会对可操作对象—视频“西游记”执行播放操作。
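将分词结果与场景信息中的可操作对象逐一匹配并形成控制指令的过程,可以用下面的假设性Python草图示意。form_instruction这一函数名与指令的字段结构均为说明用的假设,且此处将默认动作设为“播放”:

```python
def form_instruction(tokens, operable_objects, default_action="播放"):
    """逐一检查分词结果,命中场景信息中的可操作对象时形成控制指令。"""
    for token in tokens:
        if token in operable_objects:
            # 待执行的可操作对象信息为与分词结果相匹配的可操作对象信息
            return {"action": default_action, "object": token}
    return None  # 未命中时需进一步处理,例如在服务器侧搜索媒体资源
```

匹配命中后,形成的控制指令即携带该可操作对象名称,供第二客户端执行相应动作。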
可理解的是,在场景信息中包含的可操作对象信息可以是第一服务器103中存储的媒体资源相对应的可操作对象信息,也可以是第三方媒体资源的可操作对象信息。所谓的媒体资源,可以是视频、可以是音乐、也可以是新闻内容(包括文字、图片等),还可以是其他媒体资源。
当然,由于在第二客户端的展示界面中大多展示的是近期的内容,对于时间比较久远的内容可能没有展示出来,这样场景信息中不存在相应的可操作对象信息,但是在第二客户端的后台服务器即第一服务器103中保存有相关的媒体资源。由于可能存在这种情况,因此步骤S503中形成控制指令的过程可以包括:
将所述分词结果与所述场景信息进行匹配,若所述场景信息中不存在与所述分词结果相匹配的可操作对象信息,则根据所述分词结果搜索与所述分词结果相匹配的媒体资源;
若搜索到与所述分词结果相匹配的媒体资源,形成携带有待执行的可操作对象信息的控制指令,所述待执行的可操作对象信息为与所述分词结果相匹配的媒体资源对应的可操作对象信息。
上述搜索即是在第一服务器103中进行搜索,如果在第一服务器103中存储有与分词结果匹配的媒体资源,例如,如果在场景信息中不存在与“西游记”相匹配的可操作对象信息,则在第一服务器103中进行搜索,如果搜索到与“西游记”相匹配的视频资源,则形成控制指令,该控制指令中包含“西游记”这一个视频名称,当第二客户端接收到该控制指令时,便会对可操作对象—视频“西游记”执行播放操作,也就是说,智能电视或与电视盒子连接的电视机进入西游记的播放界面。
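场景信息未命中时回退到第一服务器103的媒体库进行搜索的逻辑,可以用如下假设性Python草图示意。其中match_or_search、media_library等名称均为说明用的假设,子串匹配也仅是对实际检索的简化:

```python
def match_or_search(tokens, operable_objects, media_library):
    """先匹配场景信息;未命中时在服务器保存的媒体资源中搜索。"""
    for token in tokens:
        if token in operable_objects:
            return {"object": token, "source": "scene"}
    for token in tokens:
        for title in media_library:
            if token in title:  # 简化的子串匹配,实际可采用更完善的检索方式
                return {"object": title, "source": "server"}
    return None  # 未搜索到匹配的媒体资源,应向客户端反馈搜索结果
```

返回的source字段可区分指令来自场景信息匹配还是服务器搜索,便于后续处理。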
当然,也可能存在没有搜索到与分词结果相匹配的媒体资源,此时第一服务器103可以通过所述第一客户端向所述第二客户端反馈搜索结果,以使所述第二客户端展示所述搜索结果。例如,第一服务器103向 第一客户端反馈没有搜索到西游记相关视频的搜索结果,当第一客户端接收到信息后,便会发送至第二客户端,如图6a所示,第二客户端便会在电视机上显示“未搜索到“西游记”相关视频!”的提示信息。
在第一服务器103在获得第一音频数据的文本之后,还可以将文本发送至第一客户端,这样第一客户端可以展示该文本,如图6b所示中展示的“您说的内容是:我想看西游记”。
在一些实例中,第一客户端可能以流式分片的发送模式将第一音频数据发送至第一服务器103,如果第一服务器103不具有语音处理能力,会把第一音频数据发送至第二服务器106,具体可以是:在接收到所述第一音频数据的每一个分片时,将该分片发送至所述第二服务器106,以使所述第二服务器106将该分片转换为对应的文本片段,并将所述文本片段返回第一服务器103;其中,各个分片对应的文本片段的组合为所述文本。这种通过流式分片的发送模式将分片发送给第二服务器106以使第二服务器106进行文本转换的方式,可以提高转换的效率。
在一些实例中,第一服务器103可以获取文本对应的标准语音,如果第一服务器103不具有语音处理能力的话,可以向所述第二服务器106发送语音合成请求,以使所述第二服务器106将所述文本转换为第二音频数据;在接收到所述第二服务器106返回的所述第二音频数据时,将所述第二音频数据发送至所述第一客户端。
当将第二音频数据发送至第一客户端之后,第一客户端可以播放该第二音频数据,也可以将第二音频数据发送给第二终端102,以便第二终端102播放该第二音频数据。
本申请实例还提供一种第一终端,所述第一终端包括:
至少一个存储器;
至少一个处理器;
其中,所述至少一个存储器存储有至少一个指令模块,经配置由所述至少一个处理器执行。
由于第一终端所执行的上述方法实际为第一终端上安装的第一客户端所执行,因此上述至少一个指令模块为第一客户端中的指令模块,如图7所示,所述至少一个指令模块包括:
响应模块701,用于响应于对第一终端上第一客户端的语音录制控件的操作,录制语音得到第一音频数据,并向第二终端上的第二客户端 发送场景信息查询指令;所述场景信息包括所述第二客户端的至少一个展示界面中的至少一个可操作对象信息;
第一发送模块702,用于在接收到所述第二客户端返回的场景信息时,将所述场景信息和第一音频数据发送至第一服务器,以使所述第一服务器根据所述第一音频数据和所述场景信息形成携带有待执行的可操作对象信息的控制指令;所述第一服务器为所述第二客户端的后台服务器;
第二发送模块703,用于在接收到所述第一服务器返回的控制指令时,将所述控制指令发送至所述第二客户端,以使所述第二客户端根据所述控制指令执行相应的动作。
在一些实例中,第一发送模块702可以采用流式切片的发送模式将所述第一音频数据逐片发送至所述第一服务器;所述场景信息携带在其中一个分片中,例如,所述场景信息携带在所述第一音频数据的最后一个分片中。
在一些实例中,第一客户端还可以包括:
播放模块,用于在接收到所述第一服务器发送来的第二音频数据时,播放所述第二音频数据;其中,所述第二音频数据由文本转换而成,所述文本由所述第一音频数据转换而成。
在一些实例中,第二发送模块703还可以用于在接收到所述第一服务器发送来的第二音频数据时,将所述第二音频数据发送至所述第二终端的代理模块,以使所述代理模块播放所述第二音频数据;其中,所述第二音频数据由文本转换而成,所述文本由所述第一音频数据转换而成。
在一些实例中,响应模块701可以用于将所述场景信息查询指令发送至所述第二终端中的代理模块,以使所述代理模块将所述场景信息查询指令发送至所述第二客户端。
在一些实例中,第二发送模块703可以用于将所述控制指令发送至所述第二终端中的代理模块,以使所述代理模块将所述控制指令发送至所述第二客户端。
可理解的是,本申请实例提供的第一客户端,为上述语音操控终端的方法的功能架构模块,其有关内容的解释、举例、有益效果等可参考上文中语音操控终端的方法的相关内容,此处不再赘述。
本申请实例还提供一种第一服务器,该服务器包括:
至少一个存储器;
至少一个处理器;
其中,所述至少一个存储器存储有至少一个指令模块,经配置由所述至少一个处理器执行;如图8所示,所述至少一个指令模块包括:
文本确定模块801,用于在接收到第一终端上的第一客户端发送来的场景信息和第一音频数据时,确定所述第一音频数据转换而成的文本;其中,所述场景信息包括第二终端上第二客户端的至少一个展示界面中的至少一个可操作对象信息;
文本分词模块802,用于对所述文本进行分词,得到分词结果;
指令形成模块803,根据所述分词结果和所述场景信息,形成携带有待执行的可操作对象信息的控制指令;
第三发送模块804,用于将所述控制指令通过所述第一客户端发送至第二客户端,以使所述第二客户端根据所述控制指令执行相应的动作。
在一些实例中,指令形成模块803可以用于将所述分词结果与所述场景信息进行匹配,若所述场景信息中存在与所述分词结果相匹配的可操作对象信息,则形成携带有待执行的可操作对象信息的控制指令;所述待执行的可操作对象信息为与所述分词结果相匹配的可操作对象信息。
在一些实例中,指令形成模块803可以用于将所述分词结果与所述场景信息进行匹配,若所述场景信息中不存在与所述分词结果相匹配的可操作对象信息,则根据所述分词结果搜索与所述分词结果相匹配的媒体资源;若搜索到与所述分词结果相匹配的媒体资源,形成携带有待执行的可操作对象信息的控制指令,所述待执行的可操作对象信息为与所述分词结果相匹配的媒体资源对应的可操作对象信息。
在一些实例中,指令形成模块803还可以用于在未搜索到与所述分词结果相匹配的媒体资源时,通过所述第一客户端向所述第二客户端反馈搜索结果,以使所述第二客户端展示所述搜索结果。
在一些实例中,文本确定模块801可以用于在接收到第一客户端发送来的场景信息和第一音频数据时,将所述第一音频数据发送至第二服务器,以使所述第二服务器将所述第一音频数据转换为文本,并将所述文本返回至第一服务器。
在一些实例中,文本确定模块801具体可以用于在接收到所述第一音频数据的每一个分片时,将该分片发送至所述第二服务器,以使所述 第二服务器将该分片转换为对应的文本片段,并将所述文本片段返回第一服务器;其中,各个分片对应的文本片段的组合为所述文本。
在一些实例中,第一服务器800还可以包括:
请求模块,用于向所述第二服务器发送语音合成请求,以使所述第二服务器将所述文本转换为第二音频数据;在接收到所述第二服务器返回的所述第二音频数据时,将所述第二音频数据发送至所述第一客户端。
在一些实例中,第三发送模块804还可以用于:将所述文本发送至所述第一客户端,以使所述第一客户端展示所述文本。
可理解的是,本申请实例提供的第一服务器,为上述语音操控终端的方法的功能架构模块,其有关内容的解释、举例、有益效果等可参考上文中语音操控终端的方法的相关内容,此处不再赘述。
基于以上第一终端上的第一客户端和第一服务器,并结合图9,本申请实例还提供一种语音操控终端的方法的整体过程:
S901、当用户按下第一客户端上的语音录制控件时,开始录音,并且第一客户端向电视代理模块发送场景信息查询指令;
S902、当电视代理模块接收到场景信息查询指令时,将场景信息查询指令发送至第二客户端;
S903、当第二客户端接收到场景信息查询指令时,将场景信息返回给电视代理模块;
S904、当电视代理模块接收到场景信息后,把场景信息发送至第一客户端;
S905、第一客户端采用流式分片的发送模式将录制得到的第一音频数据逐片发送至第一服务器,并在最后一个分片中携带有场景信息;
S906、当第一服务器接收到语音分片时,向第二服务器发送语音识别请求,得到该语音分片的文本片段;
S907、当识别完成后,第一服务器对各个文本片段组成的文本进行分词,然后与场景信息进行匹配,并根据匹配结果形成控制指令;
S908、第一服务器向第二服务器发送tts请求即语音合成请求,第二服务器对该tts请求进行处理,将文本转化为第二音频数据,返回给第一服务器;
S909、第一服务器将文本、控制指令、第二音频数据发送至第一客户端;
S910、第一客户端在界面中展示接收到的文本,并播放第二音频数据或者将第二音频数据发送至电视代理模块,以使电视代理模块播放第二音频数据。第一客户端将接收到的控制指令发送至电视代理模块,电视代理模块将控制指令发送给第二客户端,进而第二客户端执行相应的动作,至此完成通过语音操控第二客户端的过程。在上述过程中,用户仅需要通过语音录制控件输入语音即可,不需要像遥控器一样的繁琐操作,为用户提供了很大的便捷。
可理解的是,由于第一客户端安装在第一终端上,因此第一客户端所执行的方法的实际硬件执行主体为第一终端。
本申请实例还提供一种非易失性计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现上述任一方法的步骤。
本申请实例还提供一种计算机设备,图10示出了第一客户端或第一服务器所在的计算机设备的组成结构图。如图10所示,该计算设备包括一个或者多个处理器(CPU)1002、通信模块1004、存储器1006、用户接口1010,以及用于互联这些组件的通信总线1008,其中:
处理器1002可通过通信模块1004接收和发送数据以实现网络通信和/或本地通信。
用户接口1010包括一个或多个输出设备1012,其包括一个或多个扬声器和/或一个或多个可视化显示器。用户接口1010也包括一个或多个输入设备1014,其包括诸如,键盘,鼠标,声音命令输入单元或扩音器,触屏显示器,触敏输入板,姿势捕获摄像机或其他输入按钮或控件等。
存储器1006可以是高速随机存取存储器,诸如DRAM、SRAM、DDR RAM、或其他随机存取固态存储设备;或者非易失性存储器,诸如一个或多个磁盘存储设备、光盘存储设备、闪存设备,或其他非易失性固态存储设备。
存储器1006存储处理器1002可执行的指令集,包括:
操作系统1016,包括用于处理各种基本系统服务和用于执行硬件相关任务的程序;
应用1018,包括用于语音操控终端的各种应用程序,这种应用程序能够实现上述各实例中的处理流程,比如可以包括第一客户端或者第一服务器中的部分或者全部模块。第一客户端或者第一服务器的至少一个 模块可以存储有机器可执行指令。处理器1002通过执行存储器1006中各单元中至少一个单元中的机器可执行指令,进而能够实现上述模块中的至少一个模块的功能。
需要说明的是,上述各流程和各结构图中不是所有的步骤和模块都是必须的,可以根据实际的需要忽略某些步骤或模块。各步骤的执行顺序不是固定的,可以根据需要进行调整。各模块的划分仅仅是为了便于描述采用的功能上的划分,实际实现时,一个模块可以分由多个模块实现,多个模块的功能也可以由同一个模块实现,这些模块可以位于同一个设备中,也可以位于不同的设备中。
各实施例中的硬件模块可以以硬件方式或硬件平台加软件的方式实现。上述软件包括机器可读指令,存储在非易失性存储介质中。因此,各实施例也可以体现为软件产品。
各例中,硬件可以由专门的硬件或执行机器可读指令的硬件实现。例如,硬件可以为专门设计的永久性电路或逻辑器件(如专用处理器,如FPGA或ASIC)用于完成特定的操作。硬件也可以包括由软件临时配置的可编程逻辑器件或电路(如包括通用处理器或其它可编程处理器)用于执行特定操作。
另外,本申请的每个实例可以通过由数据处理设备如计算机执行的数据处理程序来实现。显然,数据处理程序构成了本申请。此外,通常存储在一个存储介质中的数据处理程序通过直接将程序读取出存储介质或者通过将程序安装或复制到数据处理设备的存储设备(如硬盘和/或内存)中执行。因此,这样的存储介质也构成了本申请,本申请还提供了一种非易失性存储介质,其中存储有数据处理程序,这种数据处理程序可用于执行本申请上述方法实例中的任何一种实例。
图10模块对应的机器可读指令可以使计算机上操作的操作系统等来完成这里描述的部分或者全部操作。非易失性计算机可读存储介质可以是插入计算机内的扩展板中所设置的存储器中或者写到与计算机相连接的扩展单元中设置的存储器。安装在扩展板或者扩展单元上的CPU等可以根据指令执行部分和全部实际操作。
以上所述仅为本发明的一些实施例而已,并不用以限制本发明,凡在本发明的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本发明保护的范围之内。
Claims (19)
- 一种语音操控终端的方法,所述方法由第一终端执行,所述方法包括:响应于对第一终端上第一客户端的语音录制控件的操作,录制语音得到第一音频数据,并向第二终端上的第二客户端发送场景信息查询指令;其中,所述场景信息包括所述第二客户端的至少一个展示界面中的至少一个可操作对象信息;在接收到所述第二客户端返回的场景信息时,将所述场景信息和第一音频数据发送至服务器;其中,所述服务器为所述第二客户端的后台服务器;在接收到所述服务器返回的控制指令时,将所述控制指令发送至所述第二客户端以执行相应的动作;其中,所述控制指令携带有待执行的可操作对象信息。
- 根据权利要求1所述的方法,其中,所述将所述场景信息和第一音频数据发送至服务器,包括:采用流式切片的发送模式将所述第一音频数据逐片发送至所述服务器;所述场景信息携带在其中一个分片中。
- 根据权利要求2所述的方法,其中,所述场景信息携带在所述第一音频数据的最后一个分片中。
- 根据权利要求1所述的方法,所述方法还包括:在接收到所述服务器发送来的第二音频数据时,播放所述第二音频数据;其中,所述第二音频数据由文本转换而成,所述文本由所述第一音频数据转换而成。
- 根据权利要求1所述的方法,所述方法还包括:在接收到所述服务器发送来的第二音频数据时,将所述第二音频数据发送至所述第二终端的代理模块中进行播放;其中,所述第二音频数据由文本转换而成,所述文本由所述第一音频数据转换而成。
- 根据权利要求1~5任一所述的方法,其中,所述向第二终端上的第二客户端发送场景信息查询指令,包括:将所述场景信息查询指令发送至所述第二终端中的代理模块;其中,所述代理模块能够将所述场景信息查询指令发送至所述第二客户端。
- 一种语音操控终端的方法,所述方法由服务器执行,所述方法包括:在接收到第一终端上的第一客户端发送来的场景信息和第一音频数据时,确定所述第一音频数据转换而成的文本;其中,所述场景信息包括第二终端上第二客户端的至少一个展示界面中的至少一个可操作对象信息;对所述文本进行分词,得到分词结果;根据所述分词结果和所述场景信息,形成携带有待执行的可操作对象信息的控制指令;将所述控制指令通过所述第一客户端发送至第二客户端以使所述第二客户端执行所述控制指令相对应的动作。
- 根据权利要求7所述的方法,其中,所述根据所述分词结果和所述场景信息,形成携带有待执行的可操作对象信息的控制指令,包括:将所述分词结果与所述场景信息进行匹配,若所述场景信息中存在与所述分词结果相匹配的可操作对象信息,则形成携带有待执行的可操作对象信息的控制指令;所述待执行的可操作对象信息为与所述分词结果相匹配的可操作对象信息。
- 根据权利要求7所述的方法,其中,所述根据所述分词结果和所述场景信息,形成携带有待执行的可操作对象信息的控制指令,包括:将所述分词结果与所述场景信息进行匹配,若所述场景信息中不存在与所述分词结果相匹配的可操作对象信息,则根据所述分词结果搜索与所述分词结果相匹配的媒体资源;若搜索到与所述分词结果相匹配的媒体资源,形成携带有待执行的可操作对象信息的控制指令,所述待执行的可操作对象信息为与所述分词结果相匹配的媒体资源对应的可操作对象信息。
- 根据权利要求9所述的方法,所述方法还包括:若未搜索到与所述分词结果相匹配的媒体资源,则通过所述第一客户端向所述第二客户端反馈搜索结果;其中,所述第二客户端能够在接收到所述搜索结果时展示所述搜索结果。
- 根据权利要求7所述的方法,其中,所述服务器为第一服务器;所述在接收到第一客户端发送来的场景信息和第一音频数据时,确定所述第一音频数据转换而成的文本,包括:在接收到第一客户端发送来的场景信息和第一音频数据时,将所述 第一音频数据发送至第二服务器;其中,所述第二服务器能够将所述第一音频数据转换为文本;从所述第二服务器中获取所述文本。
- 根据权利要求11所述的方法,其中,所述在接收到第一客户端发送来的场景信息和第一音频数据时,将所述第一音频数据发送至第二服务器,包括:在接收到所述第一音频数据的每一个分片时,将该分片发送至所述第二服务器以获取该分片对应的文本片段;其中,各个分片对应的文本片段的组合为所述文本。
- 根据权利要求11所述的方法,所述方法还包括:向所述第二服务器发送语音合成请求;其中,所述第二服务器在接收到所述语音合成请求时能够将所述文本转换为第二音频数据;在接收到所述第二服务器返回的第二音频数据时,将所述第二音频数据发送至所述第一客户端。
- 一种第一终端,该第一终端包括:至少一个存储器;至少一个处理器;其中,所述至少一个存储器存储有至少一个指令模块,经配置由所述至少一个处理器执行;其中,所述至少一个指令模块包括:响应模块,用于响应于对第一终端上第一客户端的语音录制控件的操作,录制语音得到第一音频数据,并向第二终端上的第二客户端发送场景信息查询指令;其中,所述场景信息包括所述第二客户端的至少一个展示界面中的至少一个可操作对象信息;第一发送模块,用于在接收到所述第二客户端返回的场景信息时,将所述场景信息和第一音频数据发送至第一服务器;其中,所述第一服务器为所述第二客户端的后台服务器;第二发送模块,用于在接收到所述第一服务器返回的控制指令时,将所述控制指令发送至所述第二客户端以执行相应的动作;其中,所述控制指令携带有待执行的可操作对象信息。
- 一种服务器,所述服务器包括:至少一个存储器;至少一个处理器;其中,所述至少一个存储器存储有至少一个指令模块,经配置由所述至少一个处理器执行;其中,所述至少一个指令模块包括:文本确定模块,用于在接收到第一终端上的第一客户端发送来的场景信息和第一音频数据时,确定所述第一音频数据转换而成的文本;其中,所述场景信息包括第二终端上第二客户端的至少一个展示界面中的至少一个可操作对象信息;文本分词模块,用于对所述文本进行分词,得到分词结果;指令形成模块,用于根据所述分词结果和所述场景信息,形成携带有待执行的可操作对象信息的控制指令;第三发送模块,用于将所述控制指令通过所述第一客户端发送至第二客户端以使所述第二客户端执行所述控制指令相对应的动作。
- 一种语音操控终端的方法,所述方法包括:第一终端响应于对第一终端上第一客户端的语音录制控件的操作,录制语音得到第一音频数据,并向第二终端上的第二客户端发送场景信息查询指令;其中,所述场景信息包括所述第二客户端的至少一个展示界面中的至少一个可操作对象信息;在接收到所述第二客户端返回的场景信息时,将所述场景信息和第一音频数据发送至服务器;其中,所述服务器为所述第二客户端的后台服务器;服务器在接收到第一终端上的第一客户端发送来的场景信息和第一音频数据时,确定所述第一音频数据转换而成的文本;其中,所述场景信息包括第二终端上第二客户端的至少一个展示界面中的至少一个可操作对象信息;对所述文本进行分词,得到分词结果;根据所述分词结果和所述场景信息,形成携带有待执行的可操作对象信息的控制指令;将所述控制指令发送至所述第一终端上的第一客户端;所述第一终端在接收到所述服务器返回的控制指令时,将所述控制指令发送至所述第二客户端以执行相应的动作;其中,所述控制指令携带有待执行的可操作对象信息。
- 一种第一终端,所述第一终端包括:至少一个存储器;至少一个处理器;其中,所述至少一个存储器存储有计算机程序,所述计算机程序被所述至少一个处理器执行时可实现如权利要求1~6任一项所述的方法。
- 一种第一服务器,所述第一服务器包括:至少一个存储器;至少一个处理器;其中,所述至少一个存储器存储有计算机程序,所述计算机程序被 所述至少一个处理器执行时可实现如权利要求7~13任一项所述的方法。
- 一种非易失性计算机可读存储介质,其上存储有计算机程序,在所述计算机程序被处理器执行时可实现如权利要求1~13任一项所述的方法。
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18853000.0A EP3680896B1 (en) | 2017-09-08 | 2018-09-06 | Method for controlling terminal by voice, terminal, server and storage medium |
US16/809,746 US11227598B2 (en) | 2017-09-08 | 2020-03-05 | Method for controlling terminal by voice, terminal, server and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710804781.3 | 2017-09-08 | ||
CN201710804781.3A CN109474843B (zh) | 2017-09-08 | 2017-09-08 | 语音操控终端的方法、客户端、服务器 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/809,746 Continuation US11227598B2 (en) | 2017-09-08 | 2020-03-05 | Method for controlling terminal by voice, terminal, server and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019047878A1 true WO2019047878A1 (zh) | 2019-03-14 |
Family
ID=65634661
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/104264 WO2019047878A1 (zh) | 2017-09-08 | 2018-09-06 | 语音操控终端的方法、终端、服务器和存储介质 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11227598B2 (zh) |
EP (1) | EP3680896B1 (zh) |
CN (1) | CN109474843B (zh) |
WO (1) | WO2019047878A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111464595A (zh) * | 2020-03-17 | 2020-07-28 | 云知声智能科技股份有限公司 | 一种云端配置个性化场景的方法及装置 |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109474843B (zh) * | 2017-09-08 | 2021-09-03 | 腾讯科技(深圳)有限公司 | 语音操控终端的方法、客户端、服务器 |
CN110322873B (zh) | 2019-07-02 | 2022-03-01 | 百度在线网络技术(北京)有限公司 | 语音技能的退出方法、装置、设备及存储介质 |
CN110600027B (zh) * | 2019-08-26 | 2022-12-02 | 深圳市丰润达科技有限公司 | 语音终端场景控制、应用方法、语音终端、云端及系统 |
CN110718219B (zh) | 2019-09-12 | 2022-07-22 | 百度在线网络技术(北京)有限公司 | 一种语音处理方法、装置、设备和计算机存储介质 |
CN113194346A (zh) * | 2019-11-29 | 2021-07-30 | 广东海信电子有限公司 | 一种显示设备 |
CN114430496B (zh) * | 2020-10-15 | 2024-03-01 | 华为技术有限公司 | 跨设备视频搜索方法及相关设备 |
CN112397068B (zh) * | 2020-11-16 | 2024-03-26 | 深圳市朗科科技股份有限公司 | 一种语音指令执行方法及存储设备 |
CN117882130A (zh) * | 2021-06-22 | 2024-04-12 | 海信视像科技股份有限公司 | 一种进行语音控制的终端设备及服务器 |
CN114610158B (zh) * | 2022-03-25 | 2024-09-27 | Oppo广东移动通信有限公司 | 数据处理方法及装置、电子设备、存储介质 |
CN115002059B (zh) * | 2022-05-06 | 2024-03-12 | 深圳市雷鸟网络传媒有限公司 | 信息处理方法、装置、计算机可读存储介质及计算机设备 |
CN115802083A (zh) * | 2022-11-22 | 2023-03-14 | 深圳创维-Rgb电子有限公司 | 控制方法、装置、分体电视及可读存储介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103188538A (zh) * | 2012-12-28 | 2013-07-03 | 吴玉胜 | 基于智能电视设备和互联网的家电控制方法及系统 |
CN104599669A (zh) * | 2014-12-31 | 2015-05-06 | 乐视致新电子科技(天津)有限公司 | 一种语音控制方法和装置 |
CN104717536A (zh) * | 2013-12-11 | 2015-06-17 | 中国电信股份有限公司 | 一种语音控制的方法和系统 |
CN105161106A (zh) * | 2015-08-20 | 2015-12-16 | 深圳Tcl数字技术有限公司 | 智能终端的语音控制方法、装置及电视机系统 |
EP2986014A1 (en) * | 2011-08-05 | 2016-02-17 | Samsung Electronics Co., Ltd. | Method for controlling electronic apparatus based on voice recognition and motion recognition, and electronic apparatus applying the same |
CN105957530A (zh) * | 2016-04-28 | 2016-09-21 | 海信集团有限公司 | 一种语音控制方法、装置和终端设备 |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7289964B1 (en) * | 1999-08-31 | 2007-10-30 | Accenture Llp | System and method for transaction services patterns in a netcentric environment |
US6636242B2 (en) * | 1999-08-31 | 2003-10-21 | Accenture Llp | View configurer in a presentation services patterns environment |
US6389467B1 (en) * | 2000-01-24 | 2002-05-14 | Friskit, Inc. | Streaming media search and continuous playback system of media resources located by multiple network addresses |
US7899915B2 (en) * | 2002-05-10 | 2011-03-01 | Richard Reisman | Method and apparatus for browsing using multiple coordinated device sets |
US20150135214A1 (en) * | 2002-05-10 | 2015-05-14 | Convergent Media Solutions Llc | Method and apparatus for browsing using alternative linkbases |
US20150135206A1 (en) * | 2002-05-10 | 2015-05-14 | Convergent Media Solutions Llc | Method and apparatus for browsing using alternative linkbases |
US9357025B2 (en) * | 2007-10-24 | 2016-05-31 | Social Communications Company | Virtual area based telephony communications |
KR101560183B1 (ko) * | 2008-04-17 | 2015-10-15 | 삼성전자주식회사 | 사용자 인터페이스를 제공/수신하는 방법 및 장치 |
WO2013000125A1 (en) * | 2011-06-28 | 2013-01-03 | Nokia Corporation | Method and apparatus for live video sharing with multimodal modes |
US10096033B2 (en) * | 2011-09-15 | 2018-10-09 | Stephan HEATH | System and method for providing educational related social/geo/promo link promotional data sets for end user display of interactive ad links, promotions and sale of products, goods, and/or services integrated with 3D spatial geomapping, company and local information for selected worldwide locations and social networking |
US10127563B2 (en) * | 2011-09-15 | 2018-11-13 | Stephan HEATH | System and method for providing sports and sporting events related social/geo/promo link promotional data sets for end user display of interactive ad links, promotions and sale of products, goods, gambling and/or services integrated with 3D spatial geomapping, company and local information for selected worldwide locations and social networking |
US9436650B2 (en) * | 2011-11-25 | 2016-09-06 | Lg Electronics Inc. | Mobile device, display device and method for controlling the same |
CN103839549A (zh) * | 2012-11-22 | 2014-06-04 | 腾讯科技(深圳)有限公司 | 一种语音指令控制方法及系统 |
US9569467B1 (en) * | 2012-12-05 | 2017-02-14 | Level 2 News Innovation LLC | Intelligent news management platform and social network |
CN104104703B (zh) * | 2013-04-09 | 2018-02-13 | 广州华多网络科技有限公司 | 多人音视频互动方法、客户端、服务器及系统 |
CN103546762A (zh) * | 2013-10-30 | 2014-01-29 | 乐视致新电子科技(天津)有限公司 | 一种搜索智能电视资源的方法和装置 |
CN106164817A (zh) * | 2014-02-28 | 2016-11-23 | 罗素商标有限责任公司 | 体育设备与可穿戴的计算机的交互 |
US9727661B2 (en) * | 2014-06-20 | 2017-08-08 | Lg Electronics Inc. | Display device accessing broadcast receiver via web browser and method of controlling therefor |
US9691070B2 (en) * | 2015-09-01 | 2017-06-27 | Echostar Technologies L.L.C. | Automated voice-based customer service |
CN109474843B (zh) * | 2017-09-08 | 2021-09-03 | 腾讯科技(深圳)有限公司 | 语音操控终端的方法、客户端、服务器 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3680896A4 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111464595A (zh) * | 2020-03-17 | 2020-07-28 | 云知声智能科技股份有限公司 | 一种云端配置个性化场景的方法及装置 |
CN111464595B (zh) * | 2020-03-17 | 2022-10-18 | 云知声智能科技股份有限公司 | 一种云端配置个性化场景的方法及装置 |
Also Published As
Publication number | Publication date |
---|---|
EP3680896A4 (en) | 2021-01-06 |
US20200202860A1 (en) | 2020-06-25 |
CN109474843A (zh) | 2019-03-15 |
US11227598B2 (en) | 2022-01-18 |
EP3680896B1 (en) | 2024-04-10 |
CN109474843B (zh) | 2021-09-03 |
EP3680896A1 (en) | 2020-07-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18853000 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2018853000 Country of ref document: EP Effective date: 20200408 |