WO2015029304A1 - Speech Recognition Method and Speech Recognition Device - Google Patents
Speech Recognition Method and Speech Recognition Device
- Publication number
- WO2015029304A1 (PCT/JP2014/003608)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- operation instruction
- utterance
- speech recognition
- character information
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1807—Speech classification or search using natural language modelling using prosody or stress
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/227—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology
Definitions
- The present disclosure relates to a speech recognition method and a speech recognition apparatus that recognize input speech and control a device based on the recognition result.
- When a speaker speaks toward a speech recognition device, it is necessary to give the device a trigger for starting speech recognition.
- Examples of such triggers in conventional speech recognition apparatuses include pressing a pushbutton switch and detecting a specific keyword registered in advance (see, for example, Patent Document 1 and Patent Document 2).
- A speech recognition method according to the present disclosure is a method in a system that controls one or more devices by speech recognition. The method includes a voice information acquisition step of acquiring voice information representing speech uttered by a user, a voice recognition step of recognizing the voice information acquired in the voice information acquisition step as character information, and a determination step of determining, based on the character information recognized in the voice recognition step, whether the voice is an utterance directed to the device.
- FIG. 1 is a block diagram showing the configuration of the speech recognition system according to Embodiment 1 of the present disclosure. FIG. 2 is a block diagram showing the configuration of the device according to Embodiment 1. FIG. 3 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 1. FIG. 4 is a flowchart for explaining the operation of the speech recognition system according to Embodiment 1.
- FIG. 5(A) shows an example of character information whose sentence type is a declarative sentence, FIG. 5(B) an example whose sentence type is a question sentence, FIG. 5(C) and FIG. 5(D) examples whose sentence type is an imperative sentence, and FIG. 5(E) an example whose sentence type is a noun-ending (taigen-dome) sentence.
- FIG. 8 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 2. FIG. 9 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 3. FIG. 10 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 4. FIG. 11 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 5. FIG. 12 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 6. FIG. 13 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 7.
- FIG. 14 is a block diagram showing the configuration of the conventional speech recognition apparatus described in Patent Document 1, and FIG. 15 is a block diagram showing the configuration of the conventional speech recognition apparatus described in Patent Document 2.
- As shown in FIG. 14, the conventional voice recognition apparatus 201 of Patent Document 1 includes a voice input unit 210 for inputting voice, a control unit 220 that detects a command from the voice input by the voice input unit 210 and controls a device based on the detected command, and a permission unit 230 that detects a predetermined keyword from the voice input by the voice input unit 210 and enables control of the device by the control unit 220 only during a predetermined command input period after the keyword is detected.
- In Patent Document 1, since the keyword is input by voice, there is no need to operate a button or the like every time the device is controlled. The user can therefore control the device even in situations where a button cannot be pressed.
- As shown in FIG. 15, the conventional speech recognition apparatus 301 of Patent Document 2 includes a speech input unit 303, a speech/non-speech discrimination unit 305 that discriminates whether sound input from the speech input unit 303 is speech or non-speech, a keyword dictionary 310, a speech recognition dictionary 313, and a speech recognition unit 308 that performs speech recognition based on the speech recognition dictionary 313 and determines whether words judged to be speech by the speech/non-speech discrimination unit 305 are words registered in advance in the keyword dictionary 310.
- In Patent Document 2, speech recognition is performed when a specific keyword is uttered after the user utters the target command word. The speech recognition apparatus 301 of Patent Document 2 can therefore be given a trigger for starting speech recognition without a pre-registered keyword being spoken before the command word.
- However, the conventional speech recognition apparatus of Patent Document 1 has the problem that a specific keyword for starting speech recognition must be uttered before the target command word.
- Likewise, the conventional speech recognition apparatus of Patent Document 2 has the problem that a specific keyword for starting speech recognition must be uttered after the target command word. That is, neither of the speech recognition apparatuses of Patent Document 1 and Patent Document 2 starts speech recognition unless the user speaks a specific keyword.
- FIG. 1 is a block diagram illustrating a configuration of a speech recognition system according to Embodiment 1 of the present disclosure.
- The voice recognition system shown in FIG. 1 includes a device 1 and a server 2.
- The device 1 includes, for example, home appliances disposed in the home.
- The device 1 is communicably connected to the server 2 via a network 3.
- The network 3 is, for example, the Internet.
- The device 1 may be a device that can be connected to the network 3 (for example, a smartphone, a personal computer, or a television) or a device that cannot be connected to the network 3 by itself (for example, a lighting device, a washing machine, or a refrigerator). A device that cannot itself connect to the network 3 may still be connectable to the network 3 via a home gateway, and a device that can connect to the network 3 may be connected directly to the server 2 without going through the home gateway.
- The server 2 is composed of a known server computer or the like and is communicably connected to the device 1 via the network 3.
- FIG. 2 is a block diagram illustrating a configuration of the device 1 according to the first embodiment of the present disclosure.
- The device 1 according to the first embodiment includes a communication unit 11, a control unit 12, a memory 13, a microphone 14, a speaker 15, a display unit 16, and a voice recognition device 100.
- The device 1 need not include all of these components and may include other components.
- The communication unit 11 transmits information to the server 2 via the network 3 and receives information from the server 2 via the network 3.
- The control unit 12 is composed of, for example, a CPU (Central Processing Unit) and controls the entire device 1.
- The memory 13 is composed of, for example, a ROM (Read Only Memory) or a RAM (Random Access Memory) and stores information.
- The microphone 14 converts sound into an electric signal and outputs it as voice information.
- The microphone 14 is composed of a microphone array including at least three microphones and collects sound in the space where the device 1 is placed.
- The speaker 15 outputs sound.
- The display unit 16 is composed of, for example, a liquid crystal display device and displays various kinds of information.
- The voice recognition device 100 recognizes the user's voice and generates an operation instruction for operating the device 1.
- The control unit 12 operates the device 1 based on the operation instruction corresponding to the voice recognized by the voice recognition device 100.
- FIG. 3 is a block diagram illustrating a configuration of the speech recognition apparatus according to the first embodiment of the present disclosure.
- The speech recognition apparatus 100 includes a voice acquisition unit 101, a voice recognition processing unit 102, a recognition result determination unit 103, and an operation instruction generation unit 104.
- The voice acquisition unit 101 acquires voice information representing the voice uttered by the user.
- The voice acquisition unit 101 acquires the voice information from the microphone 14. Specifically, the microphone 14 converts the voice, an analog signal, into voice information, a digital signal, and the voice acquisition unit 101 acquires the converted voice information from the microphone 14.
- The voice acquisition unit 101 outputs the acquired voice information to the voice recognition processing unit 102.
- The voice recognition processing unit 102 recognizes the voice information acquired by the voice acquisition unit 101 as character information.
- The voice recognition processing unit 102 receives the voice information from the voice acquisition unit 101, performs speech recognition using a speech recognition dictionary, and outputs character information as the speech recognition result.
- The recognition result determination unit 103 determines, based on the character information recognized by the voice recognition processing unit 102, whether the voice is an utterance directed to the device 1.
- The recognition result determination unit 103 analyzes the speech recognition result received from the voice recognition processing unit 102 and generates a recognition result analysis tree corresponding to the speech recognition result.
- The recognition result determination unit 103 analyzes the generated recognition result analysis tree and estimates the sentence type of the character information recognized by the voice recognition processing unit 102.
- The recognition result determination unit 103 analyzes the sentence type of the character information and determines whether it is a question sentence or a command sentence; if it is, the unit judges that the voice is an utterance to the device 1. Conversely, when the sentence type is neither a question sentence nor a command sentence, that is, when it is a declarative or exclamatory sentence, the recognition result determination unit 103 determines that the voice is not an utterance to the device 1.
- The operation instruction generation unit 104 generates an operation instruction for the device 1 when the recognition result determination unit 103 determines that the voice is an utterance to the device 1. Based on the determination result received from the recognition result determination unit 103, the operation instruction generation unit 104 determines the device to which the operation instruction should be sent and the operation content, and generates an operation instruction containing the determined operation content for the determined device.
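- To make the division of labor among the units above concrete, the following is a minimal Python sketch of the Embodiment 1 pipeline. It is an illustration only: the function names and the toy classification rule are assumptions, not the patent's actual implementation.

```python
# Minimal sketch of the Embodiment 1 pipeline (hypothetical names; the real
# units 101-104 are functional blocks, not these toy functions).

def acquire_voice():
    """Voice acquisition unit 101: digital voice information from microphone 14."""
    return b"...pcm samples..."           # placeholder audio

def recognize(voice_info):
    """Voice recognition processing unit 102: ASR with a recognition dictionary."""
    return "how is the weather tomorrow"  # fixed output purely for illustration

def is_device_directed(text):
    """Recognition result determination unit 103: question/command -> directed."""
    return text.startswith(("how", "tell", "check"))  # crude stand-in rule

def generate_instruction(text):
    """Operation instruction generation unit 104: build an instruction, or not."""
    if not is_device_directed(text):
        return None                       # no trigger keyword is ever required
    return {"device": "weather_service",
            "action": "output: weather [one day later]"}

print(generate_instruction(recognize(acquire_voice())))
```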
- FIG. 4 is a flowchart for explaining the operation of the speech recognition system according to the first embodiment of the present disclosure.
- In step S1, the voice acquisition unit 101 acquires voice information from the microphone 14 provided in the device 1.
- In step S2, the voice recognition processing unit 102 recognizes the voice information acquired by the voice acquisition unit 101 as character information.
- In step S3, the recognition result determination unit 103 determines, based on the character information recognized by the voice recognition processing unit 102, whether the voice is an utterance to the device 1.
- Specifically, the recognition result determination unit 103 analyzes the syntax of the character information recognized by the voice recognition processing unit 102 using a known syntax analysis technique.
- As a syntax analysis technique, for example, the KNP analysis system (http://nlp.ist.i.kyoto-u.ac.jp/index.php?KNP) can be used.
- The recognition result determination unit 103 divides the text of the character information into a plurality of clauses, analyzes the part of speech of each clause, and analyzes the conjugation form of each predicate (declinable word).
- The recognition result determination unit 103 then analyzes the sentence type of the character information, determines whether it is a declarative sentence, a question sentence, an exclamatory sentence, or a command sentence, and judges that the voice is an utterance to the device 1 when the sentence type is either a question sentence or a command sentence. For example, the recognition result determination unit 103 can determine that the sentence type of the character information is a question sentence when the sentence contains a question word, and that it is an imperative sentence when the conjugation form of the word at the end of the sentence is the imperative form.
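- As a concrete illustration of these rules, here is a toy Python sketch that classifies a pre-parsed sentence. The clause format (surface, part of speech, conjugation form) and the parse itself are assumptions; the patent delegates actual parsing to a system such as KNP.

```python
# Toy sentence-type rules over pre-parsed clauses; the tuple format
# (surface, part_of_speech, conjugation_form) is an assumption.

QUESTION_WORDS = {"how", "what", "when", "where"}  # stand-ins for question words

def sentence_type(clauses):
    last_surface, last_pos, last_conj = clauses[-1]
    if any(surface in QUESTION_WORDS for surface, _, _ in clauses):
        return "question"        # a question word appears in the sentence
    if last_conj == "imperative":
        return "command"         # sentence-final predicate is in imperative form
    if last_pos == "noun":
        return "noun-ending"     # taigen-dome, handled in a later example
    return "declarative"         # e.g. a sentence ending in a copula

# "Tell me about tomorrow's weather" -> command
print(sentence_type([("tomorrow", "noun", None),
                     ("weather", "noun", None),
                     ("tell", "verb", "imperative")]))
```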
- If it is determined in step S3 that the voice is not an utterance to the device 1 (NO in step S3), the process returns to step S1.
- On the other hand, if it is determined in step S3 that the voice is an utterance to the device 1 (YES in step S3), the operation instruction generation unit 104 generates an operation instruction for the device 1 in step S4.
- The operation instruction generation unit 104 stores in advance an operation table in which combinations of a plurality of words are associated with device operations. The operation table will be described later.
- The operation instruction generation unit 104 refers to the operation table, specifies the device operation corresponding to the combination of words included in the character information analyzed by the recognition result determination unit 103, and generates an operation instruction for operating the device with the specified operation.
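- The lookup itself can be pictured as a simple table keyed by word combinations. The following Python sketch is an assumption about the table's shape; table 1401 in FIG. 6, described below, is the patent's actual example.

```python
# Sketch of an operation-table lookup in the spirit of table 1401; the
# entries and the key structure are illustrative assumptions.

OPERATION_TABLE = {
    # (word 1: date/time, word 2: target, word 3: system-directed cue): operation
    ("tomorrow", "weather", "how"):             "output: weather [one day later]",
    ("tomorrow", "weather", "tell me"):         "output: weather [one day later]",
    ("day after tomorrow", "weather", "check"): "output: weather [two days later]",
}

def lookup_operation(word1, word2, word3):
    # None means no matching device operation: no instruction is generated
    return OPERATION_TABLE.get((word1, word2, word3))

print(lookup_operation("tomorrow", "weather", "how"))
```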
- FIG. 5A is a diagram showing an example of character information whose sentence type is a declarative sentence, and FIG. 5B is a diagram showing an example of character information whose sentence type is a question sentence.
- FIG. 5C and FIG. 5D are diagrams showing examples of character information whose sentence type is an imperative sentence, and FIG. 5E is a diagram showing an example of character information whose sentence type is a noun-ending (taigen-dome) sentence.
- For example, when the voice acquisition unit 101 acquires voice information corresponding to the utterance "Tomorrow's weather is sunny", the voice recognition processing unit 102 converts the voice information into the character information "Tomorrow's weather is sunny".
- The voice recognition processing unit 102 outputs the recognized character information to the recognition result determination unit 103 as the speech recognition result.
- The recognition result determination unit 103 divides the character information recognized by the voice recognition processing unit 102 into the clauses "tomorrow", "the weather", and "is sunny", analyzes whether each clause is a substantive or a predicate, and, for predicates, analyzes the part of speech of the words in the clause. In FIG. 5A, the clause at the end of the sentence is a copula, so the recognition result determination unit 103 determines that the sentence type of the character information is a declarative sentence. Having determined that the sentence type is declarative, the recognition result determination unit 103 determines that the voice is not an utterance to the device 1.
- When the voice acquisition unit 101 acquires voice information corresponding to the utterance "How is the weather tomorrow?", the voice recognition processing unit 102 converts the voice information into the character information "How is the weather tomorrow?".
- The voice recognition processing unit 102 outputs the recognized character information to the recognition result determination unit 103 as the speech recognition result.
- The recognition result determination unit 103 divides the character information recognized by the voice recognition processing unit 102 into the clauses "tomorrow", "the weather", and "how", analyzes whether each clause is a substantive or a predicate, and, for predicates, analyzes the part of speech of the words in the clause. In FIG. 5B, a question word is contained in the clause at the end of the sentence, so the recognition result determination unit 103 determines that the sentence type of the character information is a question sentence. Having determined that the sentence type is a question sentence, the recognition result determination unit 103 determines that the voice is an utterance to the device 1.
- FIG. 6 is a diagram illustrating an example of an operation table according to the first embodiment.
- The operation table 1401 associates a device operation with word 1, a word string for determining the date and time; word 2, a word string for determining the operation target or search target; and word 3, a word string for determining whether the utterance is directed to the system.
- For example, the operation instruction generation unit 104 uses the operation table 1401 to determine the operation "output: weather [one day later]" from word 1 "tomorrow" representing the date and time, word 2 "weather" representing the search target, and word 3 "how" representing an utterance to the system.
- The operation instruction generation unit 104 then outputs, to the control unit 12 of the device 1, an operation instruction to acquire the next day's weather forecast from the server that provides weather forecasts.
- The control unit 12 accesses the server that provides weather forecasts, acquires the next day's weather forecast from the server's weather database, and outputs the acquired weather forecast via the display unit 16 or the speaker 15.
- FIG. 7 is a diagram showing an example of the weather database in the first embodiment. In the weather database 1402, as shown in FIG. 7, dates and weather are associated with each other.
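- A minimal Python sketch of that association, with invented dates and values, and of resolving an operation such as "output: weather [one day later]":

```python
# Toy stand-in for weather database 1402: dates mapped to weather. The
# contents and the fixed "today" are invented for illustration.
from datetime import date, timedelta

WEATHER_DB = {
    date(2014, 7, 10): "sunny",
    date(2014, 7, 11): "cloudy",
}

def forecast(days_ahead, today=date(2014, 7, 10)):
    """Resolve e.g. "output: weather [one day later]" -> days_ahead=1."""
    return WEATHER_DB.get(today + timedelta(days=days_ahead))

print(forecast(1))  # cloudy
```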
- Note that the control unit 12 can acquire the weather forecast for the current location by transmitting position information specifying the current position of the device 1 to the server that provides weather forecasts.
- Next, when the voice acquisition unit 101 acquires voice information corresponding to the utterance "Tell me about tomorrow's weather", the voice recognition processing unit 102 converts the voice information into character information.
- The voice recognition processing unit 102 outputs the recognized character information to the recognition result determination unit 103 as the speech recognition result.
- The recognition result determination unit 103 divides the character information recognized by the voice recognition processing unit 102 into the clauses "tomorrow", "weather", and "tell me", analyzes whether each clause is a substantive or a predicate, and, for predicates, analyzes the part of speech of the words in the clause. In FIG. 5C, the conjugation form of the word at the end of the sentence is the imperative form, so the recognition result determination unit 103 determines that the sentence type of the character information is an imperative sentence. Having determined that the sentence type is an imperative sentence, the recognition result determination unit 103 determines that the voice is an utterance to the device 1.
- When it is determined that the voice is an utterance to the device 1, the operation instruction generation unit 104 generates an operation instruction for the device 1.
- For example, the operation instruction generation unit 104 uses the operation table 1401 to determine the operation "output: weather [one day later]" from word 1 "tomorrow" representing the date and time, word 2 "weather" representing the search target, and word 3 "tell me" representing an utterance to the system.
- The operation instruction generation unit 104 then outputs, to the control unit 12 of the device 1, an operation instruction to acquire the next day's weather forecast from the weather database 1402 of the server that provides weather forecasts.
- The subsequent operation of the control unit 12 is the same as described above.
- When the voice acquisition unit 101 acquires voice information corresponding to the utterance "Check the weather for the day after tomorrow", the voice recognition processing unit 102 converts the voice information into character information.
- The voice recognition processing unit 102 outputs the recognized character information to the recognition result determination unit 103 as the speech recognition result.
- The recognition result determination unit 103 divides the character information recognized by the voice recognition processing unit 102 into the clauses "the day after tomorrow", "weather", and "check", analyzes whether each clause is a substantive or a predicate, and, for predicates, analyzes the part of speech of the words in the clause. In FIG. 5D, the conjugation form of the word at the end of the sentence is the imperative form, so the recognition result determination unit 103 determines that the sentence type of the character information is an imperative sentence. Having determined that the sentence type is an imperative sentence, the recognition result determination unit 103 determines that the voice is an utterance to the device 1.
- When it is determined that the voice is an utterance to the device 1, the operation instruction generation unit 104 generates an operation instruction for the device 1.
- For example, the operation instruction generation unit 104 uses the operation table 1401 to determine the operation "output: weather [two days later]" from word 1 "the day after tomorrow" representing the date and time, word 2 "weather" representing the search target, and word 3 "check" representing an utterance to the system.
- The operation instruction generation unit 104 then outputs, to the control unit 12 of the device 1, an operation instruction to acquire the weather forecast for the day after tomorrow from the weather database 1402 of the server that provides weather forecasts.
- The subsequent operation of the control unit 12 is the same as described above.
- When the voice acquisition unit 101 acquires voice information corresponding to the utterance "Tomorrow's weather", the voice recognition processing unit 102 converts the voice information into the character information "Tomorrow's weather".
- The voice recognition processing unit 102 outputs the recognized character information to the recognition result determination unit 103 as the speech recognition result.
- The recognition result determination unit 103 divides the character information recognized by the voice recognition processing unit 102 into the clauses "tomorrow" and "weather", analyzes whether each clause is a substantive or a predicate, and, for predicates, analyzes the part of speech of the words in the clause. In FIG. 5E, the word at the end of the sentence is a substantive, so the recognition result determination unit 103 determines that the sentence type of the character information is a noun-ending (taigen-dome) sentence. Having determined that the sentence type is a noun-ending sentence, the recognition result determination unit 103 determines that the voice is an utterance to the device 1.
- When it is determined that the voice is an utterance to the device 1, the operation instruction generation unit 104 generates an operation instruction for the device 1. For example, in the operation table, the combination of the words "tomorrow" and "weather" is associated with the operation of acquiring a weather forecast. The operation instruction generation unit 104 therefore refers to the operation table, specifies the operation of acquiring a weather forecast corresponding to the combination of the words "tomorrow" and "weather" included in the character information analyzed by the recognition result determination unit 103, and generates an operation instruction for operating the device with the specified operation.
- The operation instruction generation unit 104 then outputs, to the control unit 12 of the device 1, an operation instruction to acquire the next day's weather forecast from the server that provides weather forecasts.
- The subsequent operation of the control unit 12 is the same as described above.
- As described above, according to the first embodiment, when the recognition result determination unit 103 determines that the voice is an utterance to the device 1, the operation instruction generation unit 104 generates an operation instruction for the device 1; when the recognition result determination unit 103 determines that the voice is not an utterance to the device 1, no operation instruction is generated.
- This makes it unnecessary to utter a specific keyword as a trigger for starting speech recognition. The user can therefore speak without being aware of any trigger keyword and can operate the device through everyday conversation.
- In the present embodiment, the device 1 includes the voice recognition device 100, but the present disclosure is not particularly limited to this, and the server 2 may include the voice recognition device 100 instead.
- In that case, the voice information acquired by the microphone 14 of the device 1 is transmitted to the server 2 via the network 3, and the voice recognition device 100 of the server 2 executes the processes of steps S1 to S4 in FIG. 4.
- When the server 2 determines that the voice is an utterance to the device, it transmits an operation instruction based on the speech recognition result to the device 1, and the device 1 operates in response to the operation instruction from the server 2. The same applies to the other embodiments.
- Each functional block of the speech recognition apparatus 100 in the present embodiment may be realized by a microprocessor operating according to a computer program.
- Each functional block of the speech recognition apparatus 100 may typically be realized as an LSI (Large Scale Integration) integrated circuit. Each functional block may be made into an individual chip, or a single chip may include one or more, or some, of the functional blocks.
- Furthermore, each functional block of the speech recognition apparatus 100 may be realized by software or by a combination of LSI and software.
- In the second embodiment, the speech recognition apparatus measures the duration of silence and determines whether the voice is an utterance to the device 1 according to the measured length of silence.
- Specifically, the speech recognition apparatus measures the silence time from the end of one acquisition of voice information until the start of the next acquisition, and determines that the voice is an utterance to the device 1 if the measured silence time is equal to or longer than a predetermined time.
- FIG. 8 is a block diagram illustrating a configuration of the speech recognition apparatus according to the second embodiment of the present disclosure.
- Note that the configuration of the voice recognition system in the second embodiment is the same as that in the first embodiment, so its description is omitted.
- Likewise, the configuration of the device in the second embodiment is the same as that in the first embodiment, so its description is omitted.
- The speech recognition apparatus 100 according to the second embodiment includes a voice acquisition unit 101, a voice recognition processing unit 102, an operation instruction generation unit 104, a silence time measurement unit 105, a silence time determination unit 106, and a recognition result determination unit 107.
- The same components as those in the first embodiment are denoted by the same reference numerals, and their description is omitted.
- The silence time measurement unit 105 measures, as the silence time, the time elapsed after the voice acquisition unit 101 finishes acquiring voice information.
- When voice information is next acquired by the voice acquisition unit 101, the silence time determination unit 106 determines whether the silence time measured by the silence time measurement unit 105 is equal to or longer than a predetermined time.
- The recognition result determination unit 107 determines that the voice is an utterance to the device 1 when the measured silence time is determined to be equal to or longer than the predetermined time.
- Alternatively, the silence time determination unit 106 may determine whether the silence time measured by the silence time measurement unit 105 is equal to or longer than the time during which the user was speaking, and the recognition result determination unit 107 may determine that the voice is an utterance to the device 1 when the measured silence time is equal to or longer than that time.
- The predetermined time is, for example, 30 seconds, a length from which it can be judged that the user is not talking to another person.
- In this way, the time after one acquisition of voice information ends is measured as the silence time, and when voice information is next acquired, the voice is determined to be an utterance to the device if the measured silence time is equal to or longer than the predetermined time. The user can therefore speak without being aware of a specific keyword as a trigger for starting voice recognition and can operate the device through everyday conversation.
- Note that the recognition result determination unit 107 may instead judge that the voice is an utterance to the device 1 when voice information is acquired after silence has continued for the predetermined time or longer and the acquisition of that voice information has been completed.
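- The silence-based trigger can be sketched as follows in Python; the clock source and the stateful gate class are assumptions, and the 30-second threshold follows the example above.

```python
# Sketch of the Embodiment 2 rule: speech arriving after a long enough
# silence is treated as directed at the device.
import time

SILENCE_THRESHOLD_S = 30.0   # example value from the text

class SilenceGate:
    def __init__(self):
        self._last_speech_end = None   # no utterance observed yet

    def on_speech_end(self):
        # silence time measurement unit 105: start timing when speech ends
        self._last_speech_end = time.monotonic()

    def is_device_directed(self):
        # silence time determination unit 106 / recognition result
        # determination unit 107: compare the elapsed silence with the
        # threshold at the moment the next utterance begins
        if self._last_speech_end is None:
            return True                # assumption: first utterance counts
        silence = time.monotonic() - self._last_speech_end
        return silence >= SILENCE_THRESHOLD_S
```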
- In the third embodiment, the speech recognition apparatus determines whether the character information contains a predetermined keyword related to the operation of the device 1, and determines that the voice is an utterance to the device 1 if the keyword is contained.
- FIG. 9 is a block diagram illustrating a configuration of the speech recognition apparatus according to the third embodiment of the present disclosure. Note that the configuration of the speech recognition system in the third embodiment is the same as the configuration of the speech recognition system in the first embodiment, and a description thereof will be omitted. In addition, since the configuration of the device in the third embodiment is the same as the configuration of the device in the first embodiment, description thereof is omitted.
- The voice recognition apparatus 100 according to the third embodiment includes a voice acquisition unit 101, a voice recognition processing unit 102, an operation instruction generation unit 104, a keyword storage unit 108, and a recognition result determination unit 109.
- The same components as those in the first embodiment are denoted by the same reference numerals, and their description is omitted.
- The keyword storage unit 108 stores in advance predetermined keywords related to the operation of the device.
- The recognition result determination unit 109 determines whether the character information contains a stored keyword, and determines that the voice is an utterance to the device 1 if it does.
- For example, the keyword storage unit 108 stores the keywords "channel" and "change" in advance.
- The recognition result determination unit 109 refers to the keyword storage unit 108 and determines that the voice is an utterance to the device 1 when the words included in the character information include the keywords "channel" and "change".
- In that case, the operation instruction generation unit 104 refers to the operation table, specifies the operation of changing the television channel corresponding to the combination of the words "channel" and "change" included in the character information analyzed by the recognition result determination unit 109, and generates an operation instruction for operating the device with the specified operation.
- As described above, according to the third embodiment, it is determined whether the character information contains a predetermined keyword related to the operation of the device, and the voice is determined to be an utterance to the device 1 if it does. The user can therefore speak without being aware of a specific trigger keyword and can operate the device through everyday conversation.
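- A minimal Python sketch of this keyword check, using the "channel"/"change" example; the word segmentation is assumed to have been done upstream.

```python
# Sketch of the Embodiment 3 keyword rule: the utterance is treated as
# directed at the device when the stored keywords all appear in it.

STORED_KEYWORDS = {"channel", "change"}   # keyword storage unit 108

def is_device_directed(words):
    # recognition result determination unit 109 (sketch)
    return STORED_KEYWORDS.issubset(words)

print(is_device_directed(["change", "the", "channel"]))   # True
print(is_device_directed(["change", "the", "station"]))   # False
```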
- In the fourth embodiment, the speech recognition apparatus determines whether the character information contains a personal name stored in advance, and determines that the voice is not an utterance to the device 1 if such a name is contained.
- Specifically, the speech recognition apparatus stores personal names, such as the names of family members, in advance, and judges that the voice is not directed to the device 1 when a stored name appears in the character information.
- FIG. 10 is a block diagram illustrating a configuration of the speech recognition apparatus according to the fourth embodiment of the present disclosure.
- Note that the configuration of the voice recognition system in the fourth embodiment is the same as that in the first embodiment, so its description is omitted.
- Likewise, the configuration of the device in the fourth embodiment is the same as that in the first embodiment, so its description is omitted.
- The speech recognition apparatus 100 according to the fourth embodiment includes a voice acquisition unit 101, a voice recognition processing unit 102, an operation instruction generation unit 104, a person name storage unit 110, and a recognition result determination unit 111. The same components as those of the first embodiment are denoted by the same reference numerals, and their description is omitted.
- The person name storage unit 110 stores personal names in advance.
- For example, the person name storage unit 110 stores in advance the names of the family living in the house where the device 1 is installed, or the names of the family of the user who owns the device 1.
- The person name storage unit 110 may also store in advance terms for family members such as "father", "mother", and "older brother".
- The personal names are input by the user using an input receiving unit (not shown) of the device 1 and stored in the person name storage unit 110.
- The recognition result determination unit 111 determines whether the character information contains a family name or given name stored in advance in the person name storage unit 110, and determines that the voice is not an utterance to the device 1 if it does. When the family's names are stored in the server 2 as user information, the recognition result determination unit 111 may make this determination using the user information stored in the server 2.
- More specifically, as in the first embodiment, the recognition result determination unit 111 analyzes the sentence type of the character information, determines whether it is a question sentence or a command sentence, and judges that the voice is an utterance to the device 1 when it is. However, even when the sentence type is a question or command sentence, the recognition result determination unit 111 determines that the voice is not an utterance to the device 1 if the character information contains a name stored in the person name storage unit 110.
- That is, the recognition result determination unit 111 determines that the voice is an utterance to the device 1 when the sentence type is a question or command sentence and no stored name appears in the character information.
- Thus, the user can speak without being aware of a specific keyword that triggers voice recognition and can operate the device through everyday conversation.
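- In Python, the name-based veto can be sketched as follows; the names and the upstream sentence-type classification are assumptions.

```python
# Sketch of the Embodiment 4 rule: a question or command that mentions a
# stored name is judged to be addressed to that person, not to the device.

STORED_NAMES = {"taro", "hanako", "father", "mother"}  # person name storage unit 110

def is_device_directed(sentence_type, words):
    if sentence_type not in ("question", "command"):
        return False
    # even a question/command is vetoed if it names a family member
    return not STORED_NAMES.intersection(words)

print(is_device_directed("command", ["taro", "open", "the", "door"]))  # False
print(is_device_directed("command", ["open", "the", "door"]))          # True
```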
- In the fifth embodiment, the voice recognition device detects people in the space where the device 1 is located; when multiple people are detected, it determines that the voice is not an utterance to the device 1, and when a single person is detected, it determines that the voice is an utterance to the device 1.
- That is, the voice recognition device determines that the voice is not an utterance to the device 1 when multiple people are detected in the space where the device 1 is located, and that the voice is an utterance to the device 1 when a single person is detected there.
- FIG. 11 is a block diagram illustrating a configuration of the speech recognition apparatus according to the fifth embodiment of the present disclosure.
- Note that the configuration of the speech recognition system in the fifth embodiment is the same as that in the first embodiment, so its description is omitted.
- Likewise, the configuration of the device in the fifth embodiment is the same as that in the first embodiment, so its description is omitted.
- The voice recognition apparatus 100 according to the fifth embodiment includes a voice acquisition unit 101, a voice recognition processing unit 102, an operation instruction generation unit 104, a person detection unit 112, and a recognition result determination unit 113. The same components as those in the first embodiment are denoted by the same reference numerals, and their description is omitted.
- The person detection unit 112 detects people in the space where the device is located.
- For example, the person detection unit 112 may detect people by analyzing images acquired from a camera included in the device 1.
- Alternatively, the person detection unit 112 may detect people using a human presence sensor or a heat sensor.
- The recognition result determination unit 113 determines that the voice is not an utterance to the device 1 when the person detection unit 112 detects multiple people, and that the voice is an utterance to the device 1 when the person detection unit 112 detects a single person.
- More specifically, as in the first embodiment, the recognition result determination unit 113 analyzes the sentence type of the character information, determines whether it is a question sentence or a command sentence, and judges that the voice is an utterance to the device 1 when it is. However, even when the sentence type is a question or command sentence, the recognition result determination unit 113 determines that the voice is not an utterance to the device 1 if the person detection unit 112 detects multiple people.
- That is, the recognition result determination unit 113 determines that the voice is an utterance to the device 1 when the sentence type is a question or command sentence and the person detection unit 112 does not detect multiple people (that is, when a single person is detected).
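- The person-count gate can be sketched as below; how the count is obtained (camera, human presence sensor, or heat sensor) is abstracted away.

```python
# Sketch of the Embodiment 5 rule: with several people in the room, speech
# is assumed to be human conversation; a lone speaker is talking to the device.

def is_device_directed(sentence_type, people_detected):
    if sentence_type not in ("question", "command"):
        return False
    return people_detected == 1   # multiple people -> not directed at device

print(is_device_directed("question", 1))   # True
print(is_device_directed("question", 3))   # False
```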
- In the sixth embodiment, the speech recognition apparatus determines whether the conjugation form of a predicate contained in the character information is the imperative form.
- If it is, the voice is judged to be an utterance to the device 1.
- That is, the speech recognition apparatus determines the conjugation form of the predicate contained in the character information and determines that the voice is an utterance to the device 1 when the conjugation form is the imperative form.
- FIG. 12 is a block diagram illustrating a configuration of the speech recognition apparatus according to the sixth embodiment of the present disclosure.
- Note that the configurations of the voice recognition system and the device in the sixth embodiment are the same as those in the first embodiment, so their descriptions are omitted.
- The voice recognition apparatus 100 according to the sixth embodiment includes a voice acquisition unit 101, a voice recognition processing unit 102, an operation instruction generation unit 104, and a recognition result determination unit 114. The same components as those of the first embodiment are denoted by the same reference numerals, and their description is omitted.
- The recognition result determination unit 114 analyzes whether the conjugation form of each predicate contained in the character information is the irrealis, continuative, terminal, attributive, hypothetical, or imperative form. It divides the text of the character information into a plurality of clauses, analyzes the part of speech of each clause, and analyzes the conjugation form of each predicate. The recognition result determination unit 114 then determines that the voice is an utterance to the device 1 when the character information contains a clause whose conjugation form is the imperative form.
- Conversely, the recognition result determination unit 114 determines that the voice is not an utterance to the device 1 when the conjugation form is not the imperative form, that is, when it is the irrealis, continuative, terminal, attributive, or hypothetical form.
- As described above, according to the sixth embodiment, it is determined whether the conjugation form of a predicate contained in the character information is the imperative form, and the voice is determined to be an utterance to the device 1 when it is. The user can therefore speak without being aware of a specific trigger keyword and can operate the device through everyday conversation.
- Note that the recognition result determination unit 114 may instead determine whether the conjugation form of the predicate contained in the character information is the terminal form or the imperative form, and judge the voice to be an utterance to the device 1 in either case.
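- A sketch of the conjugation-form check on pre-parsed clauses, reusing the clause format assumed earlier:

```python
# Sketch of the Embodiment 6 rule: device-directed if some predicate in the
# sentence is in the imperative form (add "terminal" for the variant above).

DIRECTED_FORMS = {"imperative"}

def is_device_directed(clauses):
    # clauses are (surface, part_of_speech, conjugation_form) tuples
    return any(conj in DIRECTED_FORMS for _, _, conj in clauses)

print(is_device_directed([("weather", "noun", None),
                          ("check", "verb", "imperative")]))   # True
```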
- In the seventh embodiment, the speech recognition apparatus combines features of the speech recognition apparatuses of the first to sixth embodiments.
- Specifically, the speech recognition apparatus sums weight values assigned according to predetermined determination results for the character information, determines whether the total weight value is equal to or greater than a predetermined value, and determines that the voice is an utterance to the device 1 when it is.
- FIG. 13 is a block diagram illustrating a configuration of the speech recognition apparatus according to the seventh embodiment of the present disclosure.
- Note that the configuration of the speech recognition system in the seventh embodiment is the same as that in the first embodiment, so its description is omitted.
- Likewise, the configuration of the device in the seventh embodiment is the same as that in the first embodiment, so its description is omitted.
- The voice recognition apparatus 100 according to the seventh embodiment includes a voice acquisition unit 101, a voice recognition processing unit 102, an operation instruction generation unit 104, a silence time measurement unit 105, a silence time determination unit 106, a keyword storage unit 108, a person name storage unit 110, a person detection unit 112, a weight value table storage unit 115, a weight value calculation unit 116, a recognition result determination unit 117, a pitch extraction unit 118, a pitch storage unit 119, and a speaker recognition unit 120.
- The same components as those in the first to sixth embodiments are denoted by the same reference numerals, and their description is omitted.
- The weight value table storage unit 115 stores a weight value table in which predetermined determination results for the character information are associated with weight values.
- In the weight value table, the determination result that the sentence type of the character information is a question sentence or a command sentence is associated with a first weight value.
- The weight value table also associates a second weight value with the determination result that the silence time from the end of one acquisition of voice information to the start of the next is equal to or longer than the predetermined time.
- The weight value table also associates a third weight value with the determination result that a stored keyword is contained in the character information.
- The weight value table also associates a fourth weight value with the determination result that a stored family name or given name is contained in the character information.
- The weight value table also associates a fifth weight value with the determination result that multiple people are detected.
- The weight value table also associates a sixth weight value with the determination result that a single person is detected.
- The weight value table also associates a seventh weight value with the determination result that the conjugation form of a predicate contained in the character information is the imperative form, and an eighth weight value with the pitch frequency of the voice information.
- The weight value calculation unit 116 sums the weight values assigned according to the predetermined determination results for the character information.
- Specifically, the weight value calculation unit 116 sums the weight value assigned according to whether the sentence type of the character information is a question or command sentence; the weight value assigned according to whether the silence time from the end of one acquisition of voice information until the next acquisition is equal to or longer than the predetermined time; the weight value assigned according to whether the character information contains a stored keyword related to the operation of the device; the weight value assigned according to whether the character information contains a stored personal name; the weight values assigned according to whether multiple people, or a single person, are detected in the space where the device is located; the weight value assigned according to whether the conjugation form of a predicate contained in the character information is the imperative form; and the weight value assigned according to whether the pitch frequency of the voice information is equal to or greater than a predetermined threshold.
- More specifically, the weight value calculation unit 116 analyzes the sentence type of the character information recognized by the voice recognition processing unit 102, determines whether the sentence type is a question or command sentence, and reads the corresponding first weight value from the weight value table storage unit 115 when it is.
- When the silence time determination unit 106 determines that the silence time from the end of the previous acquisition of voice information by the voice acquisition unit 101 is equal to or longer than the predetermined time, the weight value calculation unit 116 reads the corresponding second weight value from the weight value table storage unit 115.
- The weight value calculation unit 116 determines whether a keyword stored in advance in the keyword storage unit 108 is contained in the character information recognized by the voice recognition processing unit 102, and reads the corresponding third weight value from the weight value table storage unit 115 when it is.
- The weight value calculation unit 116 determines whether a family name or given name stored in advance in the person name storage unit 110 is contained in the character information recognized by the voice recognition processing unit 102, and reads the corresponding fourth weight value from the weight value table storage unit 115 when it is.
- When the person detection unit 112 detects multiple people, the weight value calculation unit 116 reads the corresponding fifth weight value from the weight value table storage unit 115.
- When the person detection unit 112 detects a single person, the weight value calculation unit 116 reads the corresponding sixth weight value from the weight value table storage unit 115.
- The weight value calculation unit 116 determines whether the conjugation form of a predicate contained in the character information recognized by the voice recognition processing unit 102 is the imperative form, and reads the corresponding seventh weight value from the weight value table storage unit 115 when it is.
- The weight value calculation unit 116 reads the corresponding eighth weight value from the weight value table storage unit 115 when the pitch frequency of the voice information is equal to or greater than a predetermined threshold. Specifically, for each utterance, the weight value calculation unit 116 stores, in the pitch storage unit 119, the pitch frequency extracted from the input voice information by the pitch extraction unit 118 together with the speaker information recognized by the speaker recognition unit 120 as one set. When a new utterance is input, the pitch frequency extracted by the pitch extraction unit 118 and the speaker information recognized by the speaker recognition unit 120 are again stored in the pitch storage unit 119 as one set.
- The pitch frequency of the same speaker's previous utterance is then compared with the pitch frequency of the current utterance, and when the pitch frequency of the current utterance is higher than a preset threshold, the corresponding eighth weight value is read from the weight value table storage unit 115.
- Note that the eighth weight value may instead be read using a fixed, speaker-independent threshold, without speaker recognition by the speaker recognition unit 120.
- The weight value calculation unit 116 then sums the weight values that have been read.
- The weight value calculation unit 116 adds no weight value when it is determined that the sentence type of the character information is neither a question nor a command sentence, that the silence time from the end of one acquisition of voice information to the start of the next is not equal to or longer than the predetermined time, that no stored keyword is contained in the character information, that no stored personal name is contained in the character information, or that the conjugation form of the predicate contained in the character information is not the imperative form.
- The recognition result determination unit 117 determines whether the weight value totaled by the weight value calculation unit 116 is equal to or greater than a predetermined value, and judges the voice to be an utterance to the device 1 when the total weight value is equal to or greater than the predetermined value.
- The first, second, third, sixth, and seventh weight values are preferably higher than the fourth and fifth weight values.
- For example, the first, second, third, sixth, and seventh weight values are each "5", the fourth weight value is "-5", and the fifth weight value is "0".
- In that case, the recognition result determination unit 117 determines that the voice is an utterance to the device 1 when the total weight value is, for example, "10" or more.
- The first to seventh weight values are not limited to the above values and may be other values, and the predetermined value compared with the total weight value is likewise not limited to the above value and may be another value.
- the weight value calculation unit 116 may calculate the weight value used by the recognition result determination unit 117 to judge whether or not the voice is an utterance to the device 1 from only a part of the first to seventh weight values, rather than all of them.
- a determination result that the sentence type of the character information is not a question sentence or a command sentence may be associated with a predetermined weight value.
- the weight value table may associate a determination result that the silent time from the end of one acquisition of voice information to the start of the next is not equal to or longer than the predetermined time with a predetermined weight value.
- the weight value table may associate a determination result that a keyword stored in advance is not included in the character information with a predetermined weight value.
- the weight value table may associate a determination result that a personal name or a name stored in advance is not included in the character information with a predetermined weight value.
- the weight value table may associate a determination result that the conjugated form of the predicate included in the character information is not the imperative form with a predetermined weight value.
- the weight value table may associate a determination result that the voice information was acquired after a silent state had continued for a predetermined time or more with a predetermined weight value.
- the weight values given in accordance with the predetermined determination result with respect to the character information are totaled, and when the total weight value is equal to or greater than the predetermined value, it is determined that the voice is an utterance to the device. Therefore, the user can speak without being aware of a specific keyword that is a trigger for starting voice recognition, and can operate the device from daily conversation.
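To make the scheme above concrete, here is a minimal Python sketch of a weight value table and the threshold decision. The example weights (“5”, “-5”, “0”) and the threshold “10” follow the description above; the mapping of the fifth and sixth weight values to the person-detection results, and all identifiers, are illustrative assumptions.

```python
# Sketch of the weight value calculation unit 116 and recognition result
# determination unit 117. Weights and threshold use the example values from
# the description; the key-to-weight mapping is an assumption.
WEIGHT_TABLE = {
    "question_or_command": 5,    # first weight value
    "long_silence_before": 5,    # second weight value (assumed)
    "keyword_included": 5,       # third weight value (assumed)
    "person_name_included": -5,  # fourth weight value
    "multiple_persons": 0,       # fifth weight value (assumed)
    "single_person": 5,          # sixth weight value (assumed)
    "imperative_form": 5,        # seventh weight value
}
THRESHOLD = 10  # predetermined value from the example above

def is_utterance_to_device(determinations: dict) -> bool:
    """Sum the weight values of the determinations that held and
    compare the total with the predetermined value."""
    total = sum(WEIGHT_TABLE[name] for name, held in determinations.items() if held)
    return total >= THRESHOLD

# A command containing a device keyword, spoken while one person is present:
print(is_utterance_to_device({
    "question_or_command": True,
    "keyword_included": True,
    "person_name_included": False,
    "single_person": True,
}))  # True: 5 + 5 + 5 = 15 >= 10
```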
- in addition, a condition may be set as to whether or not the time from the end of the previous utterance to the start of the current utterance is within a preset threshold value.
- the device 1 preferably includes an information terminal such as a smartphone, a tablet computer, and a mobile phone.
- the operation instruction includes an operation instruction for acquiring a weather forecast for a day designated by the user and outputting the acquired weather forecast.
- when the voice acquisition unit 101 acquires the voice information “Tell me about tomorrow's weather”, for example, the operation instruction generation unit 104 generates an operation instruction for acquiring the weather forecast for the next day.
- the operation instruction generation unit 104 outputs the generated operation instruction to the mobile terminal.
- the device 1 preferably includes a lighting device.
- the operation instruction includes an operation instruction for turning on the lighting device and an operation instruction for turning off the lighting device.
- when the voice acquisition unit 101 acquires the voice information “Turn on the light”, for example, the operation instruction generation unit 104 generates an operation instruction to turn on the lighting device.
- the operation instruction generation unit 104 outputs the generated operation instruction to the lighting device.
- the device 1 preferably includes a faucet device that automatically discharges water from a faucet.
- the operation instruction includes an operation instruction to discharge water from the faucet device and an operation instruction to stop water from the faucet device.
- when the voice acquisition unit 101 acquires the voice information “400 cc of water”, for example, the operation instruction generation unit 104 generates an operation instruction to discharge 400 cc of water from the faucet device.
- the operation instruction generation unit 104 outputs the generated operation instruction to the faucet device.
- the device 1 preferably includes a television.
- the operation instruction includes an operation instruction for changing a television channel.
- the operation instruction generation unit 104 generates an operation instruction to change the television channel to channel 4, for example.
- the operation instruction generation unit 104 outputs the generated operation instruction to the television.
- the device 1 preferably includes an air conditioner.
- the operation instruction includes an operation instruction for starting the operation of the air conditioner, an operation instruction for stopping the operation of the air conditioner, and an operation instruction for changing the set temperature of the air conditioner.
- when the voice acquisition unit 101 acquires the voice information “Increase the temperature of the air conditioner”, for example, the operation instruction generation unit 104 generates an operation instruction to raise the set temperature of the air conditioner.
- the operation instruction generation unit 104 outputs the generated operation instruction to the air conditioner.
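The device examples above can be summarized in a single dispatch sketch. The phrase patterns and instruction dictionaries below are illustrative assumptions; the operation instruction generation unit 104 is not specified at this level of detail in the description.

```python
import re

def generate_operation_instruction(text: str):
    """Map recognized character information to a device operation
    instruction (a stand-in for operation instruction generation unit 104).
    The phrase patterns mirror the examples above and are not exhaustive."""
    lowered = text.lower()
    if "tomorrow's weather" in lowered:
        return {"device": "mobile_terminal", "action": "output_weather_forecast",
                "day": "tomorrow"}
    if "turn on" in lowered and "light" in lowered:
        return {"device": "lighting_device", "action": "turn_on"}
    match = re.search(r"(\d+)\s*cc of water", lowered)
    if match:
        return {"device": "faucet_device", "action": "discharge_water",
                "volume_cc": int(match.group(1))}
    match = re.search(r"channel\s*(\d+)", lowered)
    if match:
        return {"device": "television", "action": "change_channel",
                "channel": int(match.group(1))}
    if "temperature of the air conditioner" in lowered:
        return {"device": "air_conditioner", "action": "raise_set_temperature"}
    return None  # the utterance maps to no device operation

print(generate_operation_instruction("400 cc of water"))
# {'device': 'faucet_device', 'action': 'discharge_water', 'volume_cc': 400}
```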
- a speech recognition method according to one aspect of the present disclosure includes a voice information acquisition step of acquiring voice information representing voice uttered by a user, a voice recognition step of recognizing the voice information acquired in the voice information acquisition step as character information, and an utterance determination step of determining, based on the recognized character information, whether or not the voice is an utterance to the device.
- voice information representing the voice uttered by the user is acquired.
- the acquired voice information is recognized as character information. Based on the recognized character information, it is determined whether or not the voice is an utterance to the device.
- since whether or not the voice is an utterance to the device is determined based on the recognized character information, the utterance of a specific keyword that triggers the start of voice recognition can be made unnecessary. Therefore, the user can speak without being aware of a specific keyword that triggers the start of voice recognition, and can operate the device through daily conversation.
- the method further includes an operation instruction generation step of generating an operation instruction for the device when it is determined that the utterance is for the device in the utterance determination step.
- according to this configuration, when it is determined that the voice is an utterance to the device, an operation instruction for the device is generated, and when it is determined that the voice is not an utterance to the device, no operation instruction is generated. The utterance of a specific keyword that triggers the start of voice recognition can therefore be eliminated.
- the utterance determination step preferably analyzes the sentence pattern of the character information, determines whether or not the sentence pattern is a question sentence or a command sentence, and determines that the voice is an utterance to the device when the sentence pattern is the question sentence or the command sentence.
- according to this configuration, the sentence pattern of the character information is analyzed, it is determined whether or not the sentence pattern is a question sentence or a command sentence, and if the sentence pattern is a question sentence or a command sentence, the voice is determined to be an utterance to the device.
- when the sentence pattern is a question sentence or a command sentence, the voice is likely to be an utterance to the device. Therefore, by determining whether or not the sentence pattern is a question sentence or a command sentence, it can easily be determined whether the voice is an utterance to the device.
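As a rough illustration of such a sentence-pattern check, the following Python sketch classifies an English surface string by leading-word cues; a real implementation would rely on proper syntactic analysis, and every word list below is an assumption.

```python
# Crude surface classifier for the sentence-pattern check; a real
# implementation would use syntactic analysis. All word lists are assumptions.
QUESTION_CUES = ("what", "when", "where", "who", "why", "how",
                 "do", "does", "is", "are", "can", "could", "will")
COMMAND_CUES = ("turn", "tell", "change", "set", "increase", "stop", "show")

def sentence_pattern(text: str) -> str:
    """Return 'question', 'command', or 'other' for the character information."""
    words = text.lower().rstrip(".!?").split()
    if not words:
        return "other"
    if text.rstrip().endswith("?") or words[0] in QUESTION_CUES:
        return "question"
    if words[0] in COMMAND_CUES:
        return "command"
    return "other"

def is_device_directed(text: str) -> bool:
    # a question or a command is treated as directed at the device
    return sentence_pattern(text) in ("question", "command")

print(is_device_directed("Tell me about tomorrow's weather"))  # True
print(is_device_directed("It was sunny yesterday"))            # False
```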
- preferably, the time after the acquisition of one piece of voice information ends is measured as the silent time.
- when the next voice information is acquired, it is determined whether or not the measured silent time is equal to or longer than the predetermined time.
- when the measured silent time is equal to or longer than the predetermined time, it is determined that the voice is an utterance to the device.
- when voice information is acquired after a silent state in which no voice information was acquired for a predetermined time, the voice is likely to be an utterance to the device. Therefore, it can easily be determined that the voice is an utterance to the device by determining whether or not the silent time from the end of one acquisition of voice information until the next voice information is acquired is equal to or longer than the predetermined time.
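A minimal sketch of such a silent-time check, assuming a monotonic clock and an arbitrary threshold; the class name and API are illustrative, not the patent's timing and time determination steps.

```python
import time

class SilenceTimer:
    """Illustrative sketch of the timing and time determination steps:
    measure the silent time between the end of one voice acquisition
    and the start of the next."""

    def __init__(self, threshold_s: float = 5.0):  # threshold is an assumption
        self.threshold_s = threshold_s
        self._last_end = None

    def on_acquisition_end(self) -> None:
        """Call when acquisition of one piece of voice information ends."""
        self._last_end = time.monotonic()

    def silence_long_enough(self) -> bool:
        """Call when the next voice information is acquired: True if the
        preceding silence lasted at least the predetermined time."""
        if self._last_end is None:
            return True  # no earlier utterance: treat as a long silence
        return time.monotonic() - self._last_end >= self.threshold_s

timer = SilenceTimer(threshold_s=0.0)
timer.on_acquisition_end()
print(timer.silence_long_enough())  # True with a zero-second threshold
```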
- the speech recognition method preferably further includes a keyword storage step of storing in advance a predetermined keyword related to the operation of the device, and the utterance determination step preferably determines whether or not the keyword stored in advance is included in the character information, and determines that the voice is an utterance to the device when the keyword is included in the character information.
- predetermined keywords relating to the operation of the device are stored in advance. It is determined whether or not a keyword stored in advance is included in the character information. If the keyword is included in the character information, it is determined that the voice is an utterance to the device.
- the speech recognition method preferably further includes a person name storage step of storing a person name in advance, and the utterance determination step preferably determines whether or not the person name stored in advance is included in the character information, and determines that the voice is not an utterance to the device when the person name is included in the character information.
- the personal name is stored in advance. It is determined whether or not a personal name stored in advance is included in the character information. If the personal name is included in the character information, it is determined that the voice is not an utterance to the device.
- when a person name is included in the character information, the voice is likely to be an utterance not to the device but to the person with that name. Therefore, it is possible to easily determine whether or not the voice is an utterance to the device by storing person names in advance and determining whether or not a stored person name is included in the character information.
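The keyword check and the person-name check can be sketched together; the stored keyword and name sets below are illustrative assumptions for the keyword storage and person name storage.

```python
import re

DEVICE_KEYWORDS = {"weather", "light", "water", "channel", "temperature"}  # assumed store
PERSON_NAMES = {"alice", "bob"}  # assumed contents of the person name storage unit

def decide_by_lexicon(text: str):
    """Return False if a stored person name appears (utterance to a person),
    True if a stored operation keyword appears (utterance to the device),
    or None if neither check decides."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & PERSON_NAMES:
        return False
    if words & DEVICE_KEYWORDS:
        return True
    return None

print(decide_by_lexicon("Alice, turn on the light"))  # False
print(decide_by_lexicon("Turn on the light"))         # True
```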
- the speech recognition method may further include a detection step of detecting a person in the space where the device is arranged, and the utterance determination step preferably determines that the voice is not an utterance to the device when a plurality of persons are detected in the detection step, and determines that the voice is an utterance to the device when a single person is detected in the detection step.
- a person in the space where the device is arranged is detected.
- when a plurality of persons are detected, it is determined that the voice is not an utterance to the device; when a single person is detected, it is determined that the voice is an utterance to the device.
- when there are a plurality of persons in the space where the device is arranged, the user's utterance is likely to be directed toward another person. When there is only one person in that space, the user's utterance is likely to be directed to the device. Therefore, by detecting the number of persons in the space where the device is arranged, it can easily be determined whether or not the voice is an utterance to the device.
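A sketch of the resulting decision rule, with the detection mechanism itself (camera, human sensor, or otherwise) left outside the function; the function name is illustrative.

```python
def utterance_to_device(persons_detected: int) -> bool:
    """One person in the space -> treat the voice as directed at the device;
    several persons -> treat it as directed at another person. The detection
    itself (camera, human sensor, ...) is outside this sketch."""
    return persons_detected == 1

print(utterance_to_device(1))  # True
print(utterance_to_device(3))  # False
```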
- the utterance determination step preferably determines whether or not the conjugated form of the predicate included in the character information is the imperative form, and determines that the voice is an utterance to the device when the conjugated form is the imperative form.
- according to this configuration, it is determined whether or not the conjugated form of the predicate included in the character information is the imperative form; if it is the imperative form, the voice is determined to be an utterance to the device.
- when the conjugated form of the predicate is the imperative form, the voice is likely to be an utterance to the device. Therefore, it can easily be determined that the voice is an utterance to the device by determining whether or not the conjugated form of the predicate included in the character information is the imperative form.
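Because the check concerns the imperative form (命令形) of a Japanese predicate, a surface sketch might look as follows; real systems would use morphological analysis (for example with a tokenizer such as MeCab), and the ending list below is an illustrative, non-exhaustive assumption.

```python
# Surface check for the Japanese imperative form (命令形) of a sentence-final
# predicate; real systems would use morphological analysis. The ending list
# is illustrative and deliberately not exhaustive.
IMPERATIVE_ENDINGS = ("しろ", "せよ", "つけろ", "消せ", "止めろ")

def ends_in_imperative(text: str) -> bool:
    return text.rstrip("。!?！？ ").endswith(IMPERATIVE_ENDINGS)

print(ends_in_imperative("電気をつけろ"))  # True ("turn the light on")
print(ends_in_imperative("電気をつけて"))  # False (te-form request)
```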
- the speech recognition method may further include a weight value calculation step of summing weight values given according to predetermined determination results for the character information, and the utterance determination step preferably determines whether or not the weight values summed in the weight value calculation step are equal to or greater than a predetermined value, and determines that the voice is an utterance to the device when the summed weight values are equal to or greater than the predetermined value.
- the weight values given according to the predetermined determination result for the character information are summed up. It is determined whether or not the total weight value is equal to or greater than a predetermined value. If the total weight value is equal to or greater than the predetermined value, it is determined that the voice is an utterance to the device.
- the weight values given according to the predetermined determination result for the character information are summed, and it is determined whether the voice is an utterance to the device according to the total weight value.
- the weight value calculation step preferably sums: a weight value given according to whether or not the sentence type of the character information is a question sentence or a command sentence; a weight value given according to whether or not the silent time from the end of one acquisition of voice information until the next voice information is acquired is equal to or longer than a predetermined time; a weight value given according to whether or not a predetermined keyword, stored in advance and related to the operation of the device, is included in the character information; a weight value given according to whether or not a person name stored in advance is included in the character information; a weight value given according to whether or not a plurality of persons are detected in the space where the device is arranged; and a weight value given according to whether or not the conjugated form of the predicate included in the character information is the imperative form.
- according to this configuration, the weight value given according to whether or not the sentence type of the character information is a question sentence or a command sentence, the weight value given according to whether or not the silent time from the end of one acquisition of voice information until the next voice information is acquired is equal to or longer than a predetermined time, the weight value given according to whether or not a predetermined keyword stored in advance and related to the operation of the device is included in the character information, the weight value given according to whether or not a person name stored in advance is included in the character information, the weight value given according to whether or not a plurality of persons are detected in the space where the device is arranged, and the weight value given according to whether or not the conjugated form of the predicate included in the character information is the imperative form are summed.
- these weight values are summed, and whether or not the voice is an utterance to the device is determined according to the total weight value, so it can be determined more accurately whether the voice is an utterance to the device.
- the device includes a mobile terminal, and the operation instruction includes an operation instruction to acquire a weather forecast for a day specified by the user and output the acquired weather forecast.
- the operation instruction generation step preferably outputs the generated operation instruction to the mobile terminal.
- the weather forecast for the day specified by the user can be acquired, and the acquired weather forecast can be output from the mobile terminal.
- the device includes a lighting device
- the operation instruction includes an operation instruction to turn on the lighting device and an operation instruction to turn off the lighting device
- the operation instruction generation step preferably outputs the generated operation instruction to the lighting device.
- the lighting device can be turned on or off by voice.
- preferably, the device includes a faucet device that automatically discharges water from a faucet, the operation instruction includes an operation instruction for discharging water from the faucet device and an operation instruction for stopping the water flowing from the faucet device, and the operation instruction generation step outputs the generated operation instruction to the faucet device.
- the device includes a television
- the operation instruction includes an operation instruction to change a channel of the television
- the operation instruction generation step preferably outputs the generated operation instruction to the television.
- the TV channel can be changed by voice.
- a voice recognition device according to another aspect of the present disclosure includes a voice information acquisition unit that acquires voice information representing voice uttered by a user, a voice recognition unit that recognizes the voice information acquired by the voice information acquisition unit as character information, and a determination unit that determines, based on the character information recognized by the voice recognition unit, whether or not the voice is an utterance to the device.
- voice information representing the voice uttered by the user is acquired.
- the acquired voice information is recognized as character information. Based on the recognized character information, it is determined whether or not the voice is an utterance to the device.
- since whether or not the voice is an utterance to the device is determined based on the recognized character information, the utterance of a specific keyword that triggers the start of voice recognition can be made unnecessary. Therefore, the user can speak without being aware of a specific keyword that triggers the start of voice recognition, and can operate the device through daily conversation.
- the speech recognition method and speech recognition apparatus according to the present disclosure eliminate the need to utter a specific keyword to start speech recognition, and are useful as a speech recognition method and speech recognition apparatus that recognize input speech and control a device based on the recognition result.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Machine Translation (AREA)
- Telephonic Communication Services (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
FIG. 14 is a block diagram showing the configuration of the conventional speech recognition apparatus described in Patent Document 1, and FIG. 15 is a block diagram showing the configuration of the conventional speech recognition apparatus described in Patent Document 2.
FIG. 1 is a block diagram showing the configuration of the speech recognition system according to Embodiment 1 of the present disclosure. The speech recognition system shown in FIG. 1 includes a device 1 and a server 2.
Next, the speech recognition apparatus according to Embodiment 2 will be described. The speech recognition apparatus according to Embodiment 2 measures the length of the silent period and determines, according to the measured length of silence, whether or not the voice is an utterance to the device 1.
Next, the speech recognition apparatus according to Embodiment 3 will be described. The speech recognition apparatus according to Embodiment 3 determines whether or not a predetermined keyword related to the operation of the device 1 is included in the character information, and determines that the voice is an utterance to the device 1 when the predetermined keyword is included in the character information.
Next, the speech recognition apparatus according to Embodiment 4 will be described. The speech recognition apparatus according to Embodiment 4 determines whether or not a person name stored in advance is included in the character information, and determines that the voice is not an utterance to the device 1 when a person name is included in the character information.
Next, the speech recognition apparatus according to Embodiment 5 will be described. The speech recognition apparatus according to Embodiment 5 detects persons in the space where the device 1 is arranged, determines that the voice is not an utterance to the device 1 when a plurality of persons are detected, and determines that the voice is an utterance to the device 1 when a single person is detected.
Next, the speech recognition apparatus according to Embodiment 6 will be described. The speech recognition apparatus according to Embodiment 6 determines whether or not the conjugated form of the predicate included in the character information is the imperative form, and determines that the voice is an utterance to the device 1 when the conjugated form is the imperative form.
Next, the speech recognition apparatus according to Embodiment 7 will be described. The speech recognition apparatus according to Embodiment 7 sums the weight values given according to predetermined determination results for the character information, determines whether or not the total weight value is equal to or greater than a predetermined value, and determines that the voice is an utterance to the device 1 when the total weight value is equal to or greater than the predetermined value.
11 communication unit
12 control unit
13 memory
14 microphone
15 speaker
16 display unit
100 speech recognition apparatus
101 voice acquisition unit
102 voice recognition processing unit
103 recognition result determination unit
104 operation instruction generation unit
Claims (15)
- A speech recognition method in a system that controls one or more devices by speech recognition, the method comprising:
a voice information acquisition step of acquiring voice information representing voice uttered by a user;
a voice recognition step of recognizing, as character information, the voice information acquired in the voice information acquisition step; and
an utterance determination step of determining, based on the character information recognized in the voice recognition step, whether or not the voice is an utterance to the device.
- The speech recognition method according to claim 1, further comprising an operation instruction generation step of generating an operation instruction for the device when it is determined in the utterance determination step that the voice is an utterance to the device.
- The speech recognition method according to claim 1 or 2, wherein the utterance determination step analyzes a sentence pattern of the character information, determines whether or not the sentence pattern is a question sentence or a command sentence, and determines that the voice is an utterance to the device when the sentence pattern is the question sentence or the command sentence.
- The speech recognition method according to any one of claims 1 to 3, further comprising:
a timing step of measuring, as a silent time, the time from the end of acquisition of the voice information; and
a time determination step of determining, when voice information is acquired, whether or not the silent time measured in the timing step is equal to or longer than a predetermined time,
wherein the utterance determination step determines that the voice is an utterance to the device when the measured silent time is determined to be equal to or longer than the predetermined time.
- The speech recognition method according to any one of claims 1 to 4, further comprising a keyword storage step of storing in advance a predetermined keyword related to the operation of the device, wherein the utterance determination step determines whether or not the stored keyword is included in the character information, and determines that the voice is an utterance to the device when the keyword is included in the character information.
- The speech recognition method according to any one of claims 1 to 5, further comprising a person name storage step of storing a person name in advance, wherein the utterance determination step determines whether or not the stored person name is included in the character information, and determines that the voice is not an utterance to the device when the person name is included in the character information.
- The speech recognition method according to any one of claims 1 to 6, further comprising a detection step of detecting a person in the space where the device is arranged, wherein the utterance determination step determines that the voice is not an utterance to the device when a plurality of persons are detected in the detection step, and determines that the voice is an utterance to the device when a single person is detected in the detection step.
- The speech recognition method according to any one of claims 1 to 7, wherein the utterance determination step determines whether or not the conjugated form of a predicate included in the character information is the imperative form, and determines that the voice is an utterance to the device when the conjugated form is the imperative form.
- The speech recognition method according to claim 1 or 2, further comprising a weight value calculation step of summing weight values given according to predetermined determination results for the character information, wherein the utterance determination step determines whether or not the weight values summed in the weight value calculation step are equal to or greater than a predetermined value, and determines that the voice is an utterance to the device when the summed weight values are equal to or greater than the predetermined value.
- The speech recognition method according to claim 9, wherein the weight value calculation step sums: a weight value given according to whether or not the sentence pattern of the character information is a question sentence or a command sentence; a weight value given according to whether or not the silent time from the end of one acquisition of voice information until the next voice information is acquired is equal to or longer than a predetermined time; a weight value given according to whether or not a predetermined keyword stored in advance and related to the operation of the device is included in the character information; a weight value given according to whether or not a person name stored in advance is included in the character information; a weight value given according to whether or not a plurality of persons are detected in the space where the device is arranged; and a weight value given according to whether or not the conjugated form of a predicate included in the character information is the imperative form.
- The speech recognition method according to claim 2, wherein the device includes a mobile terminal, the operation instruction includes an operation instruction to acquire a weather forecast for a day designated by the user and output the acquired weather forecast, and the operation instruction generation step outputs the generated operation instruction to the mobile terminal.
- The speech recognition method according to claim 2, wherein the device includes a lighting device, the operation instruction includes an operation instruction to turn on the lighting device and an operation instruction to turn off the lighting device, and the operation instruction generation step outputs the generated operation instruction to the lighting device.
- The speech recognition method according to claim 2, wherein the device includes a faucet device that automatically discharges water from a faucet, the operation instruction includes an operation instruction to discharge water from the faucet device and an operation instruction to stop the water flowing from the faucet device, and the operation instruction generation step outputs the generated operation instruction to the faucet device.
- The speech recognition method according to claim 2, wherein the device includes a television, the operation instruction includes an operation instruction to change the channel of the television, and the operation instruction generation step outputs the generated operation instruction to the television.
- A voice recognition device that controls one or more devices by voice recognition, the device comprising:
a voice information acquisition unit that acquires voice information representing voice uttered by a user;
a voice recognition unit that recognizes, as character information, the voice information acquired by the voice information acquisition unit; and
a determination unit that determines, based on the character information recognized by the voice recognition unit, whether or not the voice is an utterance to the device.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/428,093 US9865255B2 (en) | 2013-08-29 | 2014-07-08 | Speech recognition method and speech recognition apparatus |
JP2015511537A JP6502249B2 (ja) | 2013-08-29 | 2014-07-08 | 音声認識方法及び音声認識装置 |
US15/822,926 US10446151B2 (en) | 2013-08-29 | 2017-11-27 | Speech recognition method and speech recognition apparatus |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361871625P | 2013-08-29 | 2013-08-29 | |
US61/871,625 | 2013-08-29 | ||
US201461973411P | 2014-04-01 | 2014-04-01 | |
US61/973,411 | 2014-04-01 |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/428,093 A-371-Of-International US9865255B2 (en) | 2013-08-29 | 2014-07-08 | Speech recognition method and speech recognition apparatus |
US15/822,926 Continuation US10446151B2 (en) | 2013-08-29 | 2017-11-27 | Speech recognition method and speech recognition apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015029304A1 true WO2015029304A1 (ja) | 2015-03-05 |
Family
ID=52585904
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/003608 WO2015029304A1 (ja) | 2013-08-29 | 2014-07-08 | 音声認識方法及び音声認識装置 |
Country Status (4)
Country | Link |
---|---|
US (2) | US9865255B2 (ja) |
JP (1) | JP6502249B2 (ja) |
MY (1) | MY179900A (ja) |
WO (1) | WO2015029304A1 (ja) |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9110889B2 (en) * | 2013-04-23 | 2015-08-18 | Facebook, Inc. | Methods and systems for generation of flexible sentences in a social networking system |
US9606987B2 (en) | 2013-05-06 | 2017-03-28 | Facebook, Inc. | Methods and systems for generation of a translatable sentence syntax in a social networking system |
- WO2015029304A1 (ja) * | 2013-08-29 | 2015-03-05 | Panasonic Intellectual Property Corporation of America | Speech recognition method and speech recognition apparatus |
US11676608B2 (en) | 2021-04-02 | 2023-06-13 | Google Llc | Speaker verification using co-location information |
US9257120B1 (en) | 2014-07-18 | 2016-02-09 | Google Inc. | Speaker verification using co-location information |
US11942095B2 (en) | 2014-07-18 | 2024-03-26 | Google Llc | Speaker verification using co-location information |
- JP2016061970A (ja) * | 2014-09-18 | 2016-04-25 | Toshiba Corporation | Spoken dialogue device, method, and program |
US9812128B2 (en) | 2014-10-09 | 2017-11-07 | Google Inc. | Device leadership negotiation among voice interface devices |
US9318107B1 (en) * | 2014-10-09 | 2016-04-19 | Google Inc. | Hotword detection on multiple devices |
US10134425B1 (en) * | 2015-06-29 | 2018-11-20 | Amazon Technologies, Inc. | Direction-based speech endpointing |
US9542941B1 (en) * | 2015-10-01 | 2017-01-10 | Lenovo (Singapore) Pte. Ltd. | Situationally suspending wakeup word to enable voice command input |
US9779735B2 (en) | 2016-02-24 | 2017-10-03 | Google Inc. | Methods and systems for detecting and processing speech signals |
GB2552722A (en) * | 2016-08-03 | 2018-02-07 | Cirrus Logic Int Semiconductor Ltd | Speaker recognition |
US9972320B2 (en) | 2016-08-24 | 2018-05-15 | Google Llc | Hotword detection on multiple devices |
EP4328905A3 (en) | 2016-11-07 | 2024-04-24 | Google Llc | Recorded media hotword trigger suppression |
US10559309B2 (en) | 2016-12-22 | 2020-02-11 | Google Llc | Collaborative voice controlled devices |
US10937421B2 (en) * | 2016-12-23 | 2021-03-02 | Spectrum Brands, Inc. | Electronic faucet with smart features |
CA3047984A1 (en) * | 2016-12-23 | 2018-06-28 | Spectrum Brands, Inc. | Electronic faucet with smart features |
- KR102458805B1 (ko) | 2017-04-20 | 2022-10-25 | Google LLC | Multi-user authentication on a device |
US10395650B2 (en) | 2017-06-05 | 2019-08-27 | Google Llc | Recorded media hotword trigger suppression |
US10692496B2 (en) | 2018-05-22 | 2020-06-23 | Google Llc | Hotword suppression |
EP4036910A1 (en) * | 2018-08-21 | 2022-08-03 | Google LLC | Dynamic and/or context-specific hot words to invoke automated assistant |
- JP7322076B2 (ja) | 2018-08-21 | 2023-08-07 | Google LLC | Dynamic and/or context-specific hot words for invoking an automated assistant |
- JP7266432B2 (ja) * | 2019-03-14 | 2023-04-28 | Honda Motor Co., Ltd. | Agent device, agent device control method, and program |
Family Cites Families (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- NL8401862A (nl) * | 1984-06-13 | 1986-01-02 | Philips Nv | Method for recognizing a control command in a system, and an interactive system for carrying out the method. |
EP0543329B1 (en) * | 1991-11-18 | 2002-02-06 | Kabushiki Kaisha Toshiba | Speech dialogue system for facilitating human-computer interaction |
- JP3363283B2 (ja) * | 1995-03-23 | 2003-01-08 | Hitachi, Ltd. | Input device, input method, information processing system, and input information management method |
US8001067B2 (en) * | 2004-01-06 | 2011-08-16 | Neuric Technologies, Llc | Method for substituting an electronic emulation of the human brain into an application to replace a human |
- JPH11224179A (ja) * | 1998-02-05 | 1999-08-17 | Fujitsu Ltd | Dialogue interface system |
- JP2001154694A (ja) | 1999-09-13 | 2001-06-08 | Matsushita Electric Ind Co Ltd | Speech recognition apparatus and method |
- DE60032982T2 (de) | 1999-09-13 | 2007-11-15 | Matsushita Electric Industrial Co., Ltd., Kadoma | Speech recognition for controlling a device |
US20040215443A1 (en) * | 2001-07-27 | 2004-10-28 | Hatton Charles Malcolm | Computers that communicate in the english language and complete work assignments by reading english language sentences |
- JP4363076B2 (ja) * | 2002-06-28 | 2009-11-11 | Denso Corporation | Voice control device |
- JP2004110613A (ja) * | 2002-09-20 | 2004-04-08 | Toshiba Corp | Control device, control program, target device, and control system |
- JP2006039120A (ja) * | 2004-07-26 | 2006-02-09 | Sony Corp | Dialogue device, dialogue method, program, and recording medium |
- JP2006048218A (ja) * | 2004-08-02 | 2006-02-16 | Advanced Media Inc | Voice and video response method and voice and video response system |
US7567895B2 (en) * | 2004-08-31 | 2009-07-28 | Microsoft Corporation | Method and system for prioritizing communications based on sentence classifications |
US8725505B2 (en) * | 2004-10-22 | 2014-05-13 | Microsoft Corporation | Verb error recovery in speech recognition |
- JP4237713B2 (ja) | 2005-02-07 | 2009-03-11 | Toshiba Tec Corporation | Voice processing device |
- JP4622558B2 (ja) | 2005-02-07 | 2011-02-02 | Sony Corporation | Encoding device and method, program, recording medium, and data processing system |
US20080221892A1 (en) * | 2007-03-06 | 2008-09-11 | Paco Xander Nathan | Systems and methods for an autonomous avatar driver |
US8032383B1 (en) * | 2007-05-04 | 2011-10-04 | Foneweb, Inc. | Speech controlled services and devices using internet |
- JP2008309864A (ja) * | 2007-06-12 | 2008-12-25 | Fujitsu Ten Ltd | Speech recognition device and speech recognition method |
- DE102007044792B4 (de) * | 2007-09-19 | 2012-12-13 | Siemens Ag | Method, control device, and system for control or operation |
US8165886B1 (en) * | 2007-10-04 | 2012-04-24 | Great Northern Research LLC | Speech interface system and method for control and interaction with applications on a computing system |
US8374859B2 (en) * | 2008-08-20 | 2013-02-12 | Universal Entertainment Corporation | Automatic answering device, automatic answering system, conversation scenario editing device, conversation server, and automatic answering method |
US10553209B2 (en) * | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US8762156B2 (en) * | 2011-09-28 | 2014-06-24 | Apple Inc. | Speech recognition repair using contextual information |
US8924209B2 (en) * | 2012-09-12 | 2014-12-30 | Zanavox | Identifying spoken commands by templates of ordered voiced and unvoiced sound intervals |
US20140122056A1 (en) * | 2012-10-26 | 2014-05-01 | Xiaojiang Duan | Chatbot system and method with enhanced user communication |
- WO2014171144A1 (ja) * | 2013-04-19 | 2014-10-23 | Panasonic Intellectual Property Corporation of America | Home appliance control method, home appliance control system, and gateway |
- WO2015029304A1 (ja) * | 2013-08-29 | 2015-03-05 | Panasonic Intellectual Property Corporation of America | Speech recognition method and speech recognition apparatus |
-
2014
- 2014-07-08 WO PCT/JP2014/003608 patent/WO2015029304A1/ja active Application Filing
- 2014-07-08 US US14/428,093 patent/US9865255B2/en active Active
- 2014-07-08 MY MYPI2015700684A patent/MY179900A/en unknown
- 2014-07-08 JP JP2015511537A patent/JP6502249B2/ja active Active
-
2017
- 2017-11-27 US US15/822,926 patent/US10446151B2/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- JP2004303251A (ja) * | 1997-11-27 | 2004-10-28 | Matsushita Electric Ind Co Ltd | Control method |
- JP2001207499A (ja) * | 2000-01-28 | 2001-08-03 | Denso Corp | Automatic faucet |
- JP2006058479A (ja) * | 2004-08-18 | 2006-03-02 | Matsushita Electric Works Ltd | Control device with speech recognition function |
- JP2009109535A (ja) * | 2007-10-26 | 2009-05-21 | Panasonic Electric Works Co Ltd | Speech recognition device |
- JP2012181374A (ja) * | 2011-03-01 | 2012-09-20 | Toshiba Corp | Television device and remote control device |
- JP2014002586A (ja) * | 2012-06-19 | 2014-01-09 | Ntt Docomo Inc | Function execution instruction system, function execution instruction method, and function execution instruction program |
- JP2014006306A (ja) * | 2012-06-21 | 2014-01-16 | Sharp Corp | Display device, television receiver, display device control method, program, and recording medium |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- JP2015129794A (ja) * | 2014-01-06 | 2015-07-16 | Denso Corporation | Speech recognition device |
- WO2017042906A1 (ja) * | 2015-09-09 | 2017-03-16 | Mitsubishi Electric Corporation | In-vehicle speech recognition device and in-vehicle equipment |
- JPWO2017042906A1 (ja) * | 2015-09-09 | 2017-11-24 | Mitsubishi Electric Corporation | In-vehicle speech recognition device and in-vehicle equipment |
US10331795B2 (en) | 2016-09-28 | 2019-06-25 | Panasonic Intellectual Property Corporation Of America | Method for recognizing speech sound, mobile terminal, and recording medium |
- CN111033611A (zh) * | 2017-03-23 | 2020-04-17 | Joyson Safety Systems Acquisition LLC | System and method for associating mouth images with input commands |
- JP2020518844A (ja) * | 2017-03-23 | 2020-06-25 | Joyson Safety Systems Acquisition LLC | System and method for correlating mouth images with input commands |
- JP7337699B2 (ja) | 2017-03-23 | 2023-09-04 | Joyson Safety Systems Acquisition LLC | System and method for correlating mouth images with input commands |
- WO2018216180A1 (ja) * | 2017-05-25 | 2018-11-29 | Mitsubishi Electric Corporation | Speech recognition device and speech recognition method |
- CN111742363A (zh) * | 2018-02-22 | 2020-10-02 | Panasonic Intellectual Property Management Co., Ltd. | Voice control information output system, voice control information output method, and program |
- CN111742363B (zh) | 2018-02-22 | 2024-03-29 | Panasonic Intellectual Property Management Co., Ltd. | Voice control information output system, voice control information output method, and recording medium |
Also Published As
Publication number | Publication date |
---|---|
US20180082687A1 (en) | 2018-03-22 |
JPWO2015029304A1 (ja) | 2017-03-02 |
MY179900A (en) | 2020-11-19 |
JP6502249B2 (ja) | 2019-04-17 |
US20150262577A1 (en) | 2015-09-17 |
US10446151B2 (en) | 2019-10-15 |
US9865255B2 (en) | 2018-01-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
- WO2015029304A1 (ja) | Speech recognition method and speech recognition apparatus | |
- CN112513833B (zh) | Electronic device and method for providing an artificial intelligence service based on pre-synthesized dialogue | |
US11875820B1 (en) | Context driven device arbitration | |
US12125483B1 (en) | Determining device groups | |
- CN111344780B (zh) | Context-based device arbitration | |
- JP6887031B2 (ja) | Method, electronic device, home appliance network, and storage medium | |
EP3413303B1 (en) | Information processing device, information processing method, and program | |
- CN108346425B (zh) | Voice activity detection method and device, and speech recognition method and device | |
US20130289994A1 (en) | Embedded system for construction of small footprint speech recognition with user-definable constraints | |
- CN109074806A (zh) | Controlling distributed audio output to enable voice output | |
US8521525B2 (en) | Communication control apparatus, communication control method, and non-transitory computer-readable medium storing a communication control program for converting sound data into text data | |
- WO2011148594A1 (ja) | Speech recognition system, voice acquisition terminal, speech recognition sharing method, and speech recognition program | |
US10685664B1 (en) | Analyzing noise levels to determine usability of microphones | |
US9691389B2 (en) | Spoken word generation method and system for speech recognition and computer readable medium thereof | |
- JP2016109897A (ja) | Electronic device, speech control method, and program | |
US20220084520A1 (en) | Speech-responsive construction tool | |
- TW201519172A (zh) | Portable electronic device with loss reminder function and method of using the same | |
- JP2010078763A (ja) | Voice processing device, voice processing program, and intercom system | |
US20030163309A1 (en) | Speech dialogue system | |
- JP7287006B2 (ja) | Speaker determination device, speaker determination method, and control program for a speaker determination device | |
- CN112823047A (zh) | System and apparatus for controlling network applications | |
- JP4449380B2 (ja) | Speaker normalization method and speech recognition device using the same | |
- JP3846500B2 (ja) | Speech recognition dialogue device and speech recognition dialogue processing method | |
- JP2003058184A (ja) | Device control system, speech recognition device and method, and program | |
- KR20210098250A (ko) | Electronic device and control method thereof | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2015511537 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: IDP00201501442 Country of ref document: ID |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14428093 Country of ref document: US |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14839428 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 14839428 Country of ref document: EP Kind code of ref document: A1 |