TW200407710A

TW200407710A - Dialog control for an electric apparatus

Info

Publication number: TW200407710A
Application number: TW092112722A
Authority: TW
Inventors: Martin Oerder
Original assignee: Koninkl Philips Electronics Nv
Priority date: 2002-05-14
Filing date: 2003-05-09
Publication date: 2004-05-16
Also published as: RU2336560C2; PL372592A1; AU2003230067A1; JP2005525597A; US20050159955A1; EP1506472A1; WO2003096171A1; TWI280481B; RU2004136294A; BR0304830A; CN100357863C; CN1653410A

Abstract

A device comprising means for picking up and recognizing speech signals and a method of controlling an electric apparatus are proposed. The device comprises a personifying element 14 which can be moved mechanically. The position of a user is determined and the personifying element 14, which may comprise, for example, the representation of a human face, is moved in such a way that its front side 44 points in the direction of the user's position. Microphones 16, loudspeakers 18 and/or a camera 20 may be arranged on the personifying element 14. The user can conduct a speech dialog with the device, in which the apparatus is represented in the form of the personifying element 14. An electric apparatus can be controlled in accordance with the user's speech input. A dialog of the user with the personifying element for the purpose of instructing the user is also possible.

Description

Technical field

The invention relates to a device comprising means for picking up and recognizing speech signals, and to a method of communication between a user and an electric apparatus.

Known speech recognition means can assign picked-up acoustic speech signals to the corresponding word or sequence of words. Speech recognition systems are often combined with speech synthesis to form dialog systems for controlling electric apparatuses. The dialog with the user may serve as the only interface for operating the apparatus, or speech input and possibly also speech output may be one of several means of communication.

Prior art

US-A-6,118,888 describes a control device and a method of controlling an electric apparatus, for example a computer, or an apparatus used in the field of entertainment electronics. To control the apparatus, the user has a plurality of input facilities at his disposal: mechanical input facilities such as a keyboard or mouse, and speech recognition. In addition, the control device comprises a camera which picks up the user's gestures and facial expressions and processes them as further input signals. Communication with the user takes the form of a dialog, in which the system has a plurality of modes at its disposal for conveying information to the user. These include speech synthesis and speech output. In particular, they also include an anthropomorphic representation, for example of a person, a human face or an animal, which is shown to the user as a computer graphic on a display screen.
Although dialog systems are already used for a number of special applications, for example telephone information systems, they have not yet gained wide acceptance in other fields, such as the control of electric apparatuses in the home or entertainment electronics.

Summary of the invention

It is an object of the invention to provide a device comprising means for picking up and recognizing speech signals, and a method of operating an electric apparatus, which allow a user to operate the apparatus easily by means of speech control.

This object is achieved by a device as claimed in claim 1 and a method as claimed in claim 11. The dependent claims define advantageous embodiments of the invention.

The device according to the invention comprises a personifying element which can be moved mechanically. It is the part of the device that serves as the user's personified dialog partner. The concrete implementation of such a personifying element may vary widely. It may, for example, be a part of a housing that can be moved by a motor relative to the stationary housing of an electric apparatus. What is essential is that the personifying element has a front side that the user can recognize as such. When this front side faces the user, he gets the impression that the device is "attentive", i.e. that it can receive speech commands.

According to the invention, the device comprises means for determining the position of a user. This can be done by means of, for example, acoustic or optical sensors. The motion means of the personifying element are controlled in such a way that the front side of the personifying element points in the direction of the user's position. The user thus always has the impression that the device is ready to "listen" to him.

According to a further embodiment of the invention, the personifying element comprises an anthropomorphic representation.
This may be the representation of a person or an animal, but also of a fantasy figure, for example a robot. A representation of a human face is preferred. It may be a realistic or a merely symbolic representation in which, for example, only the outlines of eyes, nose and mouth are shown.

The device preferably also comprises means for supplying speech signals. Speech recognition is indeed particularly important for controlling an electric apparatus. However, answers, confirmations, queries and the like can be realized with speech output means. The speech output may comprise the reproduction of pre-stored speech signals as well as true speech synthesis. With speech output means, a complete dialog control can be realized. Dialogs may also be conducted with the user for the purpose of entertaining him.

According to a further embodiment of the invention, the device comprises a plurality of microphones and/or at least one camera. Speech signals can be picked up with a single microphone. When a plurality of microphones is used, however, a pick-up pattern can be achieved on the one hand and, on the other hand, the user's position can be determined by receiving his speech signals via the plurality of microphones. The environment of the device can be observed with a camera. With corresponding image processing, the user's position can also be determined from the picked-up image. The microphones, the camera and/or the loudspeaker for supplying speech signals may be arranged on the mechanically movable personifying element. For a personifying element in the form of a human head, for example, two cameras may be arranged in the region of the eyes, a loudspeaker at the position of the mouth and two microphones near the ears.

Means for identifying a user are preferably provided. This can be done, for example, by evaluating the picked-up image signal (visual or face recognition) or by evaluating the picked-up acoustic signal (speaker recognition).
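Purely by way of illustration, determining the user's position from the signals of several microphones, as mentioned above, could be sketched with a time-difference-of-arrival estimate for two microphones. The description does not specify a localization method; the cross-correlation approach, the far-field assumption, the function names and the parameters `sample_rate` and `mic_spacing` are all assumptions made for this sketch.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def best_lag(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += value * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def bearing_degrees(left, right, sample_rate, mic_spacing):
    """Estimate the user's bearing from the time difference of arrival.

    0 degrees is straight ahead; positive angles are towards the left
    microphone (the sound arrives there first).
    """
    max_lag = int(mic_spacing / SPEED_OF_SOUND * sample_rate) + 1
    delay = best_lag(left, right, max_lag) / sample_rate
    # Clamp to the physically possible range before taking the arcsine.
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(ratio))
```

The resulting bearing could then serve as the position information used to turn the personifying element; a camera-based estimate would be an equally valid source.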
The device can thus determine the current user among a number of persons in its environment and turn the personifying element towards this user.

The motion means for mechanically moving the personifying element can be configured in many different ways. They may be, for example, electric motors or hydraulic adjusting means. The personifying element may also be displaced by the motion means. It is, however, preferred that the personifying element can only be rotated relative to a stationary part. A rotation about a horizontal and/or a vertical axis is, for example, possible in this case.

The device according to the invention may form part of an electric apparatus, such as an apparatus for entertainment electronics (for example a television set or an audio and/or video playback apparatus). In this case, the device represents the user interface of the apparatus. The apparatus may additionally comprise further operating means (keyboard, etc.). Alternatively, the device according to the invention may be a separate device that serves as a control device for controlling one or more separate electric apparatuses. In this case, the apparatuses to be controlled have an electric control connection (for example a wireless connection or a suitable control bus), via which the device controls them in accordance with the speech commands received from the user.

The device according to the invention may particularly serve as an interface of a data storage and/or query system for the user. To this end, the device comprises an internal data memory, or it is connected to an external data memory, for example via a computer network or the Internet. In the dialog, the user can store data (for example telephone numbers, memos, etc.) or query data (for example the time, news, the current television program, etc.). Dialogs with the user can furthermore be used to adjust parameters of the device itself and to change its configuration.
When a loudspeaker for supplying acoustic signals and a microphone for picking up such signals are provided, signal processing with interference suppression can be provided, i.e. the picked-up acoustic signal is processed in such a way that the parts of the signal that originate from the loudspeaker are suppressed. This is particularly advantageous when the loudspeaker and the microphone are spatially adjacent, for example on the personifying element.

Apart from the above-described use of the device for controlling an electric apparatus, it may also be used for conducting dialogs with the user for other purposes, such as information, entertainment or instruction of the user. According to a further embodiment of the invention, dialog means are provided with which dialogs for instructing the user can be conducted. The dialog is then preferably conducted in such a way that instructions are given to the user and his answers are picked up. The instructions may be complex, but short instructions such as questions about learning objects, for example foreign-language vocabulary, are preferred, in which both the instruction (for example the definition of a word) and the answer (for example the word in the foreign language) are relatively short. The dialog is conducted between the user and the personifying element and may take place visually and/or by audio.

A possible implementation is proposed in which a set of learning objects (for example foreign-language vocabulary) is stored, wherein at least one question (for example a definition), one answer (for example the word) and a measure of the time elapsed since the user was last asked the question, or last answered it correctly, are stored for each learning object. In the dialog, learning objects are selected and queried one by one: the question is put to the user, and the user's answer is compared with the stored answer.
The learning object to be queried is selected taking the stored time measure into account, i.e. the time elapsed since the object was last queried. This can be done, for example, by means of a suitable learning model with an assumed error rate. In addition to the time measure, a measure of relevance assigned to each learning object may also be taken into account in the selection.

These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.

Fig. 1 is a block diagram of a control device 10 and of an apparatus 12 controlled by this device. The part of the control device 10 presented to the user is a personifying element 14. A microphone 16, a loudspeaker 18 and a position sensor for the user's position, here in the form of a camera 20, are arranged on the personifying element 14. These elements jointly constitute a mechanical unit 22. The personifying element 14, and hence the mechanical unit 22, is rotated about a vertical axis by a motor 24. A central control unit 26 controls the motor 24 via a drive circuit 28. The personifying element 14 is an independent mechanical unit. It has a front side that the user can recognize as such. The microphone 16, the loudspeaker 18 and the camera 20 are arranged on the personifying element 14 in the direction of this front side.

The microphone 16 supplies an acoustic signal. This signal is picked up by a pick-up system 30 and processed by a speech recognition unit 32. The speech recognition result, i.e. the word sequence assigned to the picked-up acoustic signal, is passed on to the central control unit 26. The central control unit 26 also controls a speech synthesis unit 34, which supplies a synthesized speech signal via a sound generation unit 36 and the loudspeaker 18. The image picked up by the camera 20 is processed by the image processing unit 38.
The image processing unit 38 determines the position of a user from the image signal supplied by the camera 20. This position information is passed on to the central control unit 26.

The mechanical unit 22 serves as a user interface via which the central control unit 26 receives inputs from the user (microphone 16, speech recognition unit 32) and answers the user (speech synthesis unit 34, loudspeaker 18). In this example, the control device 10 is used to control an electric apparatus 12, for example an apparatus used in the field of entertainment electronics.

Fig. 1 shows the functional units of the control device 10 only symbolically. In a concrete implementation, different units, for example the central control unit 26, the speech recognition unit 32 and the image processing unit 38, may be present as separate assemblies. Equally, these units may be implemented purely in software, with the functionality of several or all of them realized by a program executed on one central unit. Nor do the units have to be spatially close to each other or to the mechanical unit 22. The mechanical unit 22, i.e. the personifying element 14 together with the microphone 16, the loudspeaker 18 and the sensor 20, which are preferably but not necessarily arranged on this element, may be located separately from the rest of the control device 10 and be connected to it only for signaling via lines or a wireless link.

In operation, the control device 10 constantly checks whether there is a user in its proximity. Once the user's position has been determined, the central control unit 26 controls the motor 24 in such a way that the front side of the personifying element 14 faces the user.

The image processing unit 38 also comprises face recognition. When the camera 20 supplies an image of a plurality of persons, face recognition is used to determine which of them is the user known to the system. The personifying element 14 is then turned towards this user.
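As a minimal sketch of how the central control unit might turn a detected user position into a motor command, the following assumes that the camera sits on the personifying element and looks along its front side, so that a face at the image center means the element already faces the user. The pinhole-camera conversion, the proportional control rule and all parameter names and values (`gain`, `max_step_deg`, `deadband_deg`) are illustrative assumptions and are not taken from the description.

```python
import math


def pixel_to_angle(x_pixel, image_width, horizontal_fov_deg):
    """Angle (degrees) between the camera axis and a detected face.

    Uses a pinhole-camera model; positive angles are to the right of
    the image center.
    """
    half_width = image_width / 2.0
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.degrees(math.atan((x_pixel - half_width) / focal_px))


def motor_step(error_deg, gain=0.5, max_step_deg=10.0, deadband_deg=1.0):
    """Proportional rotation command for one control cycle."""
    if abs(error_deg) < deadband_deg:
        return 0.0  # already facing the user closely enough
    step = gain * error_deg
    # Limit the step so the element does not jerk towards the user.
    return max(-max_step_deg, min(max_step_deg, step))
```

Called once per camera frame, `motor_step(pixel_to_angle(...))` would let the element follow a moving user continuously; the dead band keeps it from jittering when the user stands still.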
When a plurality of microphones is provided, their signals can be processed in such a way that a pick-up pattern in the direction of the known user position is obtained.

Furthermore, the image processing unit 38 may be implemented in such a way that it "understands" the scene picked up by the camera 20 in the vicinity of the mechanical unit 22. The scene can then be assigned to one of a number of predefined states. In this way, the central control unit 26 can know, for example, whether there are one or several persons in the room. The unit can also recognize and classify the user's behavior, for example whether the user is looking at the mechanical unit 22 or is talking to another person. By evaluating the recognized state, the recognition performance can be improved considerably. For example, it can be avoided that parts of a conversation between two persons are erroneously interpreted as speech commands.

In the dialog with the user, the central control unit determines his input and controls the apparatus 12 accordingly. A dialog for controlling the volume of a sound reproduction apparatus 12 may proceed as follows:

- The user changes his position and turns towards the personifying element 14. The personifying element 14 is continuously guided by the motor 24 so that its front side faces the user. To this end, the drive circuit 28 is controlled by the central control unit 26 of the device 10 in accordance with the determined user position.
- The user gives a speech command, for example "TV volume". The microphone 16 picks up the speech command, and the speech recognition unit 32 recognizes it.
- The central control unit 26 reacts by asking, via the speech synthesis unit 34 and the loudspeaker 18: "Higher or lower?"
- The user gives the speech command "lower". After the speech signal has been recognized, the central control unit 26 controls the apparatus 12 so that the volume is reduced.

Fig. 2 is a perspective view of an electric apparatus 40 with an integrated control device. The figure shows only the personifying element 14 of the control device 10, which can be rotated about a vertical axis relative to the stationary housing 42 of the apparatus 40. In this example, the personifying element has a flat rectangular shape. The camera 20 and the loudspeaker 18 are located on the front side 44. Two microphones 16 are arranged on the sides. The mechanical unit 22 is rotated by a motor (not shown) so that the front side always points in the direction of the user.

In a further embodiment (not shown), the device 10 of Fig. 1 is not used for controlling an apparatus 12 but for conducting dialogs whose purpose is to instruct the user. The central control unit 26 executes a learning program with which the user can learn a foreign language. A set of learning objects is stored in a memory. These objects are individual data records, each of which comprises a definition of a word, the corresponding word in the foreign language, a measure of the relevance of the word (its frequency of occurrence in the language), and a time measure of the time elapsed since the data record was last queried.

In the dialog, a learning unit is carried out by selecting and querying data records one by one. An instruction is given to the user, i.e. the definition stored in the data record is displayed optically or played back acoustically. The user's answer, for example the word, is picked up, for example via a keyboard but preferably via the microphone 16 and the automatic speech recognition 32, and compared with the stored answer. The user is told whether his answer was judged correct. If the answer was wrong, the user may be told the correct answer or be given one or more further attempts. After a data record has been processed in this way, the stored time measure since the last query is updated, i.e. reset to zero. A further data record is then selected and queried.

The data record to be queried is selected by means of a memory model. A simple memory model is given by the formula P(k) = exp(-t(k) * r(c(k))), in which P(k) is the probability that the user knows learning object k, exp is the exponential function, t(k) is the time since the object was last queried, c(k) is the learning class of the object, and r(c(k)) is the error rate specific to that learning class. t may denote time; it may also be given in learning steps. The learning classes can be defined in different suitable ways. One possible model assigns a corresponding class to every object that has been answered correctly N times, for each N > 0. For the error rate, a suitable fixed value may be assumed, or a suitable initial value may be chosen and adapted by means of a gradient algorithm.

The aim of the instruction is to maximize a measure of knowledge. This measure of knowledge is defined as the part of the learning objects of the set that is known to the user, weighted with the relevance measure. Since querying an object k brings the probability P(k) to one, the knowledge measure is optimized by querying, in each step, the object with the lowest probability of knowledge P(k), where appropriate weighted with the relevance measure. With this model, the knowledge measure can be computed after each step and displayed to the user. The method is optimized to give the user the broadest possible knowledge of the current set of learning objects. With a good memory model, an effective learning strategy is obtained in this way.

Many modifications and further improvements of the above-described dialog query are possible. For example, a question (definition) may have several correct answers (words). This can be taken into account, for example, by using the stored relevance measures to give more weight to the more relevant (more frequent) words. A set of learning objects may comprise, for example, several thousand words.
The learning objects may, for example, be specific to a given purpose, i.e. vocabulary for a particular field of use (for example literature, business or technical fields).

In summary, the invention relates to a device comprising means for picking up and recognizing speech signals and to a method of communication with an electric apparatus. The device comprises a personifying element which can be moved mechanically. The position of a user is determined, and the personifying element, which may comprise, for example, the representation of a human face, is moved in such a way that its front side points in the direction of the user's position. Microphones, loudspeakers and/or a camera may be arranged on the personifying element. The user can conduct a speech dialog with the device, in which the apparatus is represented in the form of the personifying element. An electric apparatus can be controlled in accordance with the user's speech input. A dialog of the user with the personifying element for the purpose of instructing the user is also possible.

Brief description of the drawings

Fig. 1 is a block diagram of the elements of a control device;
Fig. 2 is a perspective view of an electric apparatus comprising a control device.

Explanation of reference numerals

10 control device
12 apparatus
14 personifying element
16 microphone
18 loudspeaker
20 camera
22 mechanical unit
24 motor
26 central control unit
28 drive circuit
30 pick-up system
32 speech recognition unit
34 speech synthesis unit
36 sound generation unit
38 image processing unit
40 apparatus
42 stationary housing
44 front side

Claims

1. A device comprising:
- means (30, 32) for picking up and recognizing speech signals,
- a personifying element (14) having a front side (44), and
- motion means (24) for mechanically moving the personifying element (14),
wherein:
- means (38) are provided for determining the position of a user, and
- the motion means (24) are controlled in such a way that the front side (44) of the personifying element (14) points in the direction of the user's position.

2. A device as claimed in claim 1, wherein means (34, 36, 18) for supplying speech signals are provided.

3. A device as claimed in any one of the preceding claims, wherein the personifying element (14) comprises an anthropomorphic representation, particularly the representation of a human face.

4. A device as claimed in any one of the preceding claims, wherein a plurality of microphones (16) and/or at least one camera (20) are provided, the microphones (16) and/or the camera (20) preferably being arranged on the personifying element (14).

5. A device as claimed in any one of the preceding claims, wherein means for identifying at least one user are provided.

6. A device as claimed in any one of the preceding claims, wherein the motion means (24) allow a rotation of the personifying element (14) about at least one axis.

7. A device as claimed in any one of the preceding claims, wherein at least one external electric apparatus (12) is provided which is controlled by means of the speech signals.

8. A device as claimed in any one of the preceding claims, wherein:
- at least one loudspeaker (18) is provided for supplying acoustic signals,
- at least one microphone (16) is provided for picking up acoustic signals, and
- a signal processing unit (30) is provided for processing the picked-up acoustic signals, in which the parts of the signal that originate from acoustic signals emitted by the loudspeaker (18) are suppressed.

9. A device as claimed in any one of the preceding claims, wherein means are provided for conducting a dialog for instructing the user, in which dialog instructions are given to the user visually and/or by means of speech signals, and the user's answers are picked up with a keyboard and/or a microphone.

10. A device as claimed in claim 9, wherein the means for conducting a dialog comprise means for storing a set of learning objects, wherein:
- at least one instruction, one answer and a measure of the time elapsed since the instruction was processed by the user are stored for each learning object, and
- the means for conducting a dialog are formed in such a way that learning objects are selected and queried by giving the user the instruction and comparing the user's answer with the stored answer,
- the stored measure being taken into account when the learning objects are selected.

11. A method of communication between a user and an electric apparatus, comprising:
- determining the position of a user;
- moving a personifying element (14) in such a way that a front side (44) of the personifying element (14) points in the direction of the user; and
- picking up and processing speech signals from the user.

12. A method as claimed in claim 11, wherein the electric apparatus (12) is controlled in accordance with the picked-up speech signals.