JP2007221300A

JP2007221300A - Robot and control method of robot

Info

Publication number: JP2007221300A
Application number: JP2006037544A
Authority: JP
Inventors: Yusuke Yasukawa; 裕介安川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2006-02-15
Filing date: 2006-02-15
Publication date: 2007-08-30

Abstract

PROBLEM TO BE SOLVED: To provide a robot that has a function of communicating with people and that reacts to a call using a simple configuration, and to provide a control method for the robot.

SOLUTION: The head of the robot is rotatably attached to the body. The head is provided with a directional microphone directed in the front direction of the face of the head. The body is provided with an array microphone, comprising a plurality of microphones arranged at a prescribed interval at the front around the neck, and with a control section for controlling the robot. The control section includes: voice detection means for detecting voice from the array microphone; means for detecting, from the detected voice, a call from a person, including the sound source direction; head rotation control means for turning the head toward the detected sound source direction; directional microphone voice acquisition means for acquiring voice from the directional microphone directed toward the sound source; and output control means for carrying out output control on the basis of the acquired voice.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a robot having a function of communicating with people, such as an office robot, and to a method for controlling the robot.

In recent years, interactive humanoid robots with a head and a body have been widely developed by various companies, and techniques are known that allow such a robot to detect the direction from which a voice came when a person speaks to it. One such technique is an autonomous behavior robot that has an imaging unit at the eye position of the head and a sound detection unit with left and right microphones at the ear positions. It detects the sound generated by a sound source, identifies the direction of the sound by sound source localization, drives the head toward the identified direction, captures an image of the surroundings in the direction of the sound source with the imaging unit, and extracts a target image of a specific shape from the captured image (see Patent Document 1). A similar technique equips the robot's head with a camera at the eye position and left and right microphones at the ear positions, and uses an auditory module, a visual module, and a motor control module to extract auditory events, visual events, and motor events, respectively, and to display auditory, visual, and motor information (see Patent Document 2). There is also a technique in which a microphone array consisting of a plurality of microphones is provided on either the body or the head of the robot, receives an acoustic signal from a sound source, calculates the sound source direction from the delay-and-sum of the input signals of the individual microphones, and turns the robot's direction of attention toward the sound source so that the sound source direction coincides with the directivity direction of the microphone array (see Patent Document 3).

Patent Documents 1 to 3: JP 2003-62777 A, JP 2002-264052 A, JP 2002-366191 A

In the techniques of Patent Documents 1 and 2, left and right microphones are provided at the ear positions of the robot's head and a visual sensing unit is provided at the eye position. For the purpose of determining the sound source direction with a simple configuration and extracting voice from that direction, these configurations are complicated and costly. The technique of Patent Document 3 merely provides a microphone array somewhere on the robot, receives an acoustic signal, calculates the sound source direction from the delay-and-sum of the individual microphones, and makes the robot's direction of attention coincide with the directivity direction of the microphone array. It is not characterized by any specific configuration for how the microphone array is installed or how voice from the sound source is detected once the robot faces the direction of attention.

An object of the present invention is to provide a robot that has a function of communicating with people and that reacts to a call using a simple configuration, and to provide a control method for the robot.

FIG. 1 is a diagram showing the principle configuration of the present invention. In the figure, 1 is the robot's head, which is rotatable with respect to the body; 10 is a directional microphone having directivity in the front direction of the head 1; 2 is the robot's body; 20 is an array microphone configured by arranging a plurality of microphones; 21 is a control unit; 22 is a head rotation mechanism; and 23 is output means that produces output (display output, audio output, mechanical output, etc.) according to the caller's voice. Within the control unit 21, 21a is array microphone voice detection means; 21b is call detection means that analyzes the acquired voice signal and detects, from the sound source direction, the volume, and the composition of the sound, that a person is calling from a certain direction; 21c is head rotation control means; 21d is directional microphone voice acquisition means; and 21e is output control means that performs output control corresponding to the voice.

A feature of the present invention is that the array microphone 20, composed of a plurality of microphones arranged at predetermined intervals, is provided on the upper part of the robot's body 2 (around the neck, etc.) to detect the direction in which the caller (sound source) is located, while the directional microphone 10 on the head 1 is provided to acquire the caller's voice.

First, in the control unit 21, the array microphone voice detection means 21a detects the voice from the caller. Next, the call detection means 21b analyzes the signals acquired by the plurality of microphones constituting the array microphone 20. When it detects the direction of the sound source, the volume, and that the sound is a human voice, it supplies a signal indicating the detected sound source direction to the head rotation control means 21c. The head rotation control means 21c drives the head rotation mechanism 22 according to the received direction signal, so that the front of the head 1 attached to the body 2 directly faces the caller. In this state, the directional microphone voice acquisition means 21d acquires the caller's voice from the directional microphone 10, which has directivity in the front direction. The output control means 21e is controlled according to the acquired voice and drives the output means 23. The output means 23 includes audio output, display output, robot movement, and the like. The voice acquired by the voice acquisition means 21d can be recorded by the output means 23 as a "voice message", or voice recognition means can be added so that the voice is recognized and the output control means 21e is controlled accordingly. The microphones of the array microphone 20 can be arranged in front of or around (the entire circumference of) the neck, or in front of or around (the entire circumference of) the body 2 (chest or abdomen).

According to the present invention, when someone speaks to the robot from its surroundings, the robot reacts and turns its head (the front of its face) toward the speaker. The speaker then feels that the robot has paid attention to them and becomes motivated to communicate with it. In addition, when an output operation is performed according to the content of the speaker's voice, the speaker can tell that the robot has recognized the voice, which prompts deeper communication.

Furthermore, the directional microphone acoustically extracts only the sound from the direction of the speaker and suppresses sound from other directions, so a voice signal with a high S/N ratio can be input and the voice recognition rate can be improved.

FIG. 2 shows the external configuration of an embodiment of the robot, and FIG. 3 shows the system configuration of the robot.

In FIG. 2, reference numerals 1 and 2 are the same as those in FIG. 1: 1 is the head; 10 is the directional microphone provided at the center of the face of the head 1 (at the nose position in the illustrated example); 11 is a camera provided at the eye position of the head; 2 is the body; 20 is the array microphone, composed of a plurality of microphones arranged at regular intervals around the neck of the robot's body 2, along the front semicircle that connects both shoulders with the head 1 as the axis; 24 is a monitor unit that displays text and images to people outside the robot; 25 is a speaker that outputs sound to people outside the robot; 26 is the robot's arm; and 27 is a moving device that supports the robot's body 2 and moves the robot. In the embodiment of FIG. 2, the array microphone 20 is arranged at regular intervals at the front around the neck, but depending on the placement position and arrangement, the array microphone 20 can be installed in various forms such as (1) to (4) below.

(1) Placed in front of the neck.
(2) Placed around the neck, for example six microphones at 60-degree intervals around the entire circumference.
(3) Placed on the front of the body.
(4) Placed around the body.
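The placement forms above can be illustrated with a short Python sketch that computes evenly spaced microphone angles for a full ring (form (2), six microphones every 60 degrees) and for a front semicircle (as in FIG. 2). The function names, the coordinate convention (0 degrees straight ahead, positive angles toward the robot's right), and the 0.1 m radius are assumptions made for illustration and do not appear in the specification.

```python
import math
from typing import List, Tuple


def mic_angles_deg(n_mics: int, full_circle: bool) -> List[float]:
    """Evenly spaced microphone angles in degrees; 0 is straight ahead of the robot."""
    if full_circle:
        if n_mics < 1:
            raise ValueError("need at least one microphone")
        step = 360.0 / n_mics
        return [i * step for i in range(n_mics)]
    if n_mics < 2:
        raise ValueError("a front arc needs at least two microphones")
    # Front semicircle: spread from -90 (one shoulder) to +90 (the other shoulder).
    step = 180.0 / (n_mics - 1)
    return [-90.0 + i * step for i in range(n_mics)]


def mic_positions(angles_deg: List[float], radius_m: float) -> List[Tuple[float, float]]:
    """(x, y) in metres; +y points to the robot's front, +x to its right (assumed axes)."""
    return [
        (radius_m * math.sin(math.radians(a)), radius_m * math.cos(math.radians(a)))
        for a in angles_deg
    ]


ring = mic_angles_deg(6, full_circle=True)    # form (2): six microphones around the neck
front = mic_angles_deg(5, full_circle=False)  # five microphones on the front semicircle
print(ring)
print(front)
print(mic_positions(front, radius_m=0.1))     # 0.1 m is a hypothetical neck radius
```

The positions returned by `mic_positions` are the kind of geometry a direction estimator needs in order to predict arrival-time differences between the microphones.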

FIG. 3 shows the system configuration that controls the entire robot shown in FIG. 2. In the figure, 30 is a CPU board equipped with a RAM that stores data, a ROM that stores programs and data, a voice processing circuit that performs analysis (volume, frequency band, etc.), voice recognition, voice synthesis, and similar processing on the voice (digital signals) input from the microphones, and a CPU that executes program processing; 31 is the head rotation mechanism; 32 is the directional microphone provided on the head (10 in FIG. 2); 33 is the array microphone (20 in FIG. 2); 34 is the camera (11 in FIG. 2); 35 is the monitor unit (24 in FIG. 2); 36 is the moving device drive mechanism; 37 is the speaker (25 in FIG. 2); and 38 is a power supply unit that supplies power to each part of the robot.

FIG. 4 is a flowchart of Embodiment 1, which is executed on the CPU board 30 of FIG. 3.

First, sound is acquired by the array microphone (S1 in FIG. 4). From the acquired sound (the multiple signals from the multiple microphones), the sound source direction is determined (S20), the volume from the microphones is determined (S21), and the voice-likeness (whether the sound is a human voice) is determined (S22). "Voice-likeness" is the degree to which the sound is not mechanical noise; whether a sound is voice-like can be judged by analyzing the frequency characteristics and the interruption pattern of the input sound. The determinations S20 to S22 can be performed simultaneously or in sequence.

Next, based on the results of S20 to S22, it is determined whether a voice has been acquired with (1) at least a certain volume, (2) at least a certain degree of voice-likeness, and (3) at least a certain sound source direction certainty (S3 in FIG. 4). This determines whether the robot has been called by a person. If even one condition is not satisfied, the process returns to step S1. If all conditions are satisfied, the robot's head is rotated toward the direction of the call (the sound source direction) using the direction information (S4 in FIG. 4). By rotating its head, the robot shows the caller through gesture that it has turned its attention to the caller's voice, which improves communication. Next, voice is acquired by the directional microphone (S5); from this point on, the caller's voice is acquired clearly with the directional microphone, and processing that uses the acquired voice is performed (S6). Such processing includes moving the head toward the voice direction to respond to the user, performing voice recognition and control according to the recognition result, displaying something on the monitor unit (24 in FIG. 2), recording a voice message (realized by mounting a voice recording chip on the CPU board 30), outputting some sound from the speaker (25 in FIG. 2), and moving the arm (26 in FIG. 2).
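The decision at step S3 of Embodiment 1 can be sketched in Python as a simple conjunction of three threshold tests. This is only a minimal illustration: the class names, field names, value ranges, and threshold values are hypothetical, since the specification says only "a certain level" for each condition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArrayObservation:
    direction_deg: float        # S20: estimated sound source direction
    direction_certainty: float  # S20: certainty of that estimate (assumed range 0..1)
    volume: float               # S21: input level (assumed range 0..1)
    voice_likeness: float       # S22: how voice-like the sound is (assumed range 0..1)


@dataclass
class Thresholds:
    # Hypothetical values; the specification does not give numbers.
    volume: float = 0.2
    voice_likeness: float = 0.5
    direction_certainty: float = 0.3


def detect_call(obs: ArrayObservation, th: Thresholds) -> Optional[float]:
    """S3: return the direction to turn the head toward (S4), or None to return to S1."""
    is_call = (
        obs.volume >= th.volume
        and obs.voice_likeness >= th.voice_likeness
        and obs.direction_certainty >= th.direction_certainty
    )
    return obs.direction_deg if is_call else None


th = Thresholds()
print(detect_call(ArrayObservation(30.0, 0.6, 0.4, 0.8), th))  # all three conditions met
print(detect_call(ArrayObservation(30.0, 0.6, 0.4, 0.1), th))  # not voice-like enough
```

Returning `None` corresponds to going back to S1 and listening again with the array microphone; a returned angle would be handed to the head rotation mechanism before the directional microphone is used (S5).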

While the voice-based processing (S6 in FIG. 4) is running, the array microphone can continue to detect the voice direction, so that the head follows the user by rotating even when the user moves.

Although pseudo-directivity can also be created with the array microphone, using a directional microphone improves performance more easily. In addition, because the array microphone is used only to detect the voice direction, microphones of modest performance can be used for it.

FIG. 5 is an explanatory diagram of the certainty of sound source direction measurement. Part A of FIG. 5 shows the configuration of the array microphone, and part B shows the degree of correlation of the microphone signals. In part A of FIG. 5, 20a to 20e represent the individual microphones that constitute the array microphone 20 provided around the neck of the robot's body 2 shown in FIG. 2 (along the front semicircle connecting both shoulders). Five microphones are provided in this example, but the number is not limited to five. In this configuration, the sound source direction is detected as follows.

Sound source direction detection is basically performed by correlating the sound input to the multiple microphones, as follows.

When sound arrives at the microphones 20a, 20b, 20c, ..., 20e from a direction at angle θ from the front, it reaches the five microphones at different times. The sound acquired from each microphone is delayed according to an assumed angle θ and the results are compared; the angle θ that gives the best match is taken as the direction of arrival. Part B of FIG. 5 plots the assumed angle on the horizontal axis and the degree of match on the vertical axis, and the angle (direction) with the highest correlation is judged to be the direction of arrival. In practice, because of noise in the environment, the graph may have peaks at angles other than the direction of arrival, or the curve may be distorted, and the extent of this gives the certainty (equivalently, the reliability) of the sound source direction measurement. Specifically, when a second peak (the second highest) exists, the difference in height between it and the first peak (the highest) is the certainty. Even when there is only one peak, the sharpness of the peak gives the certainty of the measurement.
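The scan over assumed angles described above can be sketched as a small delay-and-sum search in pure Python. This is an illustrative sketch under stated assumptions, not the implementation used in the specification: it assumes a far-field source, whole-sample delays, a speed of sound of 343 m/s, summed-signal power as the "degree of match", and a certainty normalized by the first peak height. All function names, the white-noise test source, and the 0.1 m array radius are hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def mic_delays(mic_xy, theta_deg, fs):
    """Arrival delay in whole samples at each mic for a far-field source at theta_deg.
    theta is measured from the front (+y axis); mics nearer the source hear it earlier."""
    t = math.radians(theta_deg)
    ux, uy = math.sin(t), math.cos(t)
    return [round(-(x * ux + y * uy) / SPEED_OF_SOUND * fs) for x, y in mic_xy]


def match_curve(signals, mic_xy, fs, candidates):
    """Degree of match per assumed angle: mean power of the delay-compensated sum.
    Signals are assumed to be much longer than the largest delay."""
    delays = [mic_delays(mic_xy, th, fs) for th in candidates]
    margin = max(abs(d) for ds in delays for d in ds)
    n = len(signals[0])
    curve = []
    for ds in delays:
        total = 0.0
        for k in range(margin, n - margin):
            s = sum(sig[k + d] for sig, d in zip(signals, ds))
            total += s * s
        curve.append(total / (n - 2 * margin))
    return curve


def estimate_direction(signals, mic_xy, fs, candidates):
    """Return (best angle, certainty). Certainty uses the gap to the second peak if one
    exists, otherwise the sharpness of the single peak; both are normalized (assumption)."""
    curve = match_curve(signals, mic_xy, fs, candidates)
    best = max(range(len(curve)), key=curve.__getitem__)
    if curve[best] <= 0.0:
        return candidates[best], 0.0

    def is_peak(i):
        left = curve[i - 1] if i > 0 else float("-inf")
        right = curve[i + 1] if i < len(curve) - 1 else float("-inf")
        return curve[i] > left and curve[i] > right

    others = [curve[i] for i in range(len(curve)) if i != best and is_peak(i)]
    if others:
        return candidates[best], (curve[best] - max(others)) / curve[best]
    near = [curve[i] for i in (best - 1, best + 1) if 0 <= i < len(curve)]
    if not near:
        return candidates[best], 0.0
    return candidates[best], (curve[best] - sum(near) / len(near)) / curve[best]


def simulate(source, mic_xy, theta_deg, fs):
    """Test helper: what each mic would record for a source at theta_deg."""
    n = len(source)
    return [[source[k - d] if 0 <= k - d < n else 0.0 for k in range(n)]
            for d in mic_delays(mic_xy, theta_deg, fs)]


# Five mics on a front semicircle of radius 0.1 m (hypothetical geometry).
mics = [(0.1 * math.sin(math.radians(a)), 0.1 * math.cos(math.radians(a)))
        for a in (-90, -45, 0, 45, 90)]
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # white noise stands in for a voice
signals = simulate(source, mics, 30.0, 48000)
angle, certainty = estimate_direction(signals, mics, 48000, list(range(-90, 91, 5)))
print(angle, round(certainty, 2))
```

With a clean simulated source, the curve has one dominant peak and the certainty is high; adding independent noise to each microphone signal would raise the competing peaks and lower the certainty, which is the behaviour the threshold on "sound source direction certainty" relies on.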

FIG. 6 is a flowchart of Embodiment 2. First, sound is acquired by the array microphone (S1 in FIG. 6). From the multiple signals acquired, the sound source direction is determined (S20), the volume from the array microphone is determined (S21), and voice recognition is performed on the input sound (S22). Based on these results, it is determined whether a voice has been acquired that satisfies the conditions of (1) at least a certain volume, (2) a keyword contained in the recognized words, and (3) at least a certain sound source direction certainty. If even one condition is not satisfied, the process returns to step S1. If all conditions are satisfied, the robot's head is rotated toward the direction of the call (the sound source direction) using the direction information (S4 in FIG. 6). The caller's voice is then acquired by the directional microphone, which now faces the sound source because the head has been rotated toward it (S5 in FIG. 6), and processing that uses the voice is performed (S6).

In Embodiment 2, voice recognition (S22 in FIG. 6) allows the robot to react to a specific keyword, for example when a caller addresses it by its name (such as "Robochan"), so it no longer responds to air-conditioning noise, sudden noise, and the like. More than one keyword can be registered, and the robot can be configured to react when any one of them is spoken, for example "Robot-kun" or "Konnichiwa" (hello).
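A minimal Python sketch of the Embodiment 2 condition follows, using the example keywords from the text in romanized form. The function names, the lowercase normalization, and the threshold values are assumptions made for illustration; the recognizer that produces the word list is outside the scope of the sketch.

```python
from typing import Iterable, Set

# Registered keywords (romanized examples from the description).
KEYWORDS: Set[str] = {"robochan", "robot-kun", "konnichiwa"}


def heard_keyword(recognized_words: Iterable[str], keywords: Set[str] = KEYWORDS) -> bool:
    """True if any recognized word matches any registered keyword."""
    return any(word.strip().lower() in keywords for word in recognized_words)


def is_keyword_call(volume: float, recognized_words: Iterable[str],
                    direction_certainty: float,
                    min_volume: float = 0.2, min_certainty: float = 0.3) -> bool:
    """Embodiment 2 conditions: volume, keyword present, and direction certainty.
    The threshold values are hypothetical."""
    return (volume >= min_volume
            and heard_keyword(recognized_words)
            and direction_certainty >= min_certainty)


print(is_keyword_call(0.5, ["Robochan", "come", "here"], 0.7))
print(is_keyword_call(0.5, ["hum"], 0.7))
```

Because a steady noise source such as an air conditioner never produces a registered keyword, it fails the second condition regardless of its volume or how well its direction can be measured.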

FIG. 7 is an explanatory diagram of voice recognition, which is performed in step S22 of FIG. 6 on the voice acquired from the array microphone and also in S22 of FIG. 8, described later. S61 of FIG. 8 can be performed on the same principle, except that the voice acquired from the directional microphone is recognized.

In FIG. 7, 20 is the array microphone, 30 is a word segmentation unit, 31 is a comparison unit, 32 is a word bank, 33 is a recognition result output unit, and 34 is a table. The principle of this voice recognition is as follows. The word segmentation unit 30 cuts out the voice pattern representing a word from the voice input from the array microphone 20. The comparison unit 31 compares it with the voice pattern of each candidate word in the candidate word list of the word bank 32, in which the voice patterns of the candidate words are registered in advance, and calculates the degree of match. The recognition result output unit 33 stores the degree of match (score) for each candidate word in the table 34. The shorter the word and the less clear the voice, the smaller the difference between the candidate words' scores. In practice, recognition becomes even more ambiguous when the S/N ratio falls or the intonation differs from what was assumed. The reliability of voice recognition can be calculated from the difference between the score of the first candidate word and the scores of the other words; most simply, it can be obtained from the ratio of the scores of the first and second candidates.
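The reliability calculation from the score table can be sketched as below. The text says only that reliability is obtained from the ratio of the first and second candidate scores; the particular formula used here, one minus the ratio of the second score to the first, is an assumption chosen so that a tie gives 0 and a clear winner approaches 1. The function name and the non-negative score scale are also hypothetical.

```python
from typing import Dict, Tuple


def recognition_confidence(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return (first candidate word, confidence in 0..1) from a table of match scores.
    Scores are assumed to be non-negative, with higher meaning a better match."""
    if not scores:
        raise ValueError("score table is empty")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_word, best_score = ranked[0]
    if best_score <= 0.0:
        return best_word, 0.0
    second_score = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_word, 1.0 - max(second_score, 0.0) / best_score


table = {"robochan": 80.0, "robot-kun": 40.0, "konnichiwa": 20.0}
print(recognition_confidence(table))
```

A short or unclear utterance tends to produce scores that are close together, so the ratio approaches 1 and the confidence under this formula approaches 0, matching the qualitative behaviour described in the text.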

FIG. 8 is a flowchart of Embodiment 3. In this flowchart, the processing up to acquiring voice with the directional microphone is the same as in the flowchart of Embodiment 2 shown in FIG. 6: sound is acquired by the array microphone (S1 in FIG. 8); the sound source direction, the volume, and voice recognition are processed (S20 to S22); it is determined whether a voice has been acquired with at least a certain volume, with a keyword among the recognized words, and with at least a certain sound source direction certainty (S3); the head is rotated toward the direction of the call (S4); and voice is acquired by the directional microphone (S5). After this, the volume of the voice acquired by the directional microphone (10 in FIG. 2) is detected (S60 in FIG. 8), voice recognition is performed on that voice, which the directional microphone acquires as a clear signal (S61), and the acquired sound is evaluated as to whether it is the voice of a person speaking to the robot. That is, it is determined whether the sound has at least a certain volume and whether the content of the words could be recognized (S7 in FIG. 8). If so (the sound is evaluated as a recognizable human voice), it is judged to be a call directed at the robot, and processing that uses the voice is performed (S9 in FIG. 8). If it is judged not to be a human voice, the rotated neck is returned and the robot goes back to the mode of attending to surrounding sounds (S8 in FIG. 8). Although the rotated neck is returned in step S8, the robot may instead leave it as it is and move to acquiring sound with the array microphone in step S1 to listen to surrounding sounds.
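The verification step that Embodiment 3 adds after the head turn (S60, S61, S7) can be sketched as a small decision function. The enum, the function name, the threshold value, and the `return_head` flag are hypothetical; the flag models the two alternatives described in the text for the rejection case, returning the head (S8) or leaving it in place and going straight back to listening (S1).

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    PROCESS_VOICE = "S9"           # treat the sound as a call directed at the robot
    RETURN_HEAD_AND_LISTEN = "S8"  # turn the head back, then listen with the array mic
    KEEP_HEAD_AND_LISTEN = "S1"    # leave the head where it is and listen again


def after_head_turn(volume: float, recognized_words: Sequence[str],
                    min_volume: float = 0.2, return_head: bool = True) -> Action:
    """S7: accept the sound only if the directional mic signal is loud enough (S60)
    and the recognizer could make out at least one word (S61)."""
    if volume >= min_volume and len(recognized_words) > 0:
        return Action.PROCESS_VOICE
    return Action.RETURN_HEAD_AND_LISTEN if return_head else Action.KEEP_HEAD_AND_LISTEN


print(after_head_turn(0.6, ["record", "a", "message"]))
print(after_head_turn(0.6, []))
print(after_head_turn(0.05, ["hello"], return_head=False))
```

The second check uses the cleaner directional microphone signal, so a false trigger from the array microphone costs only a brief head turn rather than an unwanted response.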

Embodiment 3 makes it possible to build a robot that reacts to voice even in environments where the array microphone alone cannot acquire the voice very clearly.

(Supplementary Note 1) A robot whose head is rotatably attached to a body, wherein the head is provided with a directional microphone directed in the front direction of its face; the body is provided with an array microphone, composed of a plurality of microphones arranged at regular intervals either in front of or around either the neck or the body, and with a control unit that controls the robot; and the control unit comprises means for detecting voice from the array microphone, means for detecting, from the detected voice, a call from a person, including the sound source direction, head rotation control means for rotating the head toward the detected sound source direction, directional microphone voice acquisition means for acquiring voice from the directional microphone directed toward the sound source, and output control means for performing output control based on the acquired voice.

(Supplementary Note 2) The robot according to Supplementary Note 1, wherein the call detection means of the control unit detects a call by detecting, for the voice acquired from the array microphone, whether the volume is at or above a certain level, whether a sound source direction has been detected, and whether voice-likeness has been detected.

(Supplementary Note 3) The robot according to Supplementary Note 1, wherein, when the call detection means of the control unit recognizes the voice acquired from the array microphone and detects a predetermined keyword, the head rotation control means performs head rotation control toward the sound source direction.

(Supplementary Note 4) The robot according to any one of Supplementary Notes 1 to 3, wherein, after the head rotation control by the head rotation control means of the control unit, voice recognition means evaluates the voice acquired from the directional microphone and, if it cannot be evaluated as voice, the head rotation control means is driven to rotate the head back to its previous direction.

(Supplementary Note 5) The robot according to Supplementary Note 4, wherein the voice recognition means evaluates the voice acquired from the directional microphone as voice by determining whether its volume is at or above a certain level and whether a word could be recognized by voice recognition.

(Supplementary Note 6) A control method for a robot whose head is rotatably attached to a body, the head being provided with a directional microphone directed in the front direction of its face, and the body being provided with an array microphone composed of a plurality of microphones arranged at regular intervals at the front around the neck, the method comprising: acquiring voice with the array microphone; detecting the sound source direction from the voice acquired by the array microphone; rotating the head toward the detected sound source direction; acquiring voice from the directional microphone after the head has been rotated; and recognizing the voice acquired from the directional microphone and performing control accordingly.

FIG. 1 is a diagram showing the principle configuration of the present invention.
FIG. 2 is a diagram showing the external configuration of an embodiment of the robot.
FIG. 3 is a diagram showing the system configuration of the robot.
FIG. 4 is a flowchart of Embodiment 1.
FIG. 5 is an explanatory diagram of the certainty of sound source direction measurement.
FIG. 6 is a flowchart of Embodiment 2.
FIG. 7 is an explanatory diagram of voice recognition.
FIG. 8 is a flowchart of Embodiment 3.

Explanation of symbols

1 Robot head
10 Directional microphone
2 Robot body
20 Array microphone
21 Control unit
21a Array microphone voice detection means
21b Call detection means
21c Head rotation control means
21d Directional microphone voice acquisition means
21e Output control means
22 Head rotation mechanism
23 Output means

Claims

A robot whose head is rotatably attached to a body, wherein
the head is provided with a directional microphone directed in the front direction of its face,
the body is provided with an array microphone, composed of a plurality of microphones arranged at regular intervals either in front of or around either the neck or the body, and with a control unit that controls the robot, and
the control unit comprises: means for detecting voice from the array microphone; means for detecting, from the detected voice, a call from a person, including the sound source direction; head rotation control means for rotating the head toward the detected sound source direction; directional microphone voice acquisition means for acquiring voice from the directional microphone directed toward the sound source; and output control means for performing output control based on the acquired voice.

The robot according to claim 1, wherein
the means for detecting a call in the control unit detects a call by detecting, for the voice acquired from the array microphone, whether the volume is at or above a certain level, whether a sound source direction has been detected, and whether voice-likeness has been detected.

The robot according to claim 1, wherein,
when the means for detecting a call in the control unit recognizes the voice acquired from the array microphone and detects a predetermined keyword, the head rotation control means performs head rotation control toward the sound source direction.

A control method for a robot whose head is rotatably attached to a body, the head being provided with a directional microphone directed in the front direction of its face, and the body being provided with an array microphone composed of a plurality of microphones arranged at regular intervals at the front around the neck, the method comprising:
acquiring voice with the array microphone;
detecting the sound source direction from the voice acquired by the array microphone;
rotating the head toward the detected sound source direction;
acquiring voice from the directional microphone after the head has been rotated; and
recognizing the voice acquired from the directional microphone and performing control accordingly.