
WO2017145929A1 - Pose control device, robot, and pose control method - Google Patents

Pose control device, robot, and pose control method

Info

Publication number
WO2017145929A1
Authority
WO
WIPO (PCT)
Prior art keywords
robot
user
posture
utterance
voice
Application number
PCT/JP2017/005857
Other languages
French (fr)
Japanese (ja)
Inventor
Seigo Ito
Hidetoshi Shinohara
Original Assignee
Sharp Kabushiki Kaisha
Application filed by Sharp Kabushiki Kaisha
Priority to CN201780007508.6A (published as CN108698231A)
Priority to JP2018501632A (published as JPWO2017145929A1)
Publication of WO2017145929A1

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 13/00 Controls for manipulators

Definitions

  • The present invention relates to a posture control device that controls the posture of a robot that can interact with a user, a robot including the posture control device, and a posture control method.
  • Patent Document 1 discloses a robot apparatus that naturally performs an operation corresponding to an utterance by synthesizing a voice synchronized with the actual operation.
  • Patent Document 2 discloses a humanoid robot that naturally performs an operation corresponding to an utterance by generating a gesture of the robot while the robot outputs sound.
  • Japanese Patent No. 5402648 (registered November 8, 2013)
  • Japanese Translation of PCT Application No. 2014-504959 (published February 27, 2014)
  • The present invention has been made in view of the above problems, and its purpose is to realize a posture control device and a posture control method that can clearly indicate to the user, at the start of dialogue with the user, whether the robot itself intends to speak.
  • A posture control device according to one aspect of the present invention is provided in a robot that can interact with a user and can drive a plurality of drive units to take various postures, and controls the posture of that robot. The device includes a posture specifying unit that specifies the posture of the robot from the drive state of each drive unit, and a drive control unit that performs drive control of each drive unit. When the posture of the robot specified by the posture specifying unit at the start of dialogue with the user is not an utterance intention presentation posture, that is, a posture indicating that the robot intends to speak, the drive control unit drives the drive units to cause the robot to take the utterance intention presentation posture.
  • A posture control method according to one aspect of the present invention is a method for controlling the posture of a robot that can interact with a user and can drive a plurality of drive units to take various postures. The method includes a posture specifying step of specifying the posture of the robot at the start of dialogue with the user, and a drive control step of driving the drive units to cause the robot to take the utterance intention presentation posture when the posture specified in the posture specifying step is not the utterance intention presentation posture indicating that the robot intends to speak.
  • FIG. 1 is a schematic configuration block diagram of a robot according to Embodiment 1 of the present invention.
  • FIG. 2 is a sequence diagram showing the flow of posture control processing by the posture control device provided in the robot shown in FIG. 1.
  • FIG. 3 is a schematic configuration block diagram of a robot according to Embodiment 2 of the present invention.
  • FIG. 4 is a sequence diagram showing the flow of posture control processing by the posture control device provided in the robot shown in FIG. 3.
  • FIG. 5 is a schematic configuration block diagram of a robot according to Embodiment 3 of the present invention.
  • FIG. 6 is a schematic configuration block diagram of a robot according to a modification of Embodiment 3 of the present invention.
  • Embodiment 1: Hereinafter, embodiments of the present invention will be described in detail. The present embodiment describes a robot that has an outer shell resembling at least a human or an animal, and a drive system composed of a plurality of drive units that move the outer shell, and that can interact with a user.
  • FIG. 1 is a schematic configuration diagram of a robot 101 according to the present embodiment. The robot 101 includes an outer shell (not shown) resembling at least a human or an animal. The robot 101 further includes a drive system 1 composed of a plurality of drive units (manipulators) that move the outer shell, a voice system 2 for realizing dialogue with the user, and a posture control device 3 that drives the drive system 1 to take various postures.
  • The voice system 2 includes a microphone 21, an input device 22, a voice recognition device 23, a dialogue device 24, a voice synthesis device 25, a playback device 26, a speaker 27, and a playback status acquisition device 28.
  • The microphone 21 is a device that collects the voice uttered by the user and converts the collected voice into electronic wave data (waveform data). The microphone 21 sends the converted waveform data to the input device 22 at the subsequent stage.
  • The input device 22 is a device that records the waveform data. If, during recording, the waveform data indicates silence for a predetermined time or longer, the input device 22 ends the recording and sends a signal indicating the end of input to the posture control device 3. At the same timing, the input device 22 sends the recorded waveform data to the voice recognition device 23 at the subsequent stage.
  • The voice recognition device 23 is a device that converts the waveform data sent from the input device 22 into text data (ASR: Automatic Speech Recognition). The voice recognition device 23 sends the converted text data to the dialogue device 24 at the subsequent stage.
  • The dialogue device 24 is a device that analyzes the text data sent from the voice recognition device 23 to identify the user's utterance content (analysis result), and acquires dialogue data indicating response content that establishes a conversation with the identified utterance content. The dialogue device 24 extracts the text data corresponding to the response content from the acquired dialogue data and sends the extracted text data to the speech synthesizer 25 at the subsequent stage.
  • The speech synthesizer 25 is a TTS (Text-to-Speech) device that converts the text data sent from the dialogue device 24 into PCM data. The speech synthesizer 25 sends the converted PCM data to the playback device 26 at the subsequent stage.
  • The playback device 26 is a device that outputs the PCM data sent from the speech synthesizer 25 to the speaker 27 as sound waves. The sound waves output here are sounds that a person can recognize, and they constitute the response to the user's utterance content; as a result, a conversation is established between the user and the robot 101. The playback device 26 outputs the PCM data to the speaker 27 and simultaneously to the playback status acquisition device 28.
  • When PCM data is sent from the playback device 26, the playback status acquisition device 28 sends to the posture control device 3 a signal indicating that voice output from the speaker 27 has started, that is, that playback of voice to the user by the robot 101 has started (speech start).
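To make the data flow of the voice system concrete, here is a minimal sketch of the pipeline described above. The helper names (`asr`, `generate_response`, `tts`) and the `posture_controller`/`speaker` interfaces are illustrative assumptions, not names used in the patent:

```python
# Sketch of the Embodiment-1 voice pipeline (devices 21-28), with stubs
# standing in for the real components.

def asr(waveform: bytes) -> str:
    """Voice recognition device 23: waveform data -> text (stub)."""
    return "hello robot"

def generate_response(text: str) -> str:
    """Dialogue device 24: utterance text -> response text (stub)."""
    return "You said: " + text

def tts(text: str) -> bytes:
    """Speech synthesizer 25: text -> PCM data (stub)."""
    return text.encode("utf-8")

def run_voice_pipeline(waveform: bytes, posture_controller, speaker) -> None:
    text = asr(waveform)             # recognize the recorded utterance
    reply = generate_response(text)  # build response content
    pcm = tts(reply)                 # synthesize the response
    # Playback device 26 sends PCM to the speaker and simultaneously to the
    # playback status acquisition device 28, which signals the posture
    # control device 3 that voice playback to the user has started.
    posture_controller.on_playback_started()
    speaker.play(pcm)
```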
  • The posture control device 3 is a device that controls the posture of the robot 101, and includes a drive control device 31, a housing state acquisition device 32, a posture recording device 33, and a behavior pattern recording device 34.
  • The drive control device 31 includes a posture specifying unit 31a that specifies the posture of the robot 101 from the drive state of the drive system (drive units) 1, and a drive control unit 31b that performs drive control of the drive system 1.
  • The housing state acquisition device 32 is a device that acquires information indicating the drive state of the drive system 1. This information shows what state the drive system 1 is in and is used to specify the posture of the robot 101; for example, joint angle information obtained from rotary encoders attached to the robot's joints and torque on/off states correspond to such information. It is sent from the housing state acquisition device 32 to the posture specifying unit 31a of the drive control device 31.
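As an illustration of how this drive-state information could be used, the sketch below models each drive unit's state as an encoder angle plus a torque flag and matches the current state against a recorded target posture within a tolerance. The joint names, tolerance, and data layout are assumptions for illustration only:

```python
from dataclasses import dataclass

@dataclass
class JointState:
    """One drive unit's state: rotary-encoder angle and torque on/off."""
    angle_deg: float
    torque_on: bool

# Hypothetical target recorded by the posture recording device 33 for the
# utterance intention presentation posture (e.g. head front, hand to mouth).
UTTERANCE_INTENT_POSTURE = {"neck_pan": 0.0, "neck_tilt": 0.0, "right_elbow": 95.0}

def matches_posture(joints: dict[str, JointState],
                    target: dict[str, float],
                    tol_deg: float = 5.0) -> bool:
    """Posture specifying unit 31a (sketch): compare the current drive
    state against a recorded posture, joint by joint."""
    return all(abs(joints[name].angle_deg - angle) <= tol_deg
               for name, angle in target.items())
```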
  • The posture recording device 33 is a device that records the utterance intention presentation posture taken by the robot 101. Specifically, information indicating the drive state of the drive system 1 is recorded in the posture recording device 33 so that the robot 101 can assume the utterance intention presentation posture.
  • The utterance intention presentation posture is, for example, a posture in which the robot puts a hand to its mouth, stands at attention, or faces the user's face; it is a posture by which the robot shows the user its intention to speak.
  • The behavior pattern recording device 34 is a device that records behavior patterns associated with the utterance content of the robot 101. Specifically, the behavior pattern recording device 34 records, as behavior patterns, information indicating the drive state of the drive system 1 associated with each utterance content.
  • As behavior patterns, not only information from the posture recording device 33 but also information from the housing state acquisition device 32 (for example, various sensors such as fall detection or gravitational acceleration sensors) or the internal state of the robot 101 (for example, past behavior patterns from voice recognition results) may be added. Behavior patterns may also be categorized by the user's utterance content, or matched to the pitch at the time of utterance.
  • The utterance intention presentation posture is not limited to one type; there may be several, depending on the situation of the robot 101, such as when the robot 101 is holding an object.
  • The posture specifying unit 31a acquires the information indicating the drive state of the drive system 1 of the robot 101, and thereby specifies what posture the robot 101 is currently in. Information indicating the specified posture is sent from the posture specifying unit 31a to the drive control unit 31b.
  • The drive control unit 31b determines whether the posture of the robot 101 specified by the posture specifying unit 31a at the start of the dialogue with the user is the utterance intention presentation posture.
  • Here, the dialogue with the user starts when the robot 101 starts playing back voice to the user. That is, the drive control unit 31b evaluates the posture of the robot 101 specified by the posture specifying unit 31a at the timing when it receives, from the playback status acquisition device 28, the signal indicating that playback of voice to the user by the robot 101 has started.
  • If, as a result of this determination, the posture of the robot 101 is not the utterance intention presentation posture, the drive control unit 31b drives the drive system 1 to cause the robot 101 to take the utterance intention presentation posture. That is, the posture of the robot 101 is specified at the start of the dialogue with the user (posture specifying step), and it is determined whether the specified posture is the utterance intention presentation posture indicating that the robot 101 intends to speak. If it is not, the drive system 1 is driven to cause the robot 101 to take the utterance intention presentation posture (drive control step).
  • In this way, when the robot 101 is not in the utterance intention presentation posture at the start of the dialogue with the user, it performs an operation of returning to that posture, so the user can easily understand that the robot 101 intends to speak.
  • If, on the other hand, the posture of the robot 101 is already the utterance intention presentation posture, the drive control unit 31b causes the robot to perform, before it starts speaking, an action that informs the user that it is about to speak. For example, if the head faces the front in the utterance intention presentation posture of the robot 101, the robot once droops its head and then returns it to the front before starting to speak. After performing this action, the robot 101 speaks. This allows the user to easily understand that the robot 101 intends to speak.
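Combining the two branches above, the logic of the drive control unit at dialogue start can be sketched as follows. The method names on the collaborating objects are assumptions; the branch structure is what the text describes:

```python
def on_dialogue_start(posture_specifier, drive_system, posture_store) -> None:
    """Drive control unit 31b at the start of dialogue with the user,
    i.e. when the playback status acquisition device 28 signals that
    voice playback to the user has started (sketch)."""
    current = posture_specifier.current_posture()             # posture specifying step
    intent_posture = posture_store.utterance_intent_posture()

    if current != intent_posture:
        # Not in the utterance intention presentation posture: drive the
        # drive units so the robot takes it (drive control step).
        drive_system.move_to(intent_posture)
    else:
        # Already in the posture: perform a pre-speech gesture instead,
        # e.g. droop the head once and return it to the front, so the
        # user still notices the intention to speak.
        drive_system.droop_and_raise_head()
```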
  • FIG. 2 is a sequence diagram showing the flow of the posture control processing of the robot 101 shown in FIG. 1. The sequence includes process (1) up to playback of voice by the robot 101, process (2) when a behavior of the robot 101 ends during voice playback, and process (3) when voice playback ends during a behavior of the robot 101.
  • Outline of process (1): In the voice system 2 of the robot 101, the user's utterance is basically acquired from the microphone 21 and recorded by the input device 22. The recorded utterance is then recognized by the voice recognition device 23, a dialogue character string is acquired from the recognition result by the dialogue device 24, the dialogue character string is synthesized by the speech synthesizer 25, and the synthesized speech is sounded through the speaker 27 by the playback device 26. This series of operations runs from the acquisition of the user's utterance to the sounding of the synthesized speech.
  • In process (1), the start of the dialogue with the user is the timing at which the user's utterance is acquired from the microphone 21 and playback of the response utterance corresponding to the acquired utterance is started by the playback device 26.
  • At the start of the dialogue, information on the housing (the drive system 1 of the robot 101) is obtained by the housing state acquisition device 32. Then, if necessary, the drive control device 31 activates the drive system 1 to change to the utterance intention presentation posture according to the information in the posture recording device 33, and selects one of the behavior patterns from the behavior pattern recording device 34 according to the utterance content.
  • The process (1) corresponds to the steps from (1. voice data input) to (13. voice data sounding) in the sequence shown in FIG. 2. That is, the microphone 21 converts the voice input by the user's speech into waveform data and outputs the waveform data to the input device 22 (1. voice data input).
  • The input device 22 records the input sound data and outputs it to the voice recognition device 23 (2. voice recognition start command).
  • The voice recognition device 23 receives a voice recognition start command from a control unit (not shown), converts the input sound data into text data, and outputs the text data to the dialogue device 24 (3. dialogue start command).
  • The dialogue device 24 receives a dialogue start command from a control unit (not shown), analyzes the user's utterance content from the input text data, and acquires the text data of a dialogue sentence corresponding to the utterance content from a database (not shown). The acquired text data is then output to the speech synthesizer 25 (4. dialogue wording synthesis command).
  • The speech synthesizer 25 receives a dialogue wording synthesis command from a control unit (not shown), converts the input text data into output sound wave data (PCM data), and outputs it to the playback device 26 (5. voice data playback command).
  • The playback device 26 receives a voice data playback command from a control unit (not shown) and plays back the output sound wave data. At this time, the playback device 26 outputs utterance start state change information to the playback status acquisition device 28 (6. speech start state change).
  • The utterance start state change information is information indicating whether the utterance by the robot 101 has been started; in this case, it indicates that the utterance by the robot 101 has been started.
  • The playback status acquisition device 28 notifies the drive control device 31, based on the input utterance start state change information, that the robot 101 has started speaking (7. speech start status notification). The signal notified here is a signal indicating that the robot 101 has started playing back voice to the user.
  • When the drive control device 31 receives from the playback status acquisition device 28 the signal indicating that playback of voice to the user by the robot 101 has started, it acquires the state of the robot 101 (housing state) from the housing state acquisition device 32 (8. housing information acquisition). The drive control device 31 also acquires the utterance intention presentation posture recorded in the posture recording device 33 (9. utterance intention presentation posture acquisition). The drive control device 31 can thus specify the posture of the robot 101 with the posture specifying unit 31a from the acquired housing state, and determine whether the specified posture is the acquired utterance intention presentation posture. The drive control device 31 then drives the drive system 1 according to the determination result (10. utterance intention presentation posture transition).
  • Specifically, if the posture of the robot 101 is not the utterance intention presentation posture, the drive control device 31 drives the drive system 1 so as to reach the utterance intention presentation posture. If the posture is already the utterance intention presentation posture and, for example, the head faces the front in that posture, the drive control device 31 once droops the head and returns it to the front.
  • Next, the drive control device 31 acquires a behavior pattern corresponding to the utterance content from the behavior pattern recording device 34 (11. behavior pattern acquisition), and starts driving the drive system 1 so as to realize the acquired behavior pattern (12. behavior start command).
  • Meanwhile, the playback device 26 receives a voice data playback command from a control unit (not shown) and causes the speaker 27 to sound the input output sound wave data as sound waves (13. voice data sounding).
  • Outline of process (2): When the information acquired from the playback status acquisition device 28 indicates continuation of the utterance, that is, when the utterance (playback) has not ended, the drive control device 31 again activates one of the behavior patterns in the behavior pattern recording device 34. The behavior pattern may be selected at the timing when a behavior ends, or selected in advance.
  • The process (2) corresponds to the steps from (14. behavior end) to (18. behavior start command) in the sequence shown in FIG. 2. That is, the end of a behavior pattern by the drive system 1 during the speech of the robot 101 (during voice playback) is determined from the drive state (housing state) of the drive system 1 acquired by the housing state acquisition device 32 (14. behavior end).
  • The housing state acquisition device 32 outputs information indicating that the behavior has ended to the drive control device 31 as a behavior notification (15. behavior notification command).
  • When the drive control device 31 is notified from the acquired housing state that the behavior has ended, it acquires the playback status from the playback status acquisition device 28 (16. playback status acquisition). If the drive control device 31 determines from the acquired playback status that playback is still in progress, it again acquires a behavior pattern according to the utterance content from the behavior pattern recording device 34 (17. behavior pattern acquisition), and starts driving the drive system 1 so as to realize the acquired behavior pattern (18. behavior start command).
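Process (2) thus amounts to a loop that keeps the robot gesturing for as long as the speech lasts: each time a behavior pattern ends while playback continues, another pattern is started. A sketch with assumed interfaces:

```python
def on_behavior_end(playback_status, behavior_store, drive_system, utterance) -> None:
    """Process (2) sketch: a behavior pattern has ended (steps 14-15)
    while the robot may still be speaking."""
    if playback_status.is_playing():                     # 16. playback status acquisition
        pattern = behavior_store.pattern_for(utterance)  # 17. behavior pattern acquisition
        drive_system.start(pattern)                      # 18. behavior start command
```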
  • Outline of process (3): When the information acquired from the playback status acquisition device 28 indicates the end of the utterance, the drive control device 31 determines that the utterance has ended and puts the drive system 1 into an idle or inactive state. When the playback status acquisition device 28 reports the timing at which the utterance ended, the drive control device 31 checks the housing state acquisition device 32; if a motion is still being performed, it issues a stop command to the drive system 1 and drives the drive system 1 so as to return to the utterance intention presentation posture, which is the initial posture. If the remaining motion is within a predetermined time (for example, 400 ms), no stop command is issued, as this is treated as an allowable range.
  • The process (3) corresponds to the steps from (19. playback end) to (22. playback end command) in the sequence shown in FIG. 2. That is, when playback ends (19. playback end), the playback device 26 outputs playback end state change information to the playback status acquisition device 28 (20. playback end state change).
  • The playback status acquisition device 28 notifies the drive control device 31, based on the input playback end state change information, that the robot 101 has finished speaking (21. playback end notification). The signal notified here is a signal indicating that playback of the voice to the user by the robot 101 has been completed.
  • The drive control device 31 issues a playback end command (stop command) to the drive system 1 based on the playback end notification acquired from the playback status acquisition device 28 (22. playback end command). As a result, the operation of the drive system 1 is stopped.
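Process (3) can be sketched in the same style. The 400 ms figure is the example tolerance given in the text; the query for remaining motion time is an assumed interface:

```python
STOP_TOLERANCE_S = 0.4  # example allowance from the text (400 ms)

def on_playback_end(drive_system) -> None:
    """Process (3) sketch: voice playback has finished (step 21)."""
    remaining = drive_system.seconds_until_motion_ends()  # assumed query
    if remaining > STOP_TOLERANCE_S:
        # The motion would outlast the speech noticeably: stop it (step 22)
        # and return to the initial utterance intention presentation posture.
        drive_system.stop()
        drive_system.move_to_initial_posture()
    # Motions finishing within the tolerance are left to complete.
```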
  • As described above, when the dialogue with the user starts, the robot 101 takes the utterance intention presentation posture, which informs the user that it intends to speak. That is, at the start of dialogue with the user, the robot 101 can indicate to the user whether the robot itself intends to speak. Since this allows the dialogue between the user and the robot 101 to proceed smoothly, natural non-verbal communication can be realized between the user and the robot.
  • In the present embodiment, the start of the dialogue with the user is the start of voice playback to the user by the robot, but the present invention is not limited to this; it may instead be the end of the user's voice input to the robot. In that case, a dialogue start status notification is sent to the drive control device 31 when input by the input device 22 is completed, and the drive control device 31 then acquires the housing state from the housing state acquisition device 32. Subsequent processing is the same as described above.
  • In this case, the end of input by the input device 22 is also the start of voice recognition, so the start of the dialogue with the user may equally be regarded as the start of voice recognition by the robot.
  • In a configuration in which the microphone 21 is turned on while a switch is pressed and turned off when the switch is released, the start of the dialogue with the user may be the moment the switch is released.
  • It is also possible to use a camera in the housing constituting the robot 101, detect a person with the camera, and further detect that the movement of the person's lips has finished; the timing at which the conversation is thus assumed to start can be treated as the start of the dialogue with the user.
  • In the posture control device 3 configured as described above, an example is shown in which the posture recording device 33 and the behavior pattern recording device 34 are provided independently; however, these two devices may be realized as a single recording device.
  • Embodiment 2: FIG. 3 is a schematic configuration diagram of a robot 201 according to the present embodiment.
  • The robot 201 differs from the robot 101 of Embodiment 1 in that the voice recognition device 23 is provided in a server (not shown) on a network, and a communication device 29 for communicating with the voice recognition device 23 is added. That is, in the robot 201, after the voice input through the microphone 21 is recorded by the input device 22, the input voice data is sent by the communication device 29 to the server on the network, and voice recognition can be performed by the voice recognition device 23 in the server. The recognition result from the voice recognition device 23 in the server is then sent to the dialogue device 24 via the communication device 29. In these respects, the robot 201 differs from the robot 101 of Embodiment 1.
  • The communication device 29 may be of any type as long as it can communicate with the voice recognition device 23 provided on an external network such as the Internet.
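A minimal sketch of what the communication device 29 might do, assuming a plain HTTP endpoint that accepts raw audio and returns JSON; the URL, request shape, and response field are all illustrative assumptions:

```python
import json
import urllib.request

def recognize_on_server(sound_data: bytes, url: str) -> str:
    """Send recorded voice data to a speech-recognition server on the
    network and return the recognized text for the dialogue device 24."""
    req = urllib.request.Request(
        url,
        data=sound_data,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]  # assumed reply field
```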
  • FIG. 4 is a sequence diagram showing the flow of the posture control processing of the robot 201 shown in FIG. 3. The sequence includes process (11) up to playback of voice by the robot 201, process (12) when a behavior of the robot 201 ends during voice playback, and process (13) when voice playback ends during a behavior of the robot 201.
  • The process (11) is substantially the same as the process (1) described in Embodiment 1, but differs in the start of the dialogue with the user. That is, unlike process (1) of Embodiment 1, the user's utterance is acquired as sound data from the microphone 21, the acquired sound data is recorded by the input device 22, and the end of that input is treated as the start of the dialogue with the user.
  • The process (11) corresponds to the steps from (1. voice data input) to (15. voice data sounding) in the sequence shown in FIG. 4. That is, the microphone 21 converts the voice input by the user's speech into waveform data and outputs the waveform data to the input device 22 (1. voice data input).
  • The input device 22 records the input sound data and, when the input is completed, notifies the drive control device 31 of a dialogue start status (2. dialogue start status notification). By this dialogue start status notification, the drive control device 31 is informed that the input of the user's voice has been completed.
  • If the posture of the robot 201 is not the utterance intention presentation posture, the drive control device 31 drives the drive system 1 so as to reach the utterance intention presentation posture. If the posture is already the utterance intention presentation posture and, for example, the head faces the front in that posture, the drive control device 31 once droops the head and returns it to the front.
  • The input device 22 receives a voice recognition start command (1) from a control unit (not shown), and transmits the input voice data to the voice recognition device 23 provided in the server on the network via the communication device 29 (6. voice recognition start command (1)).
  • The voice recognition device 23 receives a voice recognition start command (2) from the control unit in the server, converts the input sound data into text data (7. voice recognition start command (2)), and outputs the text data to the dialogue device 24 (8. dialogue start command).
  • The dialogue device 24 receives a dialogue start command from a control unit (not shown), analyzes the user's utterance content from the input text data, and acquires the text data of a dialogue sentence corresponding to the utterance content from a database (not shown). The acquired text data is then output to the speech synthesizer 25 (9. dialogue wording synthesis command).
  • The speech synthesizer 25 receives a dialogue wording synthesis command from a control unit (not shown), converts the input text data into output sound wave data (PCM data), and outputs it to the playback device 26 (10. voice data playback command).
  • The playback device 26 receives a voice data playback command from a control unit (not shown) and plays back the output sound wave data. At this time, the playback device 26 outputs utterance start state change information to the playback status acquisition device 28 (11. speech start state change).
  • The utterance start state change information is information indicating whether the utterance by the robot 201 has been started; in this case, it indicates that the utterance by the robot 201 has been started.
  • The playback status acquisition device 28 notifies the drive control device 31, based on the input utterance start state change information, that the robot 201 has started speaking (12. speech start status notification). The signal notified here is a signal indicating that the robot 201 has started playing back voice to the user.
  • The drive control device 31 acquires a behavior pattern corresponding to the utterance content from the behavior pattern recording device 34 (13. behavior pattern acquisition), and starts driving the drive system 1 so as to realize the acquired behavior pattern (14. behavior start command).
  • Meanwhile, the playback device 26 receives a voice data playback command from a control unit (not shown) and causes the speaker 27 to sound the input output sound wave data as sound waves (15. voice data sounding).
  • The process (11) is as described above; the process (12) is the same as the process (2) in Embodiment 1, and the process (13) is the same as the process (3) in Embodiment 1, so descriptions of these processes are omitted.
  • In the present embodiment, the start of the dialogue with the user is the end of the user's voice input to the robot, but the present invention is not limited to this; it may instead be the start of voice playback to the user by the robot.
  • Since the end of input by the input device 22 is also the start of voice recognition, the start of the dialogue with the user may equally be regarded as the start of voice recognition by the robot.
  • As in Embodiment 1, in a configuration in which the microphone 21 is turned on while a switch is pressed and turned off when it is released, the start of the dialogue with the user may be the moment the switch is released.
  • It is also possible to use a camera in the housing constituting the robot 201, detect a person with the camera, and further detect that the movement of the person's lips has finished; the timing at which the conversation is thus assumed to start can be treated as the start of the dialogue with the user.
  • In the present embodiment, the robot 201 starts speaking when utterance content for a response can be formed by the dialogue device 24. When such content cannot be formed, the robot 201 may, instead of speaking, perform an utterance intention release behavior indicating that the intention to speak shown by the utterance intention presentation posture has disappeared.
  • In the posture control device 3 configured as described above, an example is shown in which the posture recording device 33 and the behavior pattern recording device 34 are provided independently; however, these two devices may be realized as a single recording device.
  • In Embodiments 1 and 2 described above, the posture control of the robots 101 and 201 is performed using output signals from the voice system 2 (a signal indicating the end of voice input from the input device 22, and a signal indicating the start of voice playback from the playback status acquisition device 28).
  • Embodiment 3: FIG. 5 is a schematic configuration diagram of a robot 301 according to the present embodiment.
  • The robot 301 has substantially the same configuration as the robot 101 of Embodiment 1, and differs in that an image system 4 is newly provided.
  • The image system 4 includes a camera 41 that captures the user's face, and an image acquisition device (image acquisition unit) 42 that acquires the face image captured by the camera 41. The image system 4 further includes an image determination device (image determination unit) 43 that determines whether the face image acquired by the image acquisition device 42 is an image indicating the end of the utterance by the user.
  • The camera 41 is a digital camera that captures an image of the user who is the conversation partner of the robot 301; it may be of any type and system as long as it can be mounted inside the robot 301.
  • The image acquisition device 42 is a device that acquires the user's face image from the user image captured by the camera 41, and sends the acquired face image to the image determination device 43.
  • The image determination device 43 is a device that performs face recognition on the user's face image sent from the image acquisition device 42 and determines from the result whether the image indicates the end of the utterance by the user. Here, it determines whether the face image shows the user with the mouth closed, and sends the result of that determination to the posture control device 3. That is, the posture control device 3 performs posture control of the robot 301 at the timing when the user's mouth closes. In other words, in this embodiment, the timing of the posture determination of the robot 301 by the posture control device 3, that is, the start of the dialogue with the user, is the time when the image determination device 43 determines that the image indicates the end of the user's utterance.
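One way the mouth-closed check could work is to compare a normalized mouth-opening measure against a threshold and debounce it over a few frames. The landmark inputs, the threshold, and the debouncing are assumptions for illustration; the patent only states that the device detects a closed mouth:

```python
def mouth_is_closed(mouth_top_y: float, mouth_bottom_y: float,
                    face_height: float, threshold: float = 0.03) -> bool:
    """Image determination device 43 (sketch): decide whether the mouth is
    closed from two face-landmark y-coordinates, normalized by face height."""
    opening = abs(mouth_bottom_y - mouth_top_y) / face_height
    return opening < threshold

def utterance_ended(recent_frames_closed: list[bool]) -> bool:
    """Treat the utterance as ended once the mouth has stayed closed over a
    run of consecutive frames (debouncing is an assumed detail)."""
    return bool(recent_frames_closed) and all(recent_frames_closed)
```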
  • As described above, the robot 301 takes the utterance intention presentation posture, which informs the user that it intends to speak, when the dialogue with the user starts (when the image determination device 43 determines that the image indicates the end of the user's utterance). That is, the robot 301 can indicate to the user at the start of the dialogue whether the robot itself intends to speak, so the dialogue between the user and the robot 301 can proceed smoothly.
  • FIG. 6 is a schematic configuration block diagram of a robot 401, which is a modification of the robot 301 shown in FIG. 5. The robot 401 has substantially the same configuration as the robot 201 of Embodiment 2, and differs in that the image system 4 is newly provided. The posture control using the image system 4 is the same as that of the robot 301 shown in FIG. 5.
  • In the present embodiment, the start of the dialogue with the user is determined when an image indicating the end of the user's utterance is detected; however, the present invention is not limited to this. As in Embodiments 1 and 2, it may be the time of reception of an output signal from the voice system 2 (a signal indicating the end of voice input from the input device 22, or a signal indicating the start of voice playback from the playback status acquisition device 28).
  • The control blocks of the drive control device 31 may be realized by a logic circuit (hardware) formed on an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).
  • In the latter case, the drive control device 31 includes a CPU that executes the instructions of a program, which is software realizing each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") in which the program and various data are recorded so as to be readable by a computer (or CPU); and a RAM (Random Access Memory) into which the program is loaded. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it.
  • As the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The program may also be supplied to the computer via an arbitrary transmission medium (such as a communication network or a broadcast wave) capable of transmitting the program. The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.
  • A posture control device according to aspect 1 of the present invention is provided in a robot (101, 201, 301, 401) capable of interacting with a user and of driving a plurality of drive units (drive system 1) to take various postures, and controls the posture of the robot. The posture control device (31) includes a posture specifying unit (31a) that specifies the posture of the robot from the drive state of each drive unit (drive system 1), and a drive control unit (31b) that controls the drive of each drive unit (drive system 1). When the posture of the robot specified by the posture specifying unit (31a) at the start of the dialogue with the user is not the utterance intention presentation posture indicating that the robot (101, 201, 301, 401) has an intention to utter, the drive control unit (31b) drives each drive unit (drive system 1) to cause the robot to take the utterance intention presentation posture.
  • According to the above configuration, since the posture of the robot (101, 201, 301, 401) can always be set to the utterance intention presentation posture at the start of the dialogue with the user, the user can easily recognize visually, from the posture of the robot, that the robot has an intention to utter.
  • In the posture control device according to aspect 2 of the present invention, in the above aspect 1, the robot (101, 201, 301, 401) may input the user's voice and utter a voice toward the user according to the input voice, and the start of the dialogue with the user may be the time when the robot starts playing back the voice to the user.
  • According to the above configuration, since the robot (101, 201, 301, 401) starts voice playback to the user at the start of the dialogue, it can take the utterance intention presentation posture toward the user at the timing when it is about to speak. Thus, in addition to the posture of the robot, the user can clearly recognize from the voice that the robot has an intention to speak.
  • In the posture control device according to aspect 3 of the present invention, in the above aspect 1, the robot (101, 201, 301, 401) may input the user's voice and utter a voice toward the user according to the input voice, and the start of the dialogue with the user may be the end of the user's voice input to the robot.
  • According to the above configuration, since the dialogue with the user starts when the user's voice input to the robot (101, 201, 301, 401) ends, the utterance intention presentation posture can be taken toward the user at the timing when the user's utterance ends. Thus, the robot can quickly inform the user that it has an intention to speak.
  • The posture control device according to aspect 4 of the present invention, in any one of the above aspects 1 to 3, may include an image acquisition unit (image acquisition device 42) that acquires a face image obtained by imaging the user's face, and an image determination unit (image determination device 43) that determines whether the face image acquired by the image acquisition unit is an image indicating the end of the utterance by the user; the start of the dialogue with the user may be the time when the image determination unit (image determination device 43) determines that the image indicates the end of the user's utterance.
  • According to the above configuration, since the start of the dialogue with the user is the time when the image determination unit (image determination device 43) determines that the image indicates the end of the user's utterance, the utterance intention presentation posture can be taken toward the user at the timing when the user's utterance ends. Thus, the robot can quickly inform the user that it has an intention to speak.
  • A robot according to aspect 5 of the present invention includes the posture control device (31) according to any one of the above aspects 1 to 4. According to the above configuration, the robot can clearly notify the user that it has an intention to speak.
  • A posture control method according to aspect 6 of the present invention is a method for controlling the posture of a robot (101, 201, 301, 401) capable of interacting with a user and of driving a plurality of drive units (drive system 1) to take various postures. The method includes a posture specifying step of specifying the posture of the robot at the start of the dialogue with the user, and a drive control step of driving the drive units to cause the robot (101, 201, 301, 401) to take the utterance intention presentation posture when the posture specified in the posture specifying step is not the utterance intention presentation posture indicating that the robot has an intention to utter. According to the above method, the same effect as the above aspect 1 is obtained.
  • The posture control device according to each aspect of the present invention may be realized by a computer. In this case, a posture control program that realizes the posture control device on the computer by causing the computer to operate as each unit (software element) included in the posture control device, and a computer-readable recording medium on which the program is recorded, also fall within the scope of the present invention.
  • 1 drive system (drive units), 2 voice system, 3 posture control device, 4 image system, 21 microphone, 22 input device, 23 voice recognition device, 24 dialogue device, 25 speech synthesizer, 26 playback device, 27 speaker, 28 playback status acquisition device, 29 communication device, 31 drive control device, 31a posture specifying unit, 31b drive control unit, 32 housing state acquisition device, 33 posture recording device, 34 behavior pattern recording device, 41 camera, 42 image acquisition device (image acquisition unit), 43 image determination device (image determination unit), 101, 201, 301, 401 robot

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)
  • Toys (AREA)

Abstract

The present invention addresses the problem of enabling a robot to indicate to a user, when initiating dialogue with the user, that the robot intends to speak. The present invention is a pose control device (3) for controlling the pose of a robot (101) that is capable of dialogue with a user, wherein, when the pose of the robot (101) identified at the time dialogue with the user is initiated is not a pose for indicating an intent to speak, a drive system (1) is driven to cause the robot (101) to assume the pose for indicating an intent to speak.

Description

Posture control device, robot, and posture control method
The present invention relates to a posture control device that controls the posture of a robot that can interact with a user, a robot including the posture control device, and a posture control method.
In recent years, robots that perform actions according to their own utterances have been developed, and such robots are required to perform those actions more naturally. For example, Patent Document 1 discloses a robot apparatus that naturally performs an operation corresponding to an utterance by synthesizing a voice synchronized with the actual operation. Patent Document 2 discloses a humanoid robot that naturally performs an operation corresponding to an utterance by generating a gesture of the robot while the robot outputs sound.
Japanese Patent No. 5402648 (registered November 8, 2013); Japanese Translation of PCT Application No. 2014-504959 (published February 27, 2014)
To facilitate dialogue between a user and a robot, it is necessary, when the dialogue starts, to clearly inform the user whether the robot itself has an intention to speak. However, although the robots disclosed in the above patent documents are devised so that their behavior while speaking is natural, indicating to the user at the start of the dialogue whether the robot itself intends to speak is not particularly considered.
The present invention has been made in view of the above problems, and its purpose is to realize a posture control device and a posture control method that can clearly indicate to the user, at the start of dialogue with the user, whether the robot itself intends to speak.
In order to solve the above problems, a posture control device according to one aspect of the present invention is provided in a robot that can interact with a user and can drive a plurality of drive units to take various postures, and controls the posture of that robot. The device includes a posture specifying unit that specifies the posture of the robot from the drive state of each drive unit, and a drive control unit that performs drive control of each drive unit. When the posture of the robot specified by the posture specifying unit at the start of dialogue with the user is not the utterance intention presentation posture indicating that the robot intends to speak, the drive control unit drives the drive units to cause the robot to take the utterance intention presentation posture.
A posture control method according to one aspect of the present invention is a method for controlling the posture of a robot that can interact with a user and can drive a plurality of drive units to take various postures. The method includes a posture specifying step of specifying the posture of the robot at the start of dialogue with the user, and a drive control step of driving the drive units to cause the robot to take the utterance intention presentation posture when the posture specified in the posture specifying step is not the utterance intention presentation posture indicating that the robot intends to speak.
According to one aspect of the present invention, it is possible to clearly indicate to the user, at the start of dialogue with the user, whether the robot itself intends to speak.
FIG. 1 is a schematic configuration block diagram of a robot according to Embodiment 1 of the present invention. FIG. 2 is a sequence diagram showing the flow of posture control processing by the posture control device provided in the robot shown in FIG. 1. FIG. 3 is a schematic configuration block diagram of a robot according to Embodiment 2 of the present invention. FIG. 4 is a sequence diagram showing the flow of posture control processing by the posture control device provided in the robot shown in FIG. 3. FIG. 5 is a schematic configuration block diagram of a robot according to Embodiment 3 of the present invention. FIG. 6 is a schematic configuration block diagram of a robot according to a modification of Embodiment 3 of the present invention.
[Embodiment 1]
Hereinafter, embodiments of the present invention will be described in detail. The present embodiment describes a robot that has an outer shell resembling at least a human or an animal, and a drive system composed of a plurality of drive units that move the outer shell, and that can interact with a user.
(Robot overview)
FIG. 1 is a schematic configuration diagram of a robot 101 according to the present embodiment. The robot 101 includes an outer shell (not shown) resembling at least a human or an animal. The robot 101 further includes a drive system 1 composed of a plurality of drive units (manipulators) that move the outer shell, a voice system 2 for realizing dialogue with the user, and a posture control device 3 that drives the drive system 1 to take various postures.
The voice system 2 includes a microphone 21, an input device 22, a voice recognition device 23, a dialogue device 24, a voice synthesis device 25, a playback device 26, a speaker 27, and a playback status acquisition device 28. The microphone 21 is a device that collects the voice uttered by the user and converts the collected voice into electronic wave data (waveform data). The microphone 21 sends the converted waveform data to the input device 22 at the subsequent stage.
The input device 22 is a device that records the waveform data. If, during recording, the waveform data indicates silence for a predetermined time or longer, the input device 22 ends the recording and sends a signal indicating the end of input to the posture control device 3. At the same timing, the input device 22 sends the recorded waveform data to the voice recognition device 23 at the subsequent stage. The voice recognition device 23 is a device that converts the waveform data sent from the input device 22 into text data (ASR: Automatic Speech Recognition), and sends the converted text data to the dialogue device 24 at the subsequent stage.
The dialogue device 24 is a device that analyzes the text data sent from the voice recognition device 23 to identify the user's utterance content (analysis result), and acquires dialogue data indicating response content that establishes a conversation with the identified utterance content. The dialogue device 24 extracts the text data corresponding to the response content from the acquired dialogue data and sends it to the speech synthesizer 25 at the subsequent stage.
The speech synthesizer 25 is a TTS (Text-to-Speech) device that converts the text data sent from the dialogue device 24 into PCM data, and sends the converted PCM data to the playback device 26 at the subsequent stage. The playback device 26 is a device that outputs the PCM data sent from the speech synthesizer 25 to the speaker 27 as sound waves. The sound waves output here are sounds that a person can recognize, and they constitute the response to the user's utterance content; as a result, a conversation is established between the user and the robot 101. The playback device 26 outputs the PCM data to the speaker 27 and simultaneously to the playback status acquisition device 28.
When PCM data is sent from the playback device 26, the playback status acquisition device 28 sends to the posture control device 3 a signal indicating that voice output from the speaker 27 has started, that is, that playback of voice to the user by the robot 101 has started (speech start).
The posture control device 3 is a device that controls the posture of the robot 101, and includes a drive control device 31, a housing state acquisition device 32, a posture recording device 33, and a behavior pattern recording device 34. The drive control device 31 includes a posture specifying unit 31a that specifies the posture of the robot 101 from the drive state of the drive system (drive units) 1, and a drive control unit 31b that performs drive control of the drive system 1.
The housing state acquisition device 32 is a device that acquires information indicating the drive state of the drive system 1. This information shows what state the drive system 1 is in and is used to specify the posture of the robot 101; for example, joint angle information obtained from rotary encoders attached to the robot's joints and torque on/off states correspond to such information. It is sent from the housing state acquisition device 32 to the posture specifying unit 31a of the drive control device 31.
The posture recording device 33 is a device that records the utterance intention presentation posture taken by the robot 101. Specifically, information indicating the drive state of the drive system 1 is recorded in the posture recording device 33 so that the robot 101 can assume the utterance intention presentation posture. The utterance intention presentation posture is, for example, a posture in which the robot puts a hand to its mouth, stands at attention, or faces the user's face; it is a posture by which the robot shows the user its intention to speak.
The behavior pattern recording device 34 is a device that records behavior patterns associated with the utterance content of the robot 101. Specifically, the behavior pattern recording device 34 records, as behavior patterns, information indicating the drive state of the drive system 1 associated with each utterance content. As behavior patterns, not only information from the posture recording device 33 but also information from the housing state acquisition device 32 (for example, various sensors such as fall detection or gravitational acceleration sensors) or the internal state of the robot 101 (for example, past behavior patterns from voice recognition results) may be added. Behavior patterns may also be categorized by the user's utterance content, or matched to the pitch at the time of utterance. The utterance intention presentation posture is not limited to one type; there may be several, depending on the situation of the robot 101, such as when the robot 101 is holding an object.
The posture specifying unit 31a acquires the drive-state information of the drive system 1 of the robot 101 and thereby identifies the robot's current posture. Information indicating the identified posture is sent from the posture specifying unit 31a to the drive control unit 31b.
The drive control unit 31b determines whether the posture of the robot 101 identified by the posture specifying unit 31a at the start of a dialogue with the user is the utterance intention presentation posture. Here, the dialogue with the user starts when the robot 101 begins playing back a voice response to the user. That is, the drive control unit 31b evaluates the posture identified by the posture specifying unit 31a at the moment it receives, from the playback status acquisition device 28, a signal indicating that playback of the robot's voice to the user has started.
If, as a result of this determination, the posture of the robot 101 is not the utterance intention presentation posture, the drive control unit 31b drives the drive system 1 to make the robot 101 assume it. In other words, the posture of the robot 101 is identified at the start of the dialogue with the user (posture identifying step), and it is determined whether the identified posture is the utterance intention presentation posture, which indicates that the robot 101 intends to speak. If it is not, the drive system 1 is driven so that the robot 101 assumes the utterance intention presentation posture (drive control step). Because the robot 101 thus returns to the utterance intention presentation posture whenever it is not already in it at the start of a dialogue, the user can easily understand that the robot 101 intends to speak.
Conversely, if the determination shows that the robot 101 is already in the utterance intention presentation posture, the drive control unit 31b has the robot perform a motion that tells the user an utterance is about to begin. For example, if the head faces forward in the utterance intention presentation posture, the robot briefly drops its head and then raises it back to the front before starting to speak. After performing this motion, the robot 101 speaks. This, too, lets the user easily understand that the robot 101 intends to speak.
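As a minimal sketch of this branch, assuming hypothetical drive-system and posture-store interfaces (the patent specifies no API, and the joint-angle comparison below is an illustrative convention, not the patented matching method), the decision at dialogue start might look like:

```python
from dataclasses import dataclass

# Illustrative tolerance (radians) for deciding that two postures match;
# the patent does not say how the comparison is done.
ANGLE_TOLERANCE = 0.05

@dataclass
class Posture:
    joints: dict  # joint name -> angle in radians

    def matches(self, other: "Posture") -> bool:
        return all(
            abs(angle - other.joints.get(name, float("inf"))) <= ANGLE_TOLERANCE
            for name, angle in self.joints.items()
        )

class DriveControlUnit:
    """Sketch of the decision made by drive control unit 31b; the
    drive_system and posture_store objects are assumed interfaces."""

    def __init__(self, drive_system, posture_store):
        self.drive_system = drive_system    # moves the joints (drive system 1)
        self.posture_store = posture_store  # posture recording device 33

    def on_dialogue_start(self, current: Posture) -> None:
        intent = self.posture_store.utterance_intention_posture()
        if not current.matches(intent):
            # Not yet in the utterance intention presentation posture:
            # drive the joints until the robot assumes it.
            self.drive_system.move_to(intent)
        else:
            # Already in the posture: nod (drop the head, then raise it)
            # so the user still gets a visible cue before speech begins.
            self.drive_system.move_to(self.posture_store.head_down_posture())
            self.drive_system.move_to(intent)
```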
 (Posture control processing)
 FIG. 2 is a sequence diagram showing the flow of the posture control processing of the robot 101 shown in FIG. 1. The sequence diagram covers (1) the processing up to the point where voice is played back by the robot 101, (2) the processing when a behavior of the robot 101 ends while voice playback continues, and (3) the processing when voice playback ends while a behavior of the robot 101 is still in progress.
Overview of process (1): In the voice system 2 of the robot 101, the user's utterance is captured by the microphone 21 and recorded by the input device 22. The recorded utterance is then recognized by the voice recognition device 23; the dialogue device 24 obtains a dialogue string from the recognition result; the voice synthesizer 25 synthesizes speech from that string; and the playback device 26 sounds the synthesized speech through the speaker 27. The steps from capturing the user's utterance to sounding the synthesized speech form one continuous sequence of actions.
In the present embodiment, the start of the dialogue with the user in the voice system 2 is defined as the moment the playback device 26 begins playing the response utterance corresponding to the utterance captured by the microphone 21.
At the start of the dialogue, the posture control device 3 uses the housing state acquisition device 32 to acquire the drive information of the housing (the drive system 1 of the robot 101). The drive control device 31 then activates the drive system 1 if necessary, changes the posture to the utterance intention presentation posture according to the information in the posture recording device 33, and selects one of the behavior patterns from the behavior pattern recording device 34 according to the utterance content.
When the drive system 1 starts driving according to the behavior pattern, the robot 101 begins its utterance. Specifically, process (1) corresponds to steps (1. voice data input) through (13. voice data sounding) of the sequence shown in FIG. 2. That is, the microphone 21 converts the voice uttered by the user into waveform data and outputs it as sound data to the input device 22 (1. voice data input). The input device 22 receives the sound data and outputs it to the voice recognition device 23 (2. voice recognition start command).
Upon receiving a voice recognition start command from a control unit (not shown), the voice recognition device 23 converts the input sound data into text data and outputs it to the dialogue device 24 (3. dialogue start command). Upon receiving a dialogue start command from the control unit, the dialogue device 24 analyzes the user's utterance content from the text data, retrieves from a database (not shown) the text data of a dialogue sentence corresponding to that content, and outputs the retrieved text data to the voice synthesizer 25 (4. dialogue wording synthesis command).
Upon receiving a dialogue wording synthesis command from the control unit, the voice synthesizer 25 converts the input text data into sound wave data for output (PCM data) and outputs it to the playback device 26 (5. voice data playback command). When the playback device 26 receives a voice data playback command from the control unit and plays back the output sound wave data, it outputs utterance start state change information to the playback status acquisition device 28 (6. utterance start state change). This information indicates whether an utterance by the robot 101 has started; in this case, it indicates that the utterance has started.
From the input utterance start state change information, the playback status acquisition device 28 notifies the drive control device 31 that the robot 101 has begun to speak (7. utterance start status notification). The notification is a signal indicating that playback of the robot's voice to the user has started.
At the moment it receives this signal from the playback status acquisition device 28, the drive control device 31 acquires the state of the robot 101 (the housing state) from the housing state acquisition device 32 (8. housing information acquisition). The drive control device 31 also acquires the utterance intention presentation posture recorded in the posture recording device 33 (9. utterance intention presentation posture acquisition). From the acquired housing state, the posture specifying unit 31a identifies the posture of the robot 101, and the drive control device 31 determines whether the identified posture matches the acquired utterance intention presentation posture. The drive control device 31 then drives the drive system 1 according to the result of this determination (10. transition to utterance intention presentation posture).
Specifically, if the identified posture of the robot 101 is not the utterance intention presentation posture, the drive control device 31 drives the drive system 1 so that the robot assumes it. If, on the other hand, the identified posture is already the utterance intention presentation posture and its head is facing forward, the robot briefly drops its head and then returns it to the front.
When the robot 101 starts an utterance, the drive control device 31 acquires a behavior pattern corresponding to the utterance content from the behavior pattern recording device 34 (11. behavior pattern acquisition) and starts driving the drive system 1 to realize the acquired pattern (12. behavior start command). Once the drive system 1 has started, the playback device 26, in response to the voice data playback command from the control unit, sounds the input output sound wave data through the speaker 27 (13. voice data sounding).
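One possible wiring of this sequence, loosely following the numbered steps (every component name below is a hypothetical stand-in for devices 21 through 34; none is a real library API):

```python
def run_dialogue_turn(mic, input_dev, recognizer, dialogue, synthesizer,
                      player, posture_ctrl):
    """Linear sketch of process (1); each argument is a hypothetical
    stand-in for one of the patent's devices."""
    sound = mic.capture()                        # 1. voice data input
    input_dev.record(sound)                      # 2. voice recognition start
    text = recognizer.to_text(sound)             # 3. sound data -> text
    reply = dialogue.respond(text)               # 4. dialogue sentence lookup
    pcm = synthesizer.to_pcm(reply)              # 5. text -> PCM data
    player.notify_utterance_start()              # 6-7. playback status signal
    posture_ctrl.on_dialogue_start(              # 8-10. posture check and
        posture_ctrl.current_posture())          #       transition if needed
    posture_ctrl.start_behavior(reply)           # 11-12. behavior pattern
    player.play(pcm)                             # 13. sound from speaker 27
```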
Overview of process (2): If the information acquired from the playback status acquisition device 28 indicates that the utterance is continuing, that is, playback has not ended, the drive control device 31 activates another behavior pattern from the behavior pattern recording device 34. The next behavior pattern may be selected at the moment the previous behavior ends or may have been selected in advance.
Specifically, process (2) corresponds to steps (14. behavior end) through (18. behavior start command) of the sequence shown in FIG. 2. The end of a behavior pattern executed by the drive system 1 while the robot 101 is speaking (during voice playback) is determined from the drive state (housing state) acquired by the housing state acquisition device 32 (14. behavior end). The housing state acquisition device 32 outputs information indicating that the behavior has ended to the drive control device 31 as a behavior notification (15. behavior notification command).
When notified through the housing state that the behavior has ended, the drive control device 31 acquires the playback status from the playback status acquisition device 28 (16. playback status acquisition). If the acquired status shows that playback is still in progress, the drive control device 31 again acquires a behavior pattern corresponding to the utterance content from the behavior pattern recording device 34 (17. behavior pattern acquisition) and starts driving the drive system 1 to realize it (18. behavior start command).
Overview of process (3): If the information acquired from the playback status acquisition device 28 indicates that the utterance has ended, the drive control device 31 judges the utterance finished and puts the drive system 1 into an idle or inactive state. If, when the playback status acquisition device 28 reports the end of the utterance, the drive control device 31 finds from the housing state acquisition device 32 that a behavior is still in progress, it issues a stop command to the drive system 1 and drives it back to the utterance intention presentation posture, which serves as the initial posture. If the remaining motion falls within a predetermined time (for example, 400 ms), it is treated as tolerable and no stop command is issued.
Specifically, process (3) corresponds to steps (19. playback end) through (22. playback end command) of the sequence shown in FIG. 2. That is, when the playback device 26 finishes playback (19. playback end), it outputs playback end state change information to the playback status acquisition device 28 (20. playback end state change).
From the input playback end state change information, the playback status acquisition device 28 notifies the drive control device 31 that the robot 101 has finished speaking (21. playback end notification). The notification is a signal indicating that playback of the robot's voice to the user has ended.
Based on the playback end notification acquired from the playback status acquisition device 28, the drive control device 31 issues a playback end command (stop command) to the drive system 1 (22. playback end command), stopping the operation of the drive system 1.
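A sketch of how processes (2) and (3) could be driven by a single polling loop, assuming hypothetical playback, drive, and pattern-store interfaces; reading the 400 ms rule as a bound on the remaining motion time is also an assumption:

```python
import time

STOP_TOLERANCE_S = 0.4  # the patent's example value: 400 ms

def behavior_loop(playback, drive, patterns):
    """Chain behavior patterns while playback continues (process (2)),
    then wind down when playback ends (process (3))."""
    while playback.is_playing():                        # 16. playback status
        if drive.behavior_finished():                   # 14-15. behavior end
            drive.start(patterns.next_for_utterance())  # 17-18. next pattern
        time.sleep(0.01)                                # poll interval (assumed)

    # Playback has ended (19-21). Abort any motion that would overrun
    # the tolerance and return to the utterance intention posture (22);
    # a motion about to finish anyway is allowed to complete.
    if not drive.behavior_finished():
        if drive.estimated_remaining_time() > STOP_TOLERANCE_S:
            drive.stop()
            drive.move_to(patterns.utterance_intention_posture())
```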
 (Effects)
 As described above, at the start of a dialogue with the user the robot 101 assumes the utterance intention presentation posture, which tells the user that the robot intends to speak. In other words, the robot 101 can show the user, at the start of the dialogue, whether it intends to speak. The dialogue between the user and the robot 101 can therefore proceed smoothly, realizing natural nonverbal communication between the user and the robot.
In this embodiment, the dialogue with the user is taken to start when the robot begins playing back voice to the user, but the invention is not limited to this; the dialogue may instead be taken to start when the robot finishes capturing the user's voice. In that case, in the sequence shown in FIG. 2, a dialogue start status notification is sent to the drive control device 31 when input by the input device 22 is completed, and at that moment the drive control device 31 acquires the housing state from the housing state acquisition device 32. The subsequent processing is the same as described above.
Because the end of input by the input device 22 is also the start of voice recognition, the dialogue with the user may equally be taken to start when the robot begins voice recognition. Furthermore, for a device in which a switch on the housing of the robot 101 turns the microphone 21 on while pressed and off when released, the release of the switch may be taken as the start of the dialogue. Alternatively, with a camera mounted on the robot's housing, the start of the dialogue may be taken as the moment a conversation is expected to begin, namely when the camera detects a person and then detects that the movement of the person's lips has stopped.
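These alternative trigger events all feed the same posture check. One possible way to organize them (the enum and the posture_ctrl interface below are illustrative assumptions, not part of the patent):

```python
from enum import Enum, auto

class DialogueStartTrigger(Enum):
    """The alternative dialogue-start events described above."""
    PLAYBACK_START = auto()     # robot begins playing its response voice
    INPUT_END = auto()          # robot finishes capturing the user's voice
    RECOGNITION_START = auto()  # voice recognition begins (same moment)
    SWITCH_RELEASE = auto()     # push-to-talk switch is released
    LIPS_STOPPED = auto()       # camera sees the user's lips stop moving

def on_trigger(trigger: DialogueStartTrigger, posture_ctrl) -> None:
    # Whichever event is configured as the dialogue start, the reaction
    # is the same: check the posture and assume the utterance intention
    # presentation posture if necessary.
    posture_ctrl.on_dialogue_start(posture_ctrl.current_posture())
```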
The robot 101 begins an utterance only when the dialogue device 24 can form response content; when the user's intention is insufficiently clear, or when the user's utterance carries no meaning (a sneeze, for example), no utterance content can be formed. In such a case, instead of speaking, the robot 101 may perform an utterance intention release behavior, moving out of the utterance intention presentation posture to show that the intention to speak has been withdrawn. Any behavior that results in a posture different from the utterance intention presentation posture will do, though a behavior that makes it easy for the user to recognize that the robot 101 no longer intends to speak is preferable.
Although the posture control device 3 described above has the posture recording device 33 and the behavior pattern recording device 34 as separate devices, the two may be combined into a single recording device.
 [Embodiment 2]
 Another embodiment of the present invention is described below. For convenience, members having the same functions as those described in Embodiment 1 are given the same reference numerals, and their descriptions are omitted.
 (Robot overview)
 FIG. 3 is a schematic configuration diagram of the robot 201 according to this embodiment. The robot 201 differs from the robot 101 of Embodiment 1 in that the voice recognition device 23 is placed in a server (not shown) on a network and a communication device 29 is added for communicating with it. That is, in the robot 201, after voice captured through the microphone 21 is received by the input device 22, the communication device 29 sends the voice data to the server on the network, where the voice recognition device 23 performs recognition; the recognition result is then returned through the communication device 29 to the dialogue device 24. In these respects, the robot 201 differs from the robot 101 of Embodiment 1. The communication device 29 may be of any type as long as it can communicate with the voice recognition device 23 on an external network such as the Internet.
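A minimal sketch of the round trip through the communication device 29, assuming a hypothetical HTTP endpoint and JSON response shape (the patent specifies neither a protocol nor a message format):

```python
import json
import urllib.request

# Illustrative endpoint: the patent only says the recognizer sits on a
# network server reached through communication device 29. The URL and
# the JSON response shape are assumptions.
RECOGNIZER_URL = "http://recognizer.example/api/recognize"

def recognize_remotely(pcm_bytes: bytes) -> str:
    """Send captured audio to the server-side recognizer, return text."""
    request = urllib.request.Request(
        RECOGNIZER_URL,
        data=pcm_bytes,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)  # e.g. {"text": "..."} (assumed)
    return result["text"]
```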
 (Posture control processing)
 FIG. 4 is a sequence diagram showing the flow of the posture control processing of the robot 201 shown in FIG. 3. The sequence diagram covers (11) the processing up to the point where voice is played back by the robot 201, (12) the processing when a behavior of the robot 201 ends while voice playback continues, and (13) the processing when voice playback ends while a behavior of the robot 201 is still in progress.
Overview of process (11): This process is substantially the same as process (1) described in Embodiment 1, but the start of the dialogue with the user differs. Here, the user's utterance is acquired as sound data from the microphone 21 and received by the input device 22, and the dialogue is taken to start at the moment that input ends.
Specifically, process (11) corresponds to steps (1. voice data input) through (15. voice data sounding) of the sequence shown in FIG. 4. The microphone 21 converts the voice uttered by the user into waveform data and outputs it as sound data to the input device 22 (1. voice data input). The input device 22 receives the sound data and, when input is complete, sends a dialogue start status notification to the drive control device 31 (2. dialogue start status notification), informing it that input of the user's voice has finished.
At the moment it receives the dialogue start status notification, the drive control device 31 acquires the state of the robot 201 (the housing state) from the housing state acquisition device 32 (3. housing information acquisition). The drive control device 31 also acquires the utterance intention presentation posture recorded in the posture recording device 33 (4. utterance intention presentation posture acquisition). From the acquired housing state, the posture specifying unit 31a identifies the posture of the robot 201, and the drive control device 31 determines whether the identified posture matches the acquired utterance intention presentation posture. The drive control device 31 then drives the drive system 1 according to the result of this determination (5. transition to utterance intention presentation posture).
If the identified posture of the robot 201 is not the utterance intention presentation posture, the drive control device 31 drives the drive system 1 so that the robot assumes it. If, on the other hand, the identified posture is already the utterance intention presentation posture and its head is facing forward, the robot briefly drops its head and then returns it to the front.
Thereafter, the input device 22 receives a voice recognition start command (1) from a control unit (not shown) and transmits the input voice data, via the communication device 29, to the voice recognition device 23 provided in the server on the network (6. voice recognition start command (1)). Upon receiving a voice recognition start command (2) from the control unit in the server, the voice recognition device 23 converts the input sound data into text data (7. voice recognition start command (2)) and outputs it to the dialogue device 24 (8. dialogue start command).
Upon receiving a dialogue start command from the control unit, the dialogue device 24 analyzes the user's utterance content from the input text data, retrieves from a database (not shown) the text data of a dialogue sentence corresponding to that content, and outputs the retrieved text data to the voice synthesizer 25 (9. dialogue wording synthesis command).
Upon receiving a dialogue wording synthesis command from the control unit, the voice synthesizer 25 converts the input text data into sound wave data for output (PCM data) and outputs it to the playback device 26 (10. voice data playback command). When the playback device 26 receives a voice data playback command from the control unit and plays back the output sound wave data, it outputs utterance start state change information to the playback status acquisition device 28 (11. utterance start state change). This information indicates whether an utterance by the robot 201 has started; in this case, it indicates that the utterance has started.
From the input utterance start state change information, the playback status acquisition device 28 notifies the drive control device 31 that the robot 201 has begun to speak (12. utterance start status notification). The notification is a signal indicating that playback of the robot's voice to the user has started. When the robot 201 starts an utterance, the drive control device 31 acquires a behavior pattern corresponding to the utterance content from the behavior pattern recording device 34 (13. behavior pattern acquisition) and starts driving the drive system 1 to realize the acquired pattern (14. behavior start command). Once the drive system 1 has started, the playback device 26, in response to the voice data playback command from the control unit, sounds the input output sound wave data through the speaker 27 (15. voice data sounding).
Process (11) is as described above; process (12) is the same as process (2) of Embodiment 1, and process (13) is the same as process (3) of Embodiment 1, so their descriptions are omitted.
 (Effects)
 As described above, at the start of a dialogue with the user the robot 201 assumes the utterance intention presentation posture, telling the user that it intends to speak. In other words, the robot 201 can show the user, at the start of the dialogue, whether it intends to speak, allowing the dialogue between the user and the robot 201 to proceed smoothly. Moreover, because the voice recognition device 23 resides in a server on the network, the robot 201 does not have to perform voice recognition itself, which reduces its processing load.
In this embodiment, the dialogue with the user is taken to start when the robot finishes capturing the user's voice, but the invention is not limited to this; the dialogue may instead be taken to start when the robot begins playing back voice to the user.
As in Embodiment 1, because the end of input by the input device 22 is also the start of voice recognition, the dialogue with the user may be taken to start when the robot begins voice recognition. Furthermore, for a device in which a switch on the housing of the robot 201 turns the microphone 21 on while pressed and off when released, the release of the switch may be taken as the start of the dialogue. Alternatively, with a camera mounted on the robot's housing, the start of the dialogue may be taken as the moment a conversation is expected to begin, namely when the camera detects a person and then detects that the movement of the person's lips has stopped.
Likewise, the robot 201 begins an utterance only when the dialogue device 24 can form response content; when the user's intention is insufficiently clear, or when the user's utterance carries no meaning (a sneeze, for example), no utterance content can be formed. In such a case, instead of speaking, the robot 201 may perform an utterance intention release behavior, moving out of the utterance intention presentation posture to show that the intention to speak has been withdrawn.
Although the posture control device 3 described above has the posture recording device 33 and the behavior pattern recording device 34 as separate devices, the two may be combined into a single recording device.
In Embodiments 1 and 2, the posture control of the robots 101 and 201 is based on output signals from the voice system 2 (the signal from the input device 22 indicating the end of voice input, and the signal from the playback status acquisition device 28 indicating the start of voice playback). In contrast, Embodiment 3 below describes an example based on an image of the user's face captured by a camera.
 [Embodiment 3]
 Still another embodiment of the present invention is described below. For convenience, members having the same functions as those described in Embodiment 1 are given the same reference numerals, and their descriptions are omitted.
 (Robot overview)
 FIG. 5 is a schematic configuration diagram of the robot 301 according to this embodiment. The robot 301 has substantially the same configuration as the robot 101 of Embodiment 1, differing in that an image system 4 is added. The image system 4 includes a camera 41 that captures the user's face, an image acquisition device (image acquisition unit) 42 that acquires the face image captured by the camera 41, and an image determination device (image determination unit) 43 that determines whether the acquired face image indicates that the user has finished speaking.
The camera 41 is a digital camera that captures the user with whom the robot 301 converses; any type or format of camera may be used as long as it can be mounted inside the robot 301. The image acquisition device 42 extracts the user's face image from the image captured by the camera 41 and sends it to the image determination device 43.
The image determination device 43 performs face recognition on the face image sent from the image acquisition device 42 and determines from the result whether the image indicates that the user has finished speaking, specifically, whether it is a face image in which the user's mouth is closed. The result of this determination is sent to the posture control device 3, which then performs posture control of the robot 301 at the moment the user's mouth closes. That is, in this embodiment, the timing of the posture determination by the posture control device 3, namely the start of the dialogue with the user, is the moment the image determination device 43 determines that the image indicates the end of the user's utterance.
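A minimal sketch of such a mouth-closed test, assuming lip landmarks have already been extracted by some face-recognition step (the patent does not describe the extraction itself, and the threshold and debouncing policy below are illustrative assumptions):

```python
def mouth_closed(upper_lip, lower_lip, face_height, threshold=0.03):
    """True when the vertical lip gap is small relative to face height.
    upper_lip / lower_lip are (x, y) pixel coordinates; the threshold
    is an illustrative tuning constant."""
    gap = abs(lower_lip[1] - upper_lip[1])
    return (gap / face_height) <= threshold

def utterance_ended(recent_lip_pairs, face_height):
    """Treat the utterance as ended only when the mouth stays closed
    across a run of consecutive frames, debouncing one-frame noise."""
    return bool(recent_lip_pairs) and all(
        mouth_closed(upper, lower, face_height)
        for upper, lower in recent_lip_pairs
    )
```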
 (Effects)
 As described above, at the start of a dialogue with the user, that is, when the image determination device 43 determines that the image indicates the end of the user's utterance, the robot 301 assumes the utterance intention presentation posture, telling the user that it intends to speak. In other words, the robot 301 can show the user, at the start of the dialogue, whether it intends to speak, allowing the dialogue between the user and the robot 301 to proceed smoothly.
 (Modification)
 FIG. 6 is a schematic configuration block diagram of a robot 401, a modification of the robot 301 shown in FIG. 5. The robot 401 has substantially the same configuration as the robot 201 of Embodiment 2, differing in that an image system 4 is added. The posture control using the image system 4 is the same as in the robot 301 shown in FIG. 5, so its description is omitted.
 (Effects)
 The robot 401 offers substantially the same effects as the robot 301. In addition, because the voice recognition device 23 resides in a server on the network, the robot 401 does not have to perform voice recognition itself, which reduces its processing load.
In both the robot 301 of this embodiment and the robot 401 of the modification, the dialogue with the user is taken to start when an image indicating the end of the user's utterance is detected, but the invention is not limited to this. For example, as with the robot 101 of Embodiment 1 and the robot 201 of Embodiment 2, the start may instead be the reception of an output signal from the voice system 2 (the signal from the input device 22 indicating the end of voice input, or the signal from the playback status acquisition device 28 indicating the start of voice playback).
 [Implementation in software]
 The control blocks of the drive control device 31 may be realized by logic circuits (hardware) formed on an integrated circuit (IC chip) or the like, or by software using a CPU (Central Processing Unit).
In the latter case, the drive control device 31 includes a CPU that executes the instructions of a program, that is, the software realizing each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or the CPU); and a RAM (Random Access Memory) into which the program is loaded. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. As the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The program may also be supplied to the computer via any transmission medium capable of carrying it, such as a communication network or a broadcast wave. The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.
 [Summary]
 A posture control device according to aspect 1 of the present invention is a posture control device (31) provided in a robot (101, 201, 301, 401) that can converse with a user and can take various postures by driving a plurality of drive units (drive system 1), the device controlling the posture of the robot and including: a posture specifying unit (31a) that identifies the posture of the robot from the drive state of each drive unit; and a drive control unit (31b) that controls the driving of each drive unit, wherein, when the posture of the robot identified by the posture specifying unit at the start of a dialogue with the user is not an utterance intention presentation posture indicating that the robot intends to speak, the drive control unit drives the drive units to make the robot assume the utterance intention presentation posture.
With the above configuration, the robot (101, 201, 301, 401) can always be placed in the utterance intention presentation posture at the start of a dialogue with the user, so the user can easily recognize, visually from the robot's posture, that the robot intends to speak.
This makes it possible to show the user clearly, at the start of the dialogue, whether the robot itself intends to speak, allowing the dialogue between the user and the robot to proceed smoothly and, as a result, realizing natural nonverbal communication between them.
In a posture control device according to aspect 2 of the present invention, in aspect 1 above, when the robot (101, 201, 301, 401) converses with the user by receiving the user's voice and playing back a voice to the user according to the received voice, the start of the dialogue with the user may be the moment the robot starts playing back the voice to the user.
With this configuration, because the dialogue starts at the moment the robot (101, 201, 301, 401) begins playing back voice to the user, the robot can present the utterance intention presentation posture at the very moment it is about to speak. In addition to the posture itself, the voice therefore lets the user clearly recognize that the robot intends to speak.
In a posture control device according to aspect 3 of the present invention, in aspect 1 above, when the robot (101, 201, 301, 401) converses with the user by receiving the user's voice and playing back a voice to the user according to the received voice, the start of the dialogue with the user may be the moment the robot finishes receiving the user's voice.
With this configuration, because the dialogue starts at the moment the robot (101, 201, 301, 401) finishes receiving the user's voice, the robot can present the utterance intention presentation posture as soon as the user stops speaking, promptly informing the user that it intends to speak.
A posture control device according to aspect 4 of the present invention may further include, in aspect 1 above, an image acquisition unit (image acquisition device 42) that acquires a face image of the user, and an image determination unit (image determination device 43) that determines whether the acquired face image indicates that the user has finished speaking, the start of the dialogue with the user being the moment the image determination unit determines that the image indicates the end of the user's utterance.
With this configuration, because the dialogue starts at the moment the image determination unit (image determination device 43) determines that the image indicates the end of the user's utterance, the robot can present the utterance intention presentation posture at the moment the user stops speaking, promptly informing the user that it intends to speak.
A robot according to aspect 5 of the present invention includes the posture control device (31) according to any one of aspects 1 to 4 above. With this configuration, the robot can clearly inform the user that it intends to speak.
A posture control method according to aspect 6 of the present invention is a method of controlling the posture of a robot (101, 201, 301, 401) that can converse with a user and can take various postures by driving a plurality of drive units (drive system 1), the method including: a posture identifying step of identifying the posture of the robot at the start of a dialogue with the user; and a drive control step of, when the posture identified in the posture identifying step is not an utterance intention presentation posture indicating that the robot intends to speak, driving the drive units to make the robot assume the utterance intention presentation posture. This configuration offers the same effects as aspect 1.
The posture control device according to each aspect of the present invention may be realized by a computer. In that case, a posture control program that realizes the posture control device on a computer by operating the computer as each unit (software element) of the device, and a computer-readable recording medium on which the program is recorded, also fall within the scope of the present invention.
The present invention is not limited to the embodiments described above; various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining the technical means disclosed in the different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in each embodiment.
1 drive system (drive unit), 2 voice system, 3 posture control device, 4 image system, 21 microphone, 22 input device, 23 voice recognition device, 24 dialogue device, 25 voice synthesizer, 26 playback device, 27 speaker, 28 playback status acquisition device, 29 communication device, 31 drive control device, 31a posture specifying unit, 31b drive control unit, 32 housing state acquisition device, 33 posture recording device, 34 behavior pattern recording device, 41 camera, 42 image acquisition device (image acquisition unit), 43 image determination device (image determination unit), 101, 201, 301, 401 robot

Claims (6)

  1.  A posture control device provided in a robot that can converse with a user and can take various postures by driving a plurality of drive units, the device controlling the posture of the robot and comprising:
     a posture specifying unit that identifies the posture of the robot from the drive state of each of the drive units; and
     a drive control unit that controls the driving of each of the drive units,
     wherein, when the posture of the robot identified by the posture specifying unit at the start of a dialogue with the user is not an utterance intention presentation posture indicating that the robot intends to speak, the drive control unit drives the drive units to make the robot assume the utterance intention presentation posture.
  2.  The posture control device according to claim 1, wherein, when the robot converses with the user by receiving the user's voice and playing back a voice to the user according to the received voice, the start of the dialogue with the user is the moment the robot starts playing back the voice to the user.
  3.  The posture control device according to claim 1, wherein, when the robot converses with the user by receiving the user's voice and playing back a voice to the user according to the received voice, the start of the dialogue with the user is the moment the robot finishes receiving the user's voice.
  4.  The posture control device according to claim 1, further comprising:
     an image acquisition unit that acquires a face image of the user; and
     an image determination unit that determines whether the face image acquired by the image acquisition unit is an image indicating that the user has finished speaking,
     wherein the start of the dialogue with the user is the moment the image determination unit determines that the image indicates the end of the user's utterance.
  5.  A robot that can converse with a user and can take various postures by driving a plurality of drive units, the robot comprising the posture control device according to any one of claims 1 to 4.
  6.  A posture control method for controlling the posture of a robot that can converse with a user and can take various postures by driving a plurality of drive units, the method comprising:
     a posture identifying step of identifying the posture of the robot at the start of a dialogue with the user; and
     a drive control step of, when the posture identified in the posture identifying step is not an utterance intention presentation posture indicating that the robot intends to speak, driving the drive units to make the robot assume the utterance intention presentation posture.
Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004034274A (en) * 2002-07-08 2004-02-05 Mitsubishi Heavy Ind Ltd Conversation robot and its operation method
JP2005288573A (en) * 2004-03-31 2005-10-20 Honda Motor Co Ltd Mobile robot
JP2006181651A (en) * 2004-12-24 2006-07-13 Toshiba Corp Interactive robot, voice recognition method of interactive robot and voice recognition program of interactive robot
JP2015066621A (en) * 2013-09-27 2015-04-13 株式会社国際電気通信基礎技術研究所 Robot control system, robot, output control program and output control method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007155986A (en) * 2005-12-02 2007-06-21 Mitsubishi Heavy Ind Ltd Voice recognition device and robot equipped with the same
JP4976903B2 (en) * 2007-04-05 2012-07-18 本田技研工業株式会社 robot
JP2009222969A (en) * 2008-03-17 2009-10-01 Toyota Motor Corp Speech recognition robot and control method for speech recognition robot
JP5982840B2 (en) * 2012-01-31 2016-08-31 富士通株式会社 Dialogue device, dialogue program, and dialogue method
JP2013237124A (en) * 2012-05-15 2013-11-28 Fujitsu Ltd Terminal device, method for providing information, and program
US9044863B2 (en) * 2013-02-06 2015-06-02 Steelcase Inc. Polarized enhanced confidentiality in mobile camera applications
CN103753578A (en) * 2014-01-24 2014-04-30 成都万先自动化科技有限责任公司 Wearing service robot
CN104951077A (en) * 2015-06-24 2015-09-30 百度在线网络技术(北京)有限公司 Man-machine interaction method and device based on artificial intelligence and terminal equipment

Also Published As

Publication number Publication date
JPWO2017145929A1 (en) 2018-10-25
CN108698231A (en) 2018-10-23

Similar Documents

Publication Publication Date Title
JP4086280B2 (en) Voice input system, voice input method, and voice input program
JP5750380B2 (en) Speech translation apparatus, speech translation method, and speech translation program
JP5533854B2 (en) Speech recognition processing system and speech recognition processing method
US9792901B1 (en) Multiple-source speech dialog input
JP4622384B2 (en) ROBOT, ROBOT CONTROL DEVICE, ROBOT CONTROL METHOD, AND ROBOT CONTROL PROGRAM
JP2017021125A5 (en) Voice dialogue apparatus and voice dialogue method
WO2018135276A1 (en) Speech and behavior control device, robot, control program, and control method for speech and behavior control device
WO2020079918A1 (en) Information processing device and information processing method
JP5137031B2 (en) Dialogue speech creation device, utterance speech recording device, and computer program
JP6448950B2 (en) Spoken dialogue apparatus and electronic device
WO2017145929A1 (en) Pose control device, robot, and pose control method
JP6798258B2 (en) Generation program, generation device, control program, control method, robot device and call system
JP2009104047A (en) Information processing method and information processing apparatus
US8666549B2 (en) Automatic machine and method for controlling the same
JP5495612B2 (en) Camera control apparatus and method
JP6908636B2 (en) Robots and robot voice processing methods
JP2005308950A (en) Speech processors and speech processing system
WO2019187543A1 (en) Information processing device and information processing method
JP2016186646A (en) Voice translation apparatus, voice translation method and voice translation program
JP4143487B2 (en) Time-series information control system and method, and time-series information control program
JP2008051950A (en) Information processing apparatus
JP2015187738A (en) Speech translation device, speech translation method, and speech translation program
JP4735965B2 (en) Remote communication system
WO2024135221A1 (en) Information processing device and game video generation method
JP7007616B2 (en) Training data generator, training data generation method and program

Legal Events

Date Code Title Description
ENP Entry into the national phase
    Ref document number: 2018501632
    Country of ref document: JP
    Kind code of ref document: A
NENP Non-entry into the national phase
    Ref country code: DE
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 17756368
    Country of ref document: EP
    Kind code of ref document: A1
122 Ep: pct application non-entry in european phase
    Ref document number: 17756368
    Country of ref document: EP
    Kind code of ref document: A1