JP2024034347A

JP2024034347A - Sound generation notification device and sound generation notification method

Info

Publication number: JP2024034347A
Application number: JP2022138535A
Authority: JP
Inventors: 晶子坂口; Shoko Sakaguchi
Original assignee: JVCKenwood Corp
Current assignee: JVCKenwood Corp
Priority date: 2022-08-31
Filing date: 2022-08-31
Publication date: 2024-03-13

Abstract

To provide a sound generation notification device and a sound generation notification method capable of appropriately notifying a user of the occurrence of sound. SOLUTION: A sound generation notification device 10 according to this embodiment includes: a sound acquisition unit 21 that acquires collected sound; a sound direction identification unit 25 that identifies the direction of arrival of the sound; a sound content recognition unit 23 that recognizes the content of the sound; a display unit 12 that displays images; and a display control unit 26 that causes the display unit 12 to display an image of a virtual object generated based on the direction of arrival and the content of the sound. SELECTED DRAWING: Figure 1

Description

The present invention relates to a sound generation notification device and a sound generation notification method that notify a user of the occurrence of sound visually.

In general, people who are hard of hearing or who have a hearing impairment (hereinafter referred to as hearing-impaired people) may find it difficult, or impossible, to hear ambient sounds such as everyday sounds and notification sounds. A known technique therefore picks up such ambient sounds with a microphone capable of estimating their direction of arrival, localizes the sound source, and notifies the user by displaying text or blinking a light on a mobile terminal, home appliance, or the like (see, for example, Patent Document 1). This type of technique makes it easy to inform a hearing-impaired person that a sound has occurred nearby.

Patent Document 1: JP2019-66529A

However, the conventional technique can notify the user of sound only from sound sources registered in advance, and it has difficulty recognizing details such as the direction of arrival of a sound, the pitch of a voice, and the tone of voice (emotion). There was therefore room for improvement in notifying the user of the occurrence of sound appropriately.

The present invention has been made in view of the above, and its object is to provide a sound generation notification device and a sound generation notification method capable of appropriately notifying a user of the occurrence of sound.

To solve the above problem and achieve the object, a sound generation notification device according to the present invention includes: a sound acquisition unit that acquires collected sound; a sound direction identification unit that identifies the direction of arrival of the sound; a sound content recognition unit that recognizes the content of the sound; a display unit that displays images; and a display control unit that causes the display unit to display an image of a virtual object generated based on the direction of arrival and the content of the sound.

A sound generation notification method according to the present invention includes: a sound acquisition step of acquiring collected sound; a sound direction identification step of identifying the direction of arrival of the sound; a sound content recognition step of recognizing the content of the sound; and a display control step of causing a display unit to display an image of a virtual object generated based on the direction of arrival and the content of the sound.

The present invention has the effect that the occurrence of sound can be notified appropriately.

FIG. 1 is a block diagram showing a configuration example of the sound generation notification device according to this embodiment.
FIG. 2 is a flowchart showing the operation procedure in the three-dimensional figure projection mode.
FIG. 3 is a diagram showing an example of virtual objects displayed in a group meeting in which the three-dimensional figure projection mode is executed.
FIG. 4 is a flowchart showing the operation procedure in the danger/caution notification mode.
FIG. 5 is a diagram showing an example of a virtual object displayed in a city area in which the danger/caution notification mode is executed.

FIG. 1 is a block diagram showing a configuration example of the sound generation notification device according to this embodiment. The sound generation notification device 10 is a glasses-type device worn on the user's head, such as AR (Augmented Reality) glasses, smart glasses, or a head-mounted display. The sound generation notification device 10 visually notifies, for example, a hearing-impaired user that a sound has occurred around the user. As shown in FIG. 1, the sound generation notification device 10 includes a sound collection unit 11, a display unit 12, a communication unit 13, and a notification control device 20.

The sound collection unit 11 is a microphone array having a plurality of microphones arranged at different positions. The sound collection unit 11 collects the sound arriving at the sound generation notification device 10 and generates multi-channel acoustic signals from it, then outputs these signals to the sound acquisition unit 21 of the notification control device 20. Alternatively, a microphone capable of estimating the direction of arrival of sound, such as a binaural microphone or a stereophonic microphone, may be used as the sound collection unit 11.

The display unit 12 is placed in front of the eyes of the user wearing the sound generation notification device 10 and displays images; in this embodiment it is, for example, a glass monitor. The glass monitor displays images while also letting the user see the actual field of view through it. The display control unit 26 of the notification control device 20 causes the display unit 12 to display images superimposed on the actual scenery in the user's field of view.

The communication unit 13 is a communication module. It may communicate by any method, such as over the Internet or a mobile phone network. Based on control signals output from the communication control unit 27 of the notification control device 20, the communication unit 13 carries out various communications with, for example, a server device (not shown).

The notification control device 20 includes a sound acquisition unit 21, a sound source separation unit 22, a sound content recognition unit 23, an emotion analysis unit 24, a sound direction identification unit 25, a display control unit 26, a communication control unit 27, a hearing level determination unit 28, and a storage unit 29.

The sound acquisition unit 21 acquires the multi-channel acoustic signals output by the sound collection unit 11 and outputs them to the sound source separation unit 22. The sound source separation unit 22 separates the acoustic signals contained in the channels acquired by the sound acquisition unit 21 into one acoustic signal per sound source. For example, it separates acoustic signals in which the voices of several people speaking at the same time are mixed into an acoustic signal for each person's voice, or separates acoustic signals in which the sound of a car driving by outdoors is mixed with a person's speech into an acoustic signal of the car and an acoustic signal of the human voice.

The sound content recognition unit 23 converts acoustic signals of human speech into text data by performing speech recognition on each source-separated acoustic signal. For sounds other than human speech (for example, a car driving or sounding its horn, a bicycle bell or brakes, various alarm sounds, rain, wind, or thunder), it determines which type of sound it is. An existing speech recognition method can be used.

The emotion analysis unit 24 analyzes the emotion of the speaker. For example, based on the loudness and tone of voice in the acoustic signal of the speech, it analyzes whether the speaker's emotion is positive (calm, happy, etc.) or negative (anger, sadness, fear, etc.). The emotion analysis unit 24 may also take the recognized speech content into account in addition to the acoustic signal. An existing method (for example, the OpenSMILE software) can be used to analyze emotion.

The sound direction identification unit 25 identifies the direction of the sound arriving at the sound generation notification device 10. For example, it identifies the direction from which the collected sound arrived based on the differences in arrival time between the acoustic signals of the collected channels. Other existing methods may also be used to identify the direction of arrival.
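The arrival-time-difference approach can be sketched for the simplest case of two microphones. This is a minimal illustration under stated assumptions, not the device's actual algorithm: it assumes a far-field source, a known microphone spacing, and a speed of sound of 343 m/s, finds the delay by brute-force cross-correlation, and converts it to an azimuth with sin θ = cτ/d. The function names and the sign convention (0° straight ahead, positive to the user's right) are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)


def estimate_delay(left, right, max_lag):
    """Return the lag in samples that maximizes the cross-correlation.

    A positive lag means `right` is a delayed copy of `left`, i.e. the
    sound reached the left microphone first.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(left, right, sample_rate, mic_distance):
    """Azimuth in degrees: 0 = straight ahead, positive = to the right."""
    # Physically possible delays are bounded by the spacing of the pair.
    max_lag = math.ceil(mic_distance / SPEED_OF_SOUND * sample_rate)
    lag = estimate_delay(left, right, max_lag)
    ratio = SPEED_OF_SOUND * (lag / sample_rate) / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # clamp against rounding and noise
    # Left microphone hit first -> source on the left -> negative azimuth.
    return -math.degrees(math.asin(ratio))
```

A single microphone pair cannot tell front from back; an array with more microphones, as in the sound collection unit 11, would be needed to resolve that ambiguity.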

The display control unit 26 controls the display so that a virtual object is superimposed on the actual scenery in the user's field of view seen through the display unit 12. The virtual object includes three-dimensional illustrations, figures, characters, icons, and the like, and indicates the direction of arrival, loudness, type, and so on of the sound. The display control unit 26 changes which types of sound are indicated by the virtual object displayed on the display unit 12 according to the user's hearing level. When the hearing level is high (the degree of hearing loss is low and the user can hear relatively well), the display control unit 26, for example, changes the color of the displayed virtual object according to the speaker or the pitch of the voice, changes its size according to the loudness of the voice, and changes its shape according to the tone of voice (emotion). When the hearing level is low (the degree of hearing loss is high and hearing is difficult), the display control unit 26 changes the size, shape, and so on of the displayed virtual object according to the type of sound (for example, a car, a bicycle, or an alarm). The display control unit 26 also displays the direction from which the sound arrived together with the virtual object.
For example, the display control unit 26 may make the virtual object appear on the display unit 12 from the direction of arrival, or it may divide the user's surroundings into 12 equal sectors, with the front of the field of view as the 12 o'clock direction, and display the direction of arrival, such as "○ o'clock direction", near the virtual object.
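The 12-sector clock notation can be expressed as a small function. The sketch assumes an azimuth in degrees with 0° straight ahead and angles increasing clockwise; each clock hour then covers 30°, centred on the hour. The function name and label format are illustrative, not taken from the publication.

```python
def clock_direction(azimuth_deg):
    """Map an azimuth (0 = straight ahead, clockwise positive) to a clock label."""
    # Shift by half a sector (15 degrees) so each hour is centred on its mark.
    sector = int(((azimuth_deg % 360) + 15) // 30) % 12
    hour = 12 if sector == 0 else sector
    return f"{hour} o'clock direction"
```

With this convention, a speaker 30° to the left is labelled "11 o'clock direction" and one 60° to the right "2 o'clock direction", matching the labels used in the examples of FIG. 3 and FIG. 5.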

The communication control unit 27 carries out various communications with a server device (not shown) via the communication unit 13. For example, at least some of the functions of the sound content recognition unit 23, the emotion analysis unit 24, and the sound direction identification unit 25 may be provided in the server device, and the results of the processing executed by the server device may be acquired.

The hearing level determination unit 28 determines the hearing level (also called the hearing loss level) of the user of the sound generation notification device 10. Hearing level generally differs from user to user; one example classification distinguishes "mild hearing loss", in which quiet voices are hard to hear; "moderate hearing loss", in which normal conversation is hard to hear; "severe hearing loss", in which normal conversation cannot be heard; and "profound hearing loss", in which speech cannot be heard even when spoken close to the ear. In this embodiment, the sound generation notification device 10 changes, according to the user's hearing level, which types of surrounding sound are indicated by the virtual object displayed on the display unit 12. That is, for users with "severe" or "profound" hearing loss, who cannot hear normal conversation, a virtual object indicating the occurrence of a highly dangerous sound or a sound calling for attention is displayed when such a sound occurs. For users with "mild" or "moderate" hearing loss, who have difficulty hearing quiet voices, a virtual object indicating the occurrence of voices (sounds) in conversation is displayed.
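One way to map a measured hearing threshold onto the four categories above is a simple cut-off table. The cut-offs in this sketch (25, 40, 70, and 90 dB HL) resemble a commonly used audiological classification but are assumptions here; the publication describes the categories only in words and does not say how a threshold value is obtained.

```python
from enum import Enum


class HearingLevel(Enum):
    NORMAL = "normal"
    MILD = "mild"
    MODERATE = "moderate"
    SEVERE = "severe"
    PROFOUND = "profound"


# Assumed upper bounds in dB HL for each category (not from the publication).
_CUTOFFS = [
    (25.0, HearingLevel.NORMAL),
    (40.0, HearingLevel.MILD),
    (70.0, HearingLevel.MODERATE),
    (90.0, HearingLevel.SEVERE),
]


def classify_hearing(threshold_db_hl):
    """Return the category whose upper bound the threshold falls below."""
    for upper, level in _CUTOFFS:
        if threshold_db_hl < upper:
            return level
    return HearingLevel.PROFOUND
```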

The hearing level determination unit 28 determines the user's hearing, for example, when the sound generation notification device 10 is used for the first time. For example, the hearing level determination unit 28 connects via the communication unit 13 to the television used by the user and acquires the user's volume setting (dB). It then determines the hearing (hearing loss) level according to how much higher the user's volume setting is than a typical television volume (dB) specified by the television manufacturer or the like. The hearing level determination unit 28 may also determine the user's hearing level by the other methods described below.

A: Televisions accumulate, in advance, data on the volume settings (dB) of general viewers while they are watching, on a server device on the network. The hearing level determination unit 28, for example, connects via the communication unit 13 to the television used by the user and acquires the user's volume setting (dB). It may then determine the hearing level based on how much the user's volume setting differs from the average volume setting of general viewers stored on the server device.

B: The sound generation notification device 10, with headphones or earphones attached, may be equipped with an audiometer function so that the user takes a hearing test at first use and the hearing level is determined from the result. Instead of the audiometer function, the sound generation notification device 10 may be equipped with a bone conduction receiver function, and the hearing level may be determined by a bone conduction test.

C: During initial setup, the sound generation notification device 10, with headphones or earphones attached, plays setup guidance speech, and the user configures the device by voice recognition, answering "yes", "no", and so on. In this case, the hearing level may be determined from the volume of the setup guidance speech. A user with profound hearing loss can complete the setup by having the setup guidance displayed as text on the display unit 12 of the sound generation notification device 10 after the guidance speech at its maximum sound level has finished.

D: The sound generation notification device 10 may be equipped with an auditory brainstem response (ABR) test function using brain waves, and the hearing level may be determined while the user is asleep or at rest. Alternatively, the user may set the hearing level manually.
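The television-volume method and its variant A both compare the user's volume setting with a reference value. The sketch below assumes the reference is either a manufacturer's typical value or the average of general viewers' settings collected on a server; the dB steps separating the categories are hypothetical placeholders, since the publication states no values, and the function names are illustrative.

```python
from statistics import mean

# Hypothetical steps: dB by which the user's TV volume exceeds the reference.
EXCESS_STEPS_DB = [
    (5.0, "normal"),
    (10.0, "mild"),
    (20.0, "moderate"),
    (30.0, "severe"),
]


def reference_from_viewers(viewer_volumes_db):
    """Variant A: average volume setting of general viewers stored on a server."""
    return mean(viewer_volumes_db)


def level_from_tv_volume(user_volume_db, reference_db):
    """Classify hearing by how far the user's setting exceeds the reference."""
    excess = user_volume_db - reference_db
    for upper, label in EXCESS_STEPS_DB:
        if excess < upper:
            return label
    return "profound"
```

Under these placeholder steps, a user whose setting is 12 dB above the reference would be classed as "moderate".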

The storage unit 29 stores various control data and control programs for the sound generation notification device 10. In this embodiment, the storage unit 29 stores control programs for executing the three-dimensional figure projection mode and the danger/caution notification mode. The three-dimensional figure projection mode mainly assists users who have difficulty localizing sound sources and is a control program intended, for example, for users with "mild" or "moderate" hearing loss. The danger/caution notification mode notifies the user of dangers or matters requiring attention that occur nearby and is a control program intended, for example, for users with "severe" or "profound" hearing loss. The storage unit 29 also stores the user's hearing level obtained by the hearing determination.
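Selecting between the two stored control programs reduces to a lookup on the stored hearing level. The mode identifiers in this sketch are hypothetical; returning None for levels outside the four categories, so that the user can pick a mode manually, is an assumption rather than behaviour described in the publication.

```python
# Hypothetical identifiers for the two control programs held in storage unit 29.
MODE_BY_LEVEL = {
    "mild": "three_dimensional_figure_projection",
    "moderate": "three_dimensional_figure_projection",
    "severe": "danger_caution_notification",
    "profound": "danger_caution_notification",
}


def select_mode(stored_level):
    """Return the mode to run automatically, or None if none is defined."""
    return MODE_BY_LEVEL.get(stored_level)
```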

Next, the operation of the sound generation notification device 10 will be described. The sound generation notification device 10 automatically executes the three-dimensional figure projection mode or the danger/caution notification mode according to the user's determined hearing level. FIG. 2 is a flowchart showing the operation procedure of the sound generation notification device in the three-dimensional figure projection mode. FIG. 3 is a diagram showing an example of virtual objects displayed in a group meeting in which the three-dimensional figure projection mode is executed. In the example of FIG. 3, the user is assumed to be participating in a group meeting (conference) with five members P1 to P5.

First, the sound generation notification device 10 acquires the ambient sound collected by the sound collection unit 11 (step S1). More specifically, the sound acquisition unit 21 acquires the multi-channel acoustic signals output by the sound collection unit 11 and outputs them to the sound source separation unit 22.

Next, the sound source separation unit 22 of the sound generation notification device 10 separates the acoustic signals contained in the channels into one acoustic signal per sound source (step S2). For example, the sound source separation unit 22 separates the acoustic signals of several people talking at the same time into an acoustic signal for each person. It can also separate the acoustic signals recorded in, for example, a city area with passing cars and bicycles into the signals of the individual sound sources (such as a car driving by or a bicycle bell).

Next, the sound content recognition unit 23 of the sound generation notification device 10 performs speech recognition on each source-separated acoustic signal (step S3). More specifically, the sound content recognition unit 23 converts, for example, the acoustic signal of human speech into text data. For sounds other than human speech (for example, a car driving or sounding its horn, a bicycle bell or brakes, various alarm sounds, rain, wind, or thunder), it can determine and recognize the type of sound.

Next, the sound generation notification device 10 determines whether the volume of the acquired sound exceeds a predetermined threshold (step S4). For example, the sound generation notification device 10 acquires speech uttered by one of the five members P1 to P5 and determines whether its volume exceeds the predetermined threshold. If the volume does not exceed the threshold (step S4; No), the process returns to step S1; if it does (step S4; Yes), the process proceeds to step S5.
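Step S4 needs a loudness measure for each acquired sound. A minimal sketch, assuming floating-point samples in [-1, 1] and using the RMS level in dBFS as the measure; the publication says only "a predetermined threshold", so the default of -40 dBFS and the function names are hypothetical.

```python
import math


def rms_dbfs(samples):
    """RMS level of samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def exceeds_threshold(samples, threshold_dbfs=-40.0):
    """Step S4: True (Yes) if the sound is louder than the threshold."""
    return rms_dbfs(samples) > threshold_dbfs
```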

Next, the emotion analysis unit 24 of the sound generation notification device 10 analyzes the loudness, emotion, pitch, and so on of the speech (step S5). More specifically, based on the loudness and tone of voice in the acoustic signal of the speech, the emotion analysis unit 24 analyzes whether the speaker's emotion is positive (calm, happy, etc.) or negative (anger, sadness, fear, etc.).

Next, the sound direction identification unit 25 of the sound generation notification device 10 identifies the direction of arrival of the sound (step S6). More specifically, the sound direction identification unit 25 calculates the differences in arrival time between the acoustic signals of the collected channels and identifies the direction of arrival of the collected sound from these time differences.

Next, the display control unit 26 of the sound generation notification device 10 generates a virtual object based on the content, emotion, and so on of the speech (step S7) and displays this virtual object superimposed on the display unit 12 (step S8). As shown in FIG. 3, suppose that in the group meeting member P2 is speaking in an ordinary way and member P4 is speaking cheerfully. The emotion of member P2 is then analyzed as calm and that of member P4 as happy. The display control unit 26 therefore displays a virtual object 100 representing calm above the head of member P2, and a virtual object 101 representing happiness above the head of member P4. At the same time, the display control unit 26 displays information 110 and 111 indicating the direction of arrival of the sound next to the virtual objects 100 and 101, respectively. In the example of FIG. 3, information 110 indicating the "11 o'clock direction" is displayed next to the virtual object 100 of member P2, and information 111 indicating the "1 o'clock direction" is displayed next to the virtual object 101 of member P4. Instead of displaying the direction information and the virtual object separately, the information 110 indicating the direction of arrival may be included in the virtual object 100 and displayed.
Similarly, the information 111 indicating the direction of arrival may be included in the virtual object 101 and displayed.
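Step S7 can be pictured as filling in a small record whose attributes follow the rules given for users with a relatively high hearing level: color by speaker, size by loudness, and shape by emotion. The shapes, colors, and loudness-to-size scaling below are hypothetical; the publication names the attributes but not their values.

```python
from dataclasses import dataclass


@dataclass
class VirtualObject:
    shape: str
    color: str
    size: float
    direction_label: str


# Hypothetical style tables (not from the publication).
SHAPE_BY_EMOTION = {"calm": "round", "happy": "star", "angry": "spiky", "sad": "drop"}
SPEAKER_COLORS = ["blue", "green", "orange", "purple", "red"]


def make_projection_object(speaker_index, emotion, loudness_dbfs, direction_label):
    """Build the overlay for one speaker in the projection mode."""
    color = SPEAKER_COLORS[speaker_index % len(SPEAKER_COLORS)]
    # Map roughly -60..0 dBFS onto a scale factor of 0.5..1.5, clamped.
    size = 0.5 + max(0.0, min(1.0, (loudness_dbfs + 60.0) / 60.0))
    shape = SHAPE_BY_EMOTION.get(emotion, "round")
    return VirtualObject(shape, color, size, direction_label)
```

Pitch could replace the speaker index as the color key, since the text allows either.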

With this configuration, the user can tell who is speaking and from where by relying on the displayed virtual objects 100 and 101, which makes sound source localization possible. Because the virtual objects 100 and 101 also show the speaker's emotion, the user can intuitively understand what kind of person is speaking, including the person's characteristics, personality, and emotions. Even when someone suddenly speaks to the user from outside the field of view, the user can intuitively grasp the speaker's characteristics, personality, and emotions and can therefore, for example, instantly sense whether there is any danger.

The sound generation notification device 10 repeats steps S1 to S8 and ends the processing when, for example, the user operates a stop switch.
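The repetition of steps S1 to S8 can be outlined as a loop over incoming frames. In this sketch every stage is passed in as a function (all names hypothetical), and the S4 check is applied to each separated source in turn, which is one reading of "return to step S1" when a frame contains several sources.

```python
def run_projection_mode(frames, separate, recognize, loudness_dbfs,
                        analyze_emotion, locate, render,
                        threshold_dbfs, stop_requested):
    """Sketch of the repeated S1-S8 loop of FIG. 2 with caller-supplied stages."""
    for frame in frames:                          # S1: multi-channel frame
        if stop_requested():                      # e.g. the stop switch
            break
        for source in separate(frame):            # S2: per-source signals
            content = recognize(source)           # S3: text or sound type
            if loudness_dbfs(source) <= threshold_dbfs:
                continue                          # S4: No -> keep listening
            emotion = analyze_emotion(source, content)  # S5
            direction = locate(source)            # S6: e.g. a clock label
            render(content, emotion, direction)   # S7-S8: build and overlay
```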

Next, the operation of the sound generation notification device 10 in the danger/caution notification mode will be described. FIG. 4 is a flowchart showing the operation procedure of the sound generation notification device 10 in the danger/caution notification mode. FIG. 5 is a diagram showing an example of a virtual object displayed in a city area in which the danger/caution notification mode is executed. In the example of FIG. 5, the user is assumed to be walking through a city area. In the flowchart of FIG. 4, steps that are the same as in the flowchart of FIG. 2 are given the same reference signs, and their description is omitted.

After the sound content recognition unit 23 performs speech recognition on each source-separated acoustic signal (step S3), the sound generation notification device 10 determines whether the recognized sound is highly dangerous or calls for attention (step S4a). Highly dangerous sounds and sounds calling for attention include, for example, car horns, bicycle bells and brakes, various alarm sounds, human screams, and the sounds of heavy rain, strong wind, and thunder. If the recognized sound neither is highly dangerous nor calls for attention (step S4a; No), the process returns to step S1; if it does (step S4a; Yes), the process proceeds to step S5a.

Next, the sound content recognition unit 23 of the sound generation notification device 10 analyzes the type of the sound (step S5a). More specifically, the sound generation notification device 10 determines the type of the recognized sound (car horn, bicycle bell or brakes, various alarm sounds, human screams, rain, wind, or thunder) and classifies it by type.
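Steps S4a and S5a together act as a filter followed by grouping. The sketch assumes the recognizer already yields (sound type, direction) pairs; the type labels are hypothetical stand-ins for the examples listed in the text.

```python
from collections import defaultdict

# Hypothetical labels for the danger/caution sounds listed in the text.
DANGER_OR_CAUTION_TYPES = frozenset({
    "car_horn", "bicycle_bell", "bicycle_brake", "alarm",
    "scream", "heavy_rain", "strong_wind", "thunder",
})


def filter_and_group(recognized_sounds):
    """S4a: drop sounds that are neither dangerous nor attention-calling.
    S5a: group the directions of the remaining sounds by sound type."""
    groups = defaultdict(list)
    for sound_type, direction in recognized_sounds:
        if sound_type in DANGER_OR_CAUTION_TYPES:
            groups[sound_type].append(direction)
    return dict(groups)
```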

Next, the sound generation notification device 10 uses the display control unit 26 to generate a virtual object based on the type of the sound (step S7) and displays this virtual object superimposed on the display unit 12 (step S8). As shown in FIG. 5, suppose that while the user is walking through an urban area, a bicycle approaches from between houses ahead, where it is difficult to see directly, and rings its bell. The sound is then classified as a warning sound, so the display control unit 26 displays a virtual object 102 representing a bicycle on the road, superimposed on the display unit 12, and displays information 112 indicating the direction of arrival of the sound, "2 o'clock direction", next to the virtual object 102. The user can thus understand that a bicycle is approaching from the 2 o'clock direction, that is, from the front right, and can avoid contact with the bicycle by giving way to it. In addition, the user can not only identify the type of sound source and localize it, but also sense danger immediately even when the sound itself cannot be heard. Although the example of FIG. 5 displays the virtual object 102 representing a bicycle, in a scene where a nearby person is shouting words such as "Danger!" or "Run!", it is preferable to display the recognized text "Danger!" or "Run!" directly as a virtual object on the display unit 12 together with the direction-of-arrival information. The danger/warning notification mode can be used not only by people with severe to profound hearing loss but also, for example, by people with normal hearing who are listening to music through earphones.
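Information such as "2 o'clock direction" can be derived from an estimated azimuth by dividing the circle into twelve 30-degree sectors. The sketch below assumes the azimuth is measured clockwise from the user's facing direction (so +60 degrees, front right, maps to 2 o'clock); the text does not state the angle convention, and the function names, including the rule that shouted words replace the type label, are hypothetical.

```python
from typing import Optional


def azimuth_to_clock(azimuth_deg: float) -> int:
    """Map a direction of arrival to a clock position (12 = straight ahead).

    Assumes the azimuth is measured clockwise from the user's facing
    direction; each clock hour spans 30 degrees.
    """
    hour = round((azimuth_deg % 360) / 30) % 12
    return 12 if hour == 0 else hour


def direction_label(azimuth_deg: float) -> str:
    """Text shown beside the virtual object, e.g. "2 o'clock direction"."""
    return f"{azimuth_to_clock(azimuth_deg)} o'clock direction"


def virtual_object_text(sound_type: str, transcript: Optional[str]) -> str:
    """Show shouted words (e.g. "Run!") verbatim; otherwise the sound type."""
    if sound_type == "human scream" and transcript:
        return transcript
    return sound_type


print(virtual_object_text("bicycle bell", None), direction_label(60.0))
```

Reducing the angle modulo 360 first means negative azimuths (sounds to the left) map correctly, for example -90 degrees to 9 o'clock.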

As described above, the sound generation notification device 10 according to the present embodiment includes the sound acquisition unit 21 that acquires collected sound, the sound direction identification unit 25 that identifies the direction of arrival of the sound, the sound content recognition unit 23 that recognizes the content of the sound, the display unit 12 that displays images, and the display control unit 26 that causes the display unit 12 to display images of the virtual objects 100 to 102 generated based on the direction of arrival and the content of the sound. Because the user can recognize the direction of arrival and the content of a sound through the virtual objects 100 to 102, the user can easily localize the sound source, and the occurrence of sound can be notified appropriately.
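The unit structure summarized above can be sketched as one pass of a processing loop in which each unit is an injected function. This is a minimal Python sketch, not the patented implementation: `notify`, `Overlay`, and the three callables standing in for the sound direction identification unit 25, the sound content recognition unit 23, and the display unit 12 are hypothetical, and `samples` stands for whatever the sound acquisition unit 21 delivers.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Overlay:
    """What the display control unit hands to the display unit (hypothetical)."""
    content: str    # recognized content of the sound
    direction: str  # e.g. "2 o'clock direction"


def notify(
    samples: Sequence[float],
    identify_direction: Callable[[Sequence[float]], str],
    recognize_content: Callable[[Sequence[float]], str],
    show: Callable[[Overlay], None],
) -> Overlay:
    """Acquired sound -> direction + content -> virtual object -> display.

    Each unit is passed in as a function, so it can be backed by hardware,
    software, or a combination of the two.
    """
    overlay = Overlay(
        content=recognize_content(samples),
        direction=identify_direction(samples),
    )
    show(overlay)
    return overlay


notify([0.0, 0.1], lambda s: "2 o'clock direction", lambda s: "bicycle bell", print)
```

Injecting the units keeps the display logic independent of how localization and recognition are implemented, which matches the statement that the functional blocks may be realized in hardware, software, or both.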

The sound generation notification device 10 according to the present embodiment also includes the emotion analysis unit 24, which analyzes the emotion of the speaker contained in the voice, and the display control unit 26 causes the display unit 12 to display images of the virtual objects 100 and 101 generated based on the direction of arrival of the voice and on its content, loudness, and emotion. The user can therefore intuitively understand, for example, what kind of person is speaking and that person's characteristics, personality, and emotions.
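One way to reflect loudness and emotion in a virtual object is to map them to visual attributes such as size and color. The text does not say how this mapping is done, so everything below is an assumption for illustration: the color table, the linear 40 to 90 dB loudness range, and the names `style_for` and `VirtualObjectStyle`.

```python
from dataclasses import dataclass

# Hypothetical emotion-to-color table; the text names no palette.
EMOTION_COLORS = {
    "anger": "red",
    "joy": "yellow",
    "sadness": "blue",
    "neutral": "white",
}


@dataclass(frozen=True)
class VirtualObjectStyle:
    scale: float  # relative size of the virtual object
    color: str


def style_for(loudness_db: float, emotion: str) -> VirtualObjectStyle:
    """Louder speech -> larger object; emotion -> color.

    Assumes loudness in dB, mapped linearly from 40 dB (scale 0.5)
    to 90 dB (scale 2.0) and clamped outside that range.
    """
    t = min(max((loudness_db - 40.0) / 50.0, 0.0), 1.0)
    return VirtualObjectStyle(
        scale=0.5 + 1.5 * t,
        color=EMOTION_COLORS.get(emotion, "white"),
    )


print(style_for(65.0, "anger"))
```

Clamping the scale keeps very quiet or very loud speech from producing unreadably small or view-blocking objects.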

The sound generation notification device 10 according to the present embodiment also includes the hearing level determination unit 28, which determines the user's hearing level, and the display control unit 26 changes the types of sound indicated by the virtual objects 100 to 102 displayed on the display unit 12 according to the determined hearing level. For example, for a user with "severe hearing loss" or "profound hearing loss", when a highly dangerous sound or a sound that calls for attention occurs, a virtual object corresponding to that type of sound is displayed. This allows the user not only to identify the type of sound source and localize it, but also to sense danger immediately even when the sound itself cannot be heard.
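Changing the displayed sound types according to hearing level can be sketched as a lookup from a measured level to a category, and from a category to a set of sound types. The thresholds below are illustrative assumptions (the text names the categories but gives no numbers), as is the rule that only severe and profound categories add danger/caution sounds; all function and constant names are hypothetical.

```python
# Illustrative thresholds on average hearing level (dB HL), highest first.
# The text names "severe" and "profound" but gives no numeric boundaries.
HEARING_CATEGORIES = [
    (90.0, "profound"),
    (70.0, "severe"),
    (40.0, "moderate"),
    (25.0, "mild"),
]

SPEECH_TYPES = frozenset({"speech"})
DANGER_OR_CAUTION_TYPES = frozenset(
    {"car horn", "bicycle bell", "alarm", "human scream", "thunder"}
)


def hearing_category(level_db_hl: float) -> str:
    """Return the first category whose lower threshold the level reaches."""
    for threshold, name in HEARING_CATEGORIES:
        if level_db_hl >= threshold:
            return name
    return "normal"


def displayed_types(level_db_hl: float) -> frozenset[str]:
    """Sound types that get a virtual object for this user.

    Assumption: users with severe or profound loss also see danger/caution
    sounds; other users see speech only.
    """
    if hearing_category(level_db_hl) in {"severe", "profound"}:
        return SPEECH_TYPES | DANGER_OR_CAUTION_TYPES
    return SPEECH_TYPES


print(hearing_category(75.0), sorted(displayed_types(75.0)))
```

Checking thresholds from the highest downward lets a single loop return the most severe matching category without overlapping range checks.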

In the sound generation notification device 10 according to the present embodiment, the display unit 12 is a glass monitor that is placed in front of the user's eyes and displays images superimposed on the see-through field of view. Simply by wearing the sound generation notification device 10, therefore, the user can easily be notified that a sound has occurred nearby.

Each component of the illustrated sound generation notification device 10 is functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of each device is not limited to the one illustrated, and all or part of each device may be functionally or physically distributed or integrated in arbitrary units according to its processing load, usage conditions, and the like.

The configuration of the notification control device 20 of the sound generation notification device 10 is implemented, for example, in software by a program loaded into memory. The above embodiment has been described in terms of functional blocks implemented by cooperation between such hardware and software. These functional blocks can therefore be implemented in various forms: by hardware alone, by software alone, or by a combination of the two.

This disclosure includes matters that contribute to realizing the Sustainable Development Goals (SDGs) goal of "health and well-being for all" and to creating value through healthcare products and services.

10 Sound generation notification device
11 Sound collection unit
12 Display unit
20 Notification control device
21 Sound acquisition unit
22 Sound source separation unit
23 Sound content recognition unit
24 Emotion analysis unit
25 Sound direction identification unit
26 Display control unit
28 Hearing level determination unit
100, 101, 102 Virtual object
110, 111, 112 Information indicating the direction of arrival of sound

Claims

1. A sound generation notification device comprising:
a sound acquisition unit that acquires collected sound;
a sound direction identification unit that identifies a direction of arrival of the sound;
a sound content recognition unit that recognizes content of the sound;
a display unit that displays an image; and
a display control unit that causes the display unit to display an image of a virtual object generated based on the direction of arrival and the content of the sound.

2. The sound generation notification device according to claim 1, further comprising an emotion analysis unit that analyzes an emotion of a speaker contained in the sound,
wherein the display control unit causes the display unit to display the image of the virtual object generated based on the direction of arrival of the sound and the content, loudness, and emotion of the sound.

3. The sound generation notification device according to claim 1 or 2, further comprising a hearing level determination unit that determines a hearing level of a user,
wherein the display control unit changes the type of the sound indicated by the virtual object displayed on the display unit according to the determined hearing level.

4. The sound generation notification device according to claim 1 or 2, wherein the display unit is a glass monitor that is placed in front of a person's eyes and displays an image superimposed on the see-through field of view.

5. A sound generation notification method comprising:
a sound acquisition step of acquiring collected sound;
a sound direction identification step of identifying a direction of arrival of the sound;
a sound content analysis step of analyzing content of the sound; and
a display control step of displaying, on a display unit, an image of a virtual object generated based on the direction of arrival and the content of the sound.