JP2020091381A

JP2020091381A - Electronic apparatus, control method for electronic apparatus, and control program for electronic apparatus

Info

Publication number: JP2020091381A
Application number: JP2018227804A
Authority: JP
Inventors: Takashi Hirano (平野 孝)
Original assignee: Onkyo Corp
Current assignee: Onkyo Corp
Priority date: 2018-12-05
Filing date: 2018-12-05
Publication date: 2020-06-11

Abstract

PROBLEM TO BE SOLVED: To allow a listener to feel that synthesized speech is being spoken by the actual person. SOLUTION: A speaker system 1 comprises an SoC 2. The SoC 2 synthesizes speech from text in the voice of a predetermined person. The SoC 2 also inserts a keyword characteristic of the predetermined person into the speech synthesized from the text. After inserting the keyword, the SoC 2 does not insert the keyword into speech synthesized from text for a predetermined period. SELECTED DRAWING: Figure 2

Description

The present invention relates to an electronic device that reads text aloud, a control method for the electronic device, and a control program for the electronic device.

An electronic device that reads text aloud converts the text to be read into speech (TTS: Text To Speech) and outputs the converted speech. Patent Document 1 discloses an invention that determines the category to which a document to be read belongs, applies to the document the read-aloud settings corresponding to the determined category, and reads the document aloud based on the document data and those settings. For example, if the category of the document is news, the document is read aloud in an announcer's voice.

For example, consider speech synthesis in which the voice of a celebrity such as an entertainer or an athlete is sampled and its features are converted into data. Even if the speech synthesized from the input text reproduces a voice quite close to that of the person, in practice few people have a similar voice quality, and the listener may not feel that the person himself or herself is speaking. Moreover, speaking habits and the like are not contained in text such as a news script, which makes the speaker even harder to recognize.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2003-044072 (JP 2003-044072 A)

As described above, even when speech synthesis based on voice features converted into data is used, there is a problem that the listener does not feel as if the person himself or herself is speaking.

An object of the present invention is to make the listener feel as if the actual person is speaking, even when the speech is synthesized.

An electronic device according to a first aspect of the invention synthesizes speech from text and utters it in the voice of a predetermined person, and is characterized by inserting a keyword specific to the predetermined person into the speech synthesized from the text and uttered.

In the present invention, a keyword specific to the predetermined person is inserted into the speech synthesized from text and uttered. A keyword specific to a predetermined person is, for example, a word or phrase that the person often says. By hearing the keyword, the user can more easily recognize that the speaker is the predetermined person. Thus, according to the present invention, the user can feel as if the actual person is speaking, even when the speech is synthesized.

An electronic device according to a second aspect of the invention is the electronic device according to the first aspect, wherein the keyword has an attribute indicating the positions at which it can be inserted into speech based on text, and the keyword is inserted, based on the attribute, into the speech synthesized from text and uttered.

An electronic device according to a third aspect of the invention is the electronic device according to the first or second aspect, wherein, after the keyword is inserted into speech synthesized from text and uttered, the keyword is not inserted into speech synthesized from text and uttered for a predetermined period.

In the present invention, after a keyword specific to the predetermined person is inserted into speech synthesized from text and uttered, no keyword specific to the predetermined person is inserted into speech synthesized from text and uttered for a predetermined period. Consequently, a keyword is not inserted with every single sentence of conversation, so the user does not find it annoying.

An electronic device according to a fourth aspect of the invention is the electronic device according to any one of the first to third aspects, wherein the electronic device accepts a selection of the predetermined person who is to speak and, when the selection is accepted, speaks in the voice of the selected person.

An electronic device according to a fifth aspect of the invention is the electronic device according to any one of the first to fourth aspects, wherein, when a change of the predetermined person who is to speak is accepted, the electronic device speaks in the voice of the newly selected person and inserts a keyword specific to that person into the speech synthesized from text and uttered.

An electronic device according to a sixth aspect of the invention is the electronic device according to any one of the first to fifth aspects, wherein, when a plurality of the keywords are associated with the predetermined person, the electronic device, after inserting the keyword into speech synthesized from text and uttered, inserts another keyword specific to the predetermined person into speech synthesized from text and uttered.

A control method for an electronic device according to a seventh aspect of the invention is a control method for an electronic device that synthesizes speech from text and utters it in the voice of a predetermined person, and is characterized by inserting a keyword specific to the predetermined person into the speech synthesized from the text and uttered.

A control program for an electronic device according to an eighth aspect of the invention is a control program for an electronic device that synthesizes speech from text and utters it in the voice of a predetermined person, and causes the electronic device to insert a keyword specific to the predetermined person into the speech synthesized from the text and uttered.

According to the present invention, the user can feel as if the actual person is speaking, even when the speech is synthesized.

FIG. 1 is a block diagram showing the configuration of a speaker device according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the processing operation of the speaker device when reading text aloud.
FIG. 3 is a flowchart showing the processing operation of the speaker device when giving a voice response (for example, reading the news aloud) in reply to an utterance from the user.

Embodiments of the present invention are described below. FIG. 1 is a block diagram showing the configuration of a speaker device according to an embodiment of the present invention. As shown in FIG. 1, the speaker device 1 (electronic device) includes an SoC (System on Chip) 2, a wireless module 3, an amplification unit 4, a speaker 5, a microphone 6, and the like.

The SoC 2 (control unit) has a CPU (Central Processing Unit), a DSP (Digital Signal Processor), memory, and the like, and controls each unit of the speaker device 1. The wireless module 3 performs wireless communication conforming to the Wi-Fi standard. The SoC 2 communicates with the cloud server 101 via the wireless module 3. The amplification unit 4 amplifies the audio signal output from the SoC 2 and outputs the amplified audio signal to the speaker 5. The speaker 5 outputs sound based on the audio signal. In other words, the SoC 2 causes the speaker 5 to output sound by outputting an audio signal to the amplification unit 4.

The microphone 6 picks up ambient sound. The audio signal picked up by the microphone 6 is output to the SoC 2. The SoC 2 recognizes a trigger word in the audio signal picked up by the microphone 6. The trigger word is, for example, "Hello, Onkyo". The SoC 2 then transmits the audio signal that follows the trigger word to the cloud server 101 via the wireless module 3. For example, suppose the user says "Hello, Onkyo. Tell me the news." In this case, the SoC 2 transmits "Tell me the news" to the cloud server 101.
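The trigger-word handling described above can be illustrated with a minimal sketch. It is an assumption-laden simplification: the device works on audio signals, whereas this example works on an already transcribed string, and the function name `extract_request` is hypothetical.

```python
import re
from typing import Optional

TRIGGER_WORD = "Hello, Onkyo"  # example trigger word taken from the description


def extract_request(transcript: str, trigger: str = TRIGGER_WORD) -> Optional[str]:
    """Return the part of the utterance that follows the trigger word.

    Returns None when the trigger word is absent, in which case nothing
    would be forwarded to the cloud server.
    """
    match = re.search(re.escape(trigger), transcript, flags=re.IGNORECASE)
    if match is None:
        return None
    # Drop whitespace and sentence punctuation around the remaining request.
    rest = transcript[match.end():].strip(" .,!?")
    return rest or None


print(extract_request("Hello, Onkyo. Tell me the news."))
```

Only the text after the trigger word is returned, mirroring the way the SoC 2 forwards only the portion of the signal that follows the trigger word.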

The cloud server 101 receives "Tell me the news" and performs speech recognition on it. Based on the recognition result, the cloud server 101 transmits "news" information to the speaker device 1.

The processing operation of the speaker device 1 when reading text aloud is described below. As shown in FIG. 2, the SoC 2 receives the "news" information from the cloud server 101 via the wireless module 3. Next, the SoC 2 converts the received "news" information into text. The SoC 2 then synthesizes speech from the text in the voice of a predetermined person. That is, the SoC 2 converts the original text into synthesized speech in the form of audio data such as a WAV file. The user can select the person (voice) who reads aloud, for example by voice.

The "predetermined person" is, for example, an entertainer, an athlete, a singer, a DJ (disc jockey), a voice actor, or an announcer. Such a person has specific keywords (hereinafter referred to as "character words"). A character word is, for example, a word or phrase that the person often says. Examples include gags (such as "Aiin" (あいーん) and "Heiten garagara" (閉店ガラガラ)), signature lines (such as "Genki desu ka!" (元気ですかー)), the name of the person's own program (such as "Best Hit USA"), and verbal habits. To ensure accuracy and the quality of the characteristic features, character words are sampled and prepared in advance as audio data, and each is given an attribute indicating where it can be inserted.

Character words are inserted:
- at the start of an utterance,
- at the end of an utterance, and
- at other places in the text where insertion is considered effective.
Character words therefore have attributes such as "start of utterance" and "end of utterance". Character words are stored in a character word DB.
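As an illustration of character words that carry insertion attributes, the following is a minimal sketch of one possible in-memory layout. The class names, field names, file paths, and sample phrases are all hypothetical; the source only states that character words are pre-sampled audio with position attributes and are stored in a character word DB.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class Position(Enum):
    START = "start of utterance"
    END = "end of utterance"
    OTHER = "other effective position"


@dataclass(frozen=True)
class CharacterWord:
    speaker: str                    # the predetermined person
    phrase: str                     # what is said
    wav_path: str                   # pre-sampled recording (hypothetical path)
    positions: FrozenSet[Position]  # where the phrase may be inserted


# Hypothetical contents of the character word DB.
CHARACTER_WORD_DB: List[CharacterWord] = [
    CharacterWord("Person A", "How is everybody?", "a_opening.wav",
                  frozenset({Position.START})),
    CharacterWord("Person A", "See you next time!", "a_closing.wav",
                  frozenset({Position.END})),
    CharacterWord("Person B", "Here we go!", "b_jingle.wav",
                  frozenset({Position.START, Position.END})),
]


def find_character_words(speaker: str, position: Position,
                         db: List[CharacterWord] = CHARACTER_WORD_DB) -> List[CharacterWord]:
    """Return the speaker's character words that may be inserted at `position`."""
    return [w for w in db if w.speaker == speaker and position in w.positions]


print([w.phrase for w in find_character_words("Person A", Position.START)])
```

Storing the allowed positions as a set lets one phrase be usable at several places, which matches the description of an attribute that states where each character word can be inserted.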

The SoC 2 also analyzes the text to find where a character word can be inserted. Next, based on the analysis result and the attributes of the character word, the SoC 2 concatenates the synthesized speech and the character word. In other words, the SoC 2 inserts the character word, as a WAV file, into the WAV file produced from the text by concatenation. The SoC 2 then outputs sound based on the resulting WAV file.
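The WAV concatenation step can be sketched with Python's standard `wave` module. This is an illustration only: the helper names are hypothetical, silent clips stand in for the synthesized speech and the sampled character word, and the sketch assumes that all clips share the same channel count, sample width, and sample rate.

```python
import io
import wave
from typing import Sequence


def silent_wav(num_frames: int, framerate: int = 16000) -> bytes:
    """Build a mono 16-bit WAV of silence; a stand-in for real audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(framerate)
        writer.writeframes(b"\x00\x00" * num_frames)
    return buffer.getvalue()


def concat_wavs(clips: Sequence[bytes]) -> bytes:
    """Join WAV clips end to end; all clips must share one audio format."""
    if not clips:
        raise ValueError("at least one clip is required")
    output = io.BytesIO()
    with wave.open(output, "wb") as writer:
        expected = None
        for clip in clips:
            with wave.open(io.BytesIO(clip), "rb") as reader:
                fmt = (reader.getnchannels(), reader.getsampwidth(),
                       reader.getframerate())
                if expected is None:
                    expected = fmt
                    writer.setnchannels(fmt[0])
                    writer.setsampwidth(fmt[1])
                    writer.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError("clips must share channels, width, and rate")
                writer.writeframes(reader.readframes(reader.getnframes()))
    return output.getvalue()


def insert_character_word(speech: bytes, word: bytes, at_start: bool = True) -> bytes:
    """Place the character word before (start) or after (end) the speech."""
    return concat_wavs([word, speech] if at_start else [speech, word])


merged = insert_character_word(silent_wav(250), silent_wav(100))
with wave.open(io.BytesIO(merged), "rb") as reader:
    print(reader.getnframes())
```

Inserting at some other position inside the speech would additionally require splitting the synthesized audio at the chosen point, which this sketch does not attempt.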

The SoC 2 accepts, by voice, a selection of the predetermined person who is to speak. When the SoC 2 accepts the selection, it speaks in the voice of the selected person. For example, when the user says "Change the voice to Mr. B", the SoC 2 thereafter synthesizes speech from text and utters it in the voice of "Mr. B".
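The voice selection can be pictured with a small sketch that handles an already recognized command string. The command pattern, the class name `VoiceSelector`, and the rule that unknown names are ignored are assumptions made for illustration; the source does not specify how the command is parsed.

```python
import re
from typing import Iterable

# Hypothetical command pattern modeled on the example "Change the voice to Mr. B".
VOICE_COMMAND = re.compile(r"change (?:the )?voice to (?P<name>.+)", re.IGNORECASE)


class VoiceSelector:
    """Tracks which predetermined person's voice is currently selected."""

    def __init__(self, available: Iterable[str], default: str) -> None:
        self.available = set(available)
        if default not in self.available:
            raise ValueError("default voice must be one of the available voices")
        self.current = default

    def handle(self, utterance: str) -> bool:
        """Switch voices if the utterance is a valid selection; report success."""
        match = VOICE_COMMAND.search(utterance)
        if match is None:
            return False
        name = match.group("name").strip(" .!?")
        if name not in self.available:
            return False  # unknown person: keep the current voice
        self.current = name
        return True


selector = VoiceSelector({"Mr. A", "Mr. B"}, default="Mr. A")
selector.handle("Change the voice to Mr. B.")
print(selector.current)
```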

In this embodiment, character words are inserted at appropriate times into the speech synthesized from text and uttered, which makes it easier for the user to recognize that the speaker is the predetermined person. However, if a character word were inserted with every sentence of conversation, the user would find it very annoying.

Therefore, in this embodiment, the above problem is solved by restricting the addition of character words to cases in which conditions are met. The addition conditions are as follows.
(1) After a character word has been added and uttered, no character word is added for a certain period. A character word is added only in the first conversation of the day; after that, no character word is added on the same day.
(2) Even within the same day, a character word may be added and uttered when the speaker (the person reading aloud) changes.
(3) Even within the same day and with the same speaker, once a certain time has elapsed, a character word may be added and uttered if the speaker has another character word.
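One way to read addition conditions (1) to (3) is as a small stateful policy. The sketch below is an interpretation, not the implementation described in the source: the class name, the three-hour value for the "certain time", and the choice to pick the first unused character word are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set


@dataclass
class CharacterWordPolicy:
    words_by_speaker: Dict[str, List[str]]
    cooldown: timedelta = timedelta(hours=3)  # assumed value for the "certain time"
    last_speaker: Optional[str] = None
    last_insert: Optional[datetime] = None
    used_today: Dict[str, Set[str]] = field(default_factory=dict)

    def choose(self, speaker: str, now: datetime) -> Optional[str]:
        """Return a character word to add to this response, or None."""
        new_day = self.last_insert is None or now.date() != self.last_insert.date()
        if new_day:
            self.used_today.clear()
        changed = speaker != self.last_speaker
        self.last_speaker = speaker

        words = self.words_by_speaker.get(speaker, [])
        if not words:
            return None
        used = self.used_today.setdefault(speaker, set())
        unused = [w for w in words if w not in used]

        if new_day or changed:
            # (1) first conversation of the day, or (2) the speaker changed
            word = unused[0] if unused else words[0]
        elif unused and now - self.last_insert >= self.cooldown:
            # (3) same day and speaker, time elapsed, another word is available
            word = unused[0]
        else:
            return None  # within the suppression period
        used.add(word)
        self.last_insert = now
        return word


policy = CharacterWordPolicy({"A": ["a1", "a2"], "B": ["b1"]})
start = datetime(2018, 12, 5, 8, 0)
print(policy.choose("A", start), policy.choose("A", start + timedelta(minutes=10)))
```

Keeping the set of words already used on the current day is what allows condition (3) to select a different character word rather than repeating the same one.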

Accordingly, after inserting a character word into speech synthesized from text and uttered, the SoC 2 does not insert a character word into speech synthesized from text and uttered for a predetermined period.

When the SoC 2 accepts a change of the predetermined person who is to speak, it speaks in the voice of the newly selected person. The SoC 2 also inserts a character word of the newly selected person into the speech synthesized from text and uttered.

When a plurality of character words are associated with the predetermined person, the SoC 2, after inserting a character word into speech synthesized from text and uttered, inserts another character word of the predetermined person into speech synthesized from text and uttered.

The processing operation of the speaker device 1 when giving a voice response (for example, reading the news aloud) in reply to an utterance from the user is described with reference to the flowchart shown in FIG. 3. The SoC 2 receives the sound picked up by the microphone 6 (S1). Next, the SoC 2 determines whether the addition conditions described above are satisfied (S2). If the SoC 2 determines that the addition conditions are satisfied (S2: Yes), it responds with a character word added to the response speech (S3). That is, the SoC 2, for example, adds a character word before the news and then reads the news aloud. If the SoC 2 determines that the addition conditions are not satisfied (S2: No), it responds with the response speech as is (S4). That is, the SoC 2, for example, reads the news aloud without adding a character word before it.
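The branch in this flow can be summarized in a short sketch. The function name `build_response` and the idea of passing the condition check in as a callable are assumptions for illustration; segments are represented as strings rather than audio.

```python
from typing import Callable, List, Optional


def build_response(response_text: str,
                   choose_character_word: Callable[[], Optional[str]]) -> List[str]:
    """Return the ordered segments to speak for one response.

    `choose_character_word` stands in for the check of the addition
    conditions: it returns a character word when the conditions are met
    and None otherwise.
    """
    word = choose_character_word()
    if word is not None:
        # Conditions met: the character word precedes the response speech.
        return [word, response_text]
    # Conditions not met: the response speech is used as is.
    return [response_text]


print(build_response("Here is today's news.", lambda: "Hi, everyone!"))
print(build_response("Here is today's news.", lambda: None))
```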

As described above, in this embodiment the SoC 2 inserts a character word of the predetermined person into the speech synthesized from text and uttered. A character word of a predetermined person is, for example, a word or phrase that the person often says. By hearing the character word, the user can more easily recognize that the speaker is the predetermined person. Thus, according to this embodiment, the user can feel as if the actual person is speaking, even when the speech is synthesized.

Furthermore, in this embodiment, after inserting a character word of the predetermined person into speech synthesized from text and uttered, the SoC 2 does not insert a character word of the predetermined person into speech synthesized from text and uttered for a predetermined period. Consequently, a keyword specific to the predetermined person is not inserted with every single sentence of conversation, so the user does not find it annoying.

An embodiment of the present invention has been described above, but the forms to which the present invention can be applied are not limited to that embodiment. As illustrated below, changes can be made as appropriate without departing from the spirit of the present invention.

In the embodiment described above, the speaker device 1 was given as an example of an electronic device that reads text aloud, but the electronic device is not limited to this and may be another electronic device such as a smartphone.

The present invention can be suitably applied to an electronic device that reads text aloud, a control method for the electronic device, and a control program for the electronic device.

1 Speaker device (electronic device)
2 SoC (control unit)
3 Wireless module
4 Amplification unit
5 Speaker
6 Microphone

Claims

1. An electronic device that synthesizes speech from text and utters it in the voice of a predetermined person,
the electronic device being characterized in that a keyword specific to the predetermined person is inserted into the speech synthesized from the text and uttered.

2. The electronic device according to claim 1, wherein the keyword has an attribute indicating a position at which it can be inserted into speech based on text,
and the keyword is inserted, based on the attribute, into the speech synthesized from text and uttered.

3. The electronic device according to claim 1, wherein, after the keyword is inserted into speech synthesized from text and uttered, the keyword is not inserted into speech synthesized from text and uttered for a predetermined period.

4. The electronic device according to claim 1, wherein the electronic device accepts a selection of the predetermined person who is to speak,
and, when the selection is accepted, speaks in the voice of the selected person.

5. The electronic device according to claim 4, wherein, when a change of the predetermined person who is to speak is accepted, the electronic device speaks in the voice of the newly selected person and inserts a keyword specific to that person into the speech synthesized from text and uttered.

6. The electronic device according to claim 1, wherein, when a plurality of the keywords are associated with the predetermined person, after the keyword is inserted into speech synthesized from text and uttered, another keyword specific to the predetermined person is inserted into speech synthesized from text and uttered.

7. A method for controlling an electronic device that synthesizes speech from text and utters it in the voice of a predetermined person,
the method comprising inserting a keyword specific to the predetermined person into the speech synthesized from the text and uttered.

8. A control program for an electronic device that synthesizes speech from text and utters it in the voice of a predetermined person,
the control program causing the electronic device to insert a keyword specific to the predetermined person into the speech synthesized from the text and uttered.