JP2002027177A

JP2002027177A - Voice and image processor

Info

Publication number: JP2002027177A
Application number: JP2000208021A
Authority: JP
Inventors: Iwao Nozaki; 岩夫野崎; Yoshiya Marumoto; 喜也丸本
Original assignee: Noritsu Koki Co Ltd
Current assignee: Noritsu Koki Co Ltd
Priority date: 2000-07-10
Filing date: 2000-07-10
Publication date: 2002-01-25
Anticipated expiration: 2020-07-10
Also published as: JP4319334B2

Abstract

PROBLEM TO BE SOLVED: To improve the method of inputting voice data when ordering an image sheet with voice. SOLUTION: A voice and image processor, which includes a code conversion unit 40 that converts voice data into a voice code image encoded so as to be optically readable and a print unit that prints the voice code image together with an image based on image data to produce an image sheet with voice, is further provided with a text input processing unit 23 that processes received text data. A voice code image that reproduces the voice of the text data is generated on the basis of the text data processed by the text input processing unit.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a voice and image processing apparatus comprising a code conversion unit that converts voice data into a voice code image encoded so as to be optically readable, and a print unit that prints the voice code image together with an image based on image data in order to produce an image sheet with voice.

[0002]

2. Description of the Related Art

In recent years, with the arrival of the multimedia era, there have been active attempts to convey information not by sight alone but also by hearing. Image sheets with voice, and photographs with voice in particular, are one such attempt. For example, Japanese Patent Application Laid-Open Nos. H6-231466 and H7-181606 disclose an image sheet with voice in which, in addition to figures, photographs, and characters, a dot code (voice code image) obtained by converting voice into an optically readable form is printed on the same sheet, and the voice is heard by means of a dedicated scanner that reads the dot code. Such image sheets with voice are used, in particular, as language-learning materials for repeated pronunciation practice, photographic field guides that record animal calls, picture books with sound, and photographs that record commemorative events such as weddings, coming-of-age ceremonies, and Shichi-Go-San together with the sounds accompanying the event.

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION

Recently, moreover, cards in which a voice-coded message is attached to a suitable photograph have been attracting attention as a new kind of message card. When asking a DP shop or the like to create such a message card, however, the customer must not only hand over the photographic film or digital-camera recording medium containing the image to be printed, but also record a voice message through a microphone at the shop counter. Whatever the content of the message, this is quite embarrassing for most people, and not a few hesitate to order a message card for this reason. To avoid recording a voice message in the shop, the customer may bring a cassette tape, MD, or the like on which the message was recorded at home beforehand. Even so, the recording is often played back for confirmation, and recording at home every time for a simple message is bothersome. In view of the above, an object of the present invention is to improve the method of inputting voice data when ordering an image sheet with voice.

[0004]

MEANS FOR SOLVING THE PROBLEMS

To solve the above problem, the present invention provides a voice and image processing apparatus comprising a code conversion unit that converts voice data into a voice code image encoded so as to be optically readable, and a print unit that prints the voice code image together with an image based on image data in order to produce an image sheet with voice, characterized in that a text input processing unit that processes input text data is provided, and a voice code image that reproduces the voice of the text data is generated on the basis of the text data processed by the text input processing unit.

[0005] In this configuration, text data is used as the source data for the voice code image needed to create the image sheet with voice, so the customer does not have to speak the message aloud. The text data may be entered, for example, by typing the message directly on a keyboard connected to the voice and image processing apparatus, or a message prepared in advance on a word processor or the like may be recorded on a recording medium such as a floppy (registered trademark) disk and brought to the shop. The message may also be sent to the shop by e-mail. If the image data for the image sheet with voice is sent as an attachment at that time, the customer need not visit the shop at all to place the order.

[0006] In a preferred embodiment of the present invention, a speech synthesis unit that generates synthesized speech data from the input text data is provided, and the code conversion unit is configured to use the synthesized speech data generated by the speech synthesis unit as the source voice data for the voice code image.

[0007] In this configuration, the voice data needed to create the image sheet with voice can be obtained by converting what was originally entered as text data into synthesized speech data using speech synthesis technology, so here too the customer does not have to speak the message aloud.

[0008] As one example of the speech synthesis unit, a preferred embodiment of the present invention includes a text-to-speech synthesis unit that analyzes the input text data using a text analysis dictionary to identify its reading, further assigns accent and prosody to obtain a phoneme sequence, and generates synthesized speech data from that sequence using a synthesized speech element dictionary. In this configuration, text data is entered by typing mixed kana-kanji text on the keyboard or by loading a text document stored on a recording medium or received by e-mail. Readings and phrase accents are assigned to this text data using the text analysis dictionary, and the power and fundamental frequency of the speech are then adjusted in a prosodic-parameter editing step that accesses the synthesized speech element dictionary, so that the tone of voice can be set to some extent. Accordingly, the final synthesized speech data can be created with a female or male tone, or an angry or happy tone, selected according to the customer's wishes. As a more preferable form in this respect, if the synthesized speech elements stored in the synthesized speech element dictionary are created from real-voice data registered for each individual, the synthesized speech can be made a familiar voice resembling the customer's own.

[0009] A text-to-speech synthesis unit such as the one described above requires advanced technology and imposes a heavy burden in terms of both equipment and operation. As a simpler alternative, another embodiment of the present invention includes a voice editing synthesis unit that converts the input text data into synthesized speech data piece by piece, using a registered voice element dictionary that stores real-voice data of pre-registered words and phrases. This method, called editing synthesis, generates sentence speech from real-voice fragments of a limited vocabulary and set of phrases. Because synthesized speech data is generated simply by replacing fragments of the text data with fragments of voice data looked up in the registered voice element dictionary, processing is fast and equipment cost can be kept low.
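The fragment replacement described in paragraph [0009] can be sketched as a greedy longest-match lookup. This is a minimal sketch under stated assumptions, not the method of the disclosure: the function name `edit_synthesize`, the dictionary layout (text fragment mapped to a list of PCM samples), and the longest-match rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def edit_synthesize(text: str, voice_dict: Dict[str, List[int]]) -> List[int]:
    """Replace text fragments with registered voice fragments, longest match first.

    voice_dict is a hypothetical stand-in for the registered voice element
    dictionary: each key is a word or phrase, each value its recorded samples.
    """
    max_len = max((len(key) for key in voice_dict), default=0)
    samples: List[int] = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate fragment first so that whole phrases
        # are preferred over their individual words.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            fragment = text[pos:pos + length]
            if fragment in voice_dict:
                samples.extend(voice_dict[fragment])
                pos += length
                break
        else:
            # The vocabulary is limited, so unregistered text cannot be voiced.
            raise KeyError(f"no registered voice element at position {pos}: {text[pos:]!r}")
    return samples
```

Because the vocabulary is limited, the sketch raises an error for unregistered text rather than guessing; a real system would need some fallback policy, which the source does not describe.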

[0010] So that the synthesized speech data obtained by such editing synthesis, which is of lower quality than natural speech, can be listened to with a greater sense of familiarity, the present invention proposes storing real-voice data registered for each individual in the registered voice element dictionary. That is, a customer of the image sheet with voice registers in advance, in his or her own voice, the minimum set of voice elements required for editing synthesis. When ordering an image sheet with voice, the customer submits text data as the source of the voice message together with suitable image data, and synthesized speech data is created by editing synthesis using the customer's own registered voice elements. Even if the result does not flow smoothly, it is reproduced as a familiar voice precisely because fragments of the customer's own voice are used.

[0011] Furthermore, as a preferred embodiment of the present invention, if the speech synthesis unit includes a voice quality transformation unit that alters the voice quality of the synthesized speech data, voice data unlike any existing voice can be produced, which suits purposes such as exchanging image sheets with voice for fun. Such voice quality transformation can be carried out simply, for example, by linearly transforming the frequency of the voice data. If the transformation parameters are managed for each customer, a customer can keep a set of voice transformation parameters with distinctive voice characteristics for his or her exclusive use.
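One simple way to realize a linear frequency transformation is to resample the waveform and play the result at the original sampling rate, which scales every frequency by the same factor and shortens or lengthens the sound accordingly. The sketch below assumes this interpretation; the function name `shift_frequency` and the use of linear interpolation are illustrative assumptions, not details given in the source.

```python
from typing import List


def shift_frequency(samples: List[float], factor: float) -> List[float]:
    """Resample so that playback at the original rate multiplies all frequencies by factor.

    factor > 1 raises the pitch and shortens the sound (like fast tape playback);
    factor < 1 lowers the pitch and lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    out_len = max(1, int(len(samples) / factor))
    last = len(samples) - 1
    out: List[float] = []
    for i in range(out_len):
        src = i * factor                 # fractional read position in the input
        lo = min(int(src), last)
        hi = min(lo + 1, last)
        frac = src - lo
        # Linear interpolation between the two neighbouring input samples.
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out
```

A per-customer parameter, as the paragraph suggests, could then be as small as one stored `factor` value per customer.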

[0012] As a measure for customers who dislike typing the message directly on a keyboard, one preferred embodiment of the present invention additionally includes a character recognition device, and the text data output by this device is used for conversion into the voice code image. Here, "character recognition device" is a general term for devices that read characters handwritten on paper by OCR and convert them into text data, or that read characters written with a designated pen on a touch panel and convert them into text data. This configuration makes the input of voice data when ordering an image sheet with voice even simpler and also diversifies the forms of input.

[0013] As is clear from the above description, an important feature of the present invention is the conversion of text data into voice. The text data handled by the present invention should be interpreted in a broad sense, as typified by collections of characters, numerals, and symbols in printed text, data obtained by scanning printed matter or the like, collections of characters, numerals, and symbols in electronic text, and character codes entered sequentially through an input device. It includes all data that is recognized as character information in some form by computer media-conversion technology. Other features and advantages of the present invention will become apparent from the following description of embodiments with reference to the drawings.

[0014]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

One embodiment of a voice and image processing apparatus according to the present invention for creating an image sheet with voice is shown in the external view of FIG. 1 and the functional block diagram of FIG. 2. The core of this apparatus is a general-purpose computer 1, which implements in hardware and software the various functions shown in FIG. 2 that are required to create an image sheet with voice. The apparatus is of a type installed at a retail location such as a DP shop to provide a service of creating photographs with voice at the customer's request.

[0015] Various input and output devices are connected to the computer 1 through an I/O interface unit 10. The output devices include a silver halide photographic printer 3 serving as the print unit that finally outputs a photograph with voice 2 as the image sheet with voice (the printer used for printing silver halide photographic film and the like doubles for this purpose), a monitor 4 for checking the image being worked on, and a speaker 5 for checking the input voice data. The input devices include a microphone 6a and a cassette player 6b for inputting voice directly into the computer 1, a card reader 7a for importing images taken with a digital camera, and a film scanner 7b for importing images from silver halide film. Devices for inputting text data into the computer include a keyboard 8a, a flatbed scanner 8c that reads handwritten or printed characters, and a communication device 8d for receiving text data sent over the Internet.

[0016] A floppy drive 8e and an MO drive 8f, which are commonly used for input and output of voice data and image data, are also built into the computer 1. The keyboard 8a is also used, together with a mouse 8b, to issue commands to the functions shown in FIG. 3, and the communication device 8d can of course receive image data as well as text data.

[0017] The mechanism for creating the photograph with voice 2 from the input image data and voice data is described in detail later. In the photograph with voice 2 output from the silver halide photographic printer 3, as shown in FIG. 3, a voice code image area 2b is arranged at the periphery of a photographic image area 2a. When the voice code image area 2b is scanned with a dedicated reading scanner 90, a voice reproduction circuit built into the reading scanner 90 outputs a voice signal corresponding to the voice code image, and the sound can be heard through, for example, an earphone 91.

[0018] As can be understood from FIG. 2, the main functional units of this voice and image processing apparatus are: an image input processing unit 21 that receives image data as the source of the photographic image in the photograph with voice 2; a voice input processing unit 22 that receives voice data directly from outside as the source of the voice code image in the photograph with voice 2; a text input processing unit 23 that receives text data to serve as the source of the voice data to be converted into a voice code image; a speech synthesis unit 30 that generates synthesized speech data from the input text data; a code conversion unit 40 that converts voice data into a voice code image encoded so as to be optically readable; an image data storage unit 51; a voice code image storage unit 52; and an image and voice composition processing unit 60 that generates print data for the photograph with voice 2 from the appropriately processed image data and voice code image.

[0019] The image input processing unit 21 includes an image editing unit 21a and an image selection unit 21b. Image data input from the card reader 7a, film scanner 7b, communication device 8d, floppy drive 8e, MO drive 8f, and so on is selected as necessary by the image selection unit 21b, and the image editing unit 21a performs editing such as color tone correction and resolution conversion on the selected image data.

[0020] The voice input processing unit 22 is used when the voice data serving as the source of the voice code image formed on the photograph with voice 2 is supplied directly by the customer. Voice data input from the microphone 6a, cassette player 6b, card reader 7a (when a memory card for a digital voice recorder is used), and so on is selected as necessary by a voice selection unit 22b and edited by a voice editing unit 22a.

[0021] The text input processing unit 23 is used when the customer supplies text data as the source of the voice code image formed on the photograph with voice 2. Text data in a text file saved on a floppy disk brought by the customer, or sent in the form of e-mail, is imported through the floppy drive 8e or the communication device 8d, after which a text editing unit 23a performs the necessary text editing. Text data entered directly by the customer or an operator through the keyboard 8a is also processed by the text editing unit 23a. In addition, a character recognition unit 24 may optionally be provided to add an OCR function. In that case, a message document presented by the customer is read by the flatbed scanner 8c and then converted into text data by the character recognition unit 24. In other words, the flatbed scanner 8c and the character recognition unit 24 together constitute the character recognition device.

[0022] The speech synthesis unit 30, which converts the text data edited as necessary by the text input processing unit 23 into synthesized speech data, includes a text-to-speech synthesis unit 31, a text analysis dictionary 32, and a synthesized speech element dictionary 33. The text-to-speech synthesis unit 31 analyzes the input text data using the text analysis dictionary 32 to identify its reading, further assigns accent and prosody to obtain a phoneme sequence, and generates synthesized speech data from that sequence using the synthesized speech element dictionary 33. The source voice for the synthesized speech element dictionary 33 may be either a female or a male voice, or both may be provided for selection. A configuration is also possible in which a number of synthesized speech element dictionaries 33, each based on the voice of a specific person, are prepared and switched between as desired.

[0023] The speech synthesis unit 30 is also accompanied by a voice quality transformation unit 34 that alters the voice quality of the synthesized speech data created as described above. The voice quality transformation unit 34 applies to the input voice data a linear frequency transformation by upsampling or downsampling, time-axis adjustment, and the like, producing a transformation similar to playing a tape recorder fast or slow, and outputs the result. The voice quality transformation unit 34 can also apply this transformation to voice data sent from the voice input processing unit 22.

[0024] The code conversion unit 40, which converts real-voice data sent from the voice input processing unit 22 or synthesized speech data sent from the speech synthesis unit 30 into a voice code image, includes: a voice data compression encoding unit 41 built on a suitably chosen known encoding technique such as waveform coding or analysis-synthesis coding; a voice code image generation unit 42 that expands the voice code data thus encoded into a two-dimensional code image; and a precode image generation unit 43 that calculates the size (outer dimensions) of the voice code image to be formed on the photograph with voice 2, for convenience in the later layout editing of the image based on the image data and the voice code image on the photograph with voice.
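The size calculation performed before layout can be illustrated with simple arithmetic. The source does not specify the dot code format, so everything below is a hypothetical sketch: the strip height of 64 dots, the 100 micrometre dot pitch, the one-bit-per-dot packing, and the names `CodeGeometry` and `code_image_size_um` are assumptions chosen only to show how outer dimensions follow from the encoded data size.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CodeGeometry:
    """Hypothetical dot code geometry; not taken from the source."""
    dots_per_column: int = 64   # data dots stacked across the strip
    dot_pitch_um: int = 100     # centre-to-centre dot spacing in micrometres


def code_image_size_um(encoded_bytes: int,
                       geometry: CodeGeometry = CodeGeometry()) -> Tuple[int, int]:
    """Return (length, height) of the code strip in micrometres.

    Assumes one bit per dot and ignores any framing, markers, or
    error-correction overhead a real code format would add.
    """
    if encoded_bytes < 0:
        raise ValueError("encoded_bytes must be non-negative")
    bits = encoded_bytes * 8
    columns = math.ceil(bits / geometry.dots_per_column)
    return (columns * geometry.dot_pitch_um,
            geometry.dots_per_column * geometry.dot_pitch_um)
```

Knowing only these two numbers is enough to draw a placeholder box during layout, without generating the full code image first.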

[0025] The image data edited by the image input processing unit 21 is temporarily stored as an image in the image data storage unit 51, and the voice code image produced by the code conversion unit 40 is temporarily stored in the voice code image storage unit 52. The image and voice composition processing unit 60 then converts them into print data so that they are printed by the printer 3 in the desired layout. For this purpose, the image and voice composition processing unit 60 includes an image and voice code image layout editing unit 61, which lays out the image stored in the image data storage unit 51 and the voice code image stored in the voice code image storage unit 52, and an image and voice code image composition processing unit 62, which combines the two in the determined layout to generate print data. During layout editing, a dummy box area based on the voice code image size calculated by the precode image generation unit 43 is displayed on the monitor 4, so that accurate layout work can be done while comparing its position with that of the image displayed alongside it.

[0026] A typical procedure for creating the photograph with voice 2 with the above apparatus is described with reference to the flowchart of FIG. 4. Here it is assumed that the order for the photograph with voice 2 is placed by e-mail. When the e-mail arrives (#1), the image data attached to it is input to the image input processing unit 21 (#11). The image editing unit 21a then performs editing such as color tone and gradation conversion and enlargement or reduction, through operations by the operator, who checks the image on the monitor 4 (#12). If more than one image has been input, this editing is performed after selection by the image selection unit 21b. The edited image data is temporarily stored in the image data storage unit 51 (#13).

[0027] Meanwhile, the e-mail file containing the text data that serves as the source of the voice code image is sent to the text editing unit 23a of the text input processing unit 23 (#14), where text data containing only the message to be incorporated into the photograph with voice 2 as a voice code image is extracted from the e-mail (#15).
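Steps #11 and #14 to #15 amount to separating an incoming e-mail into image attachments and the message text. The following is a minimal sketch using Python's standard `email` package. The source does not say how the message is delimited within the mail body, so the `[MESSAGE]` and `[/MESSAGE]` marker lines, and the function names `split_order` and `build_sample_order`, are hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import List, Tuple

# Hypothetical markers a customer might put around the text to be voiced.
START, END = "[MESSAGE]", "[/MESSAGE]"


def split_order(raw: bytes) -> Tuple[str, List[Tuple[str, bytes]]]:
    """Return (message text, [(filename, image bytes), ...]) from a raw e-mail."""
    msg = message_from_bytes(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    if START in body and END in body:
        # Keep only the text between the markers.
        body = body.split(START, 1)[1].split(END, 1)[0]
    images = [
        (part.get_filename() or "unnamed", part.get_content())
        for part in msg.iter_attachments()
        if part.get_content_maintype() == "image"
    ]
    return body.strip(), images


def build_sample_order() -> bytes:
    """Build a small sample order e-mail for trying out split_order."""
    mail = EmailMessage()
    mail["Subject"] = "Photo with voice order"
    mail.set_content("Hello,\n[MESSAGE]\nHappy birthday!\n[/MESSAGE]\nThanks.")
    mail.add_attachment(b"\x89PNG placeholder", maintype="image",
                        subtype="png", filename="photo.png")
    return mail.as_bytes()
```

If no markers are present, the sketch falls back to the whole plain-text body, leaving any further trimming to the operator.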

[0028] The text data sent to the speech synthesis unit 30 as mixed kanji-kana text is analyzed by the text-to-speech synthesis unit 31 with reference to the text analysis dictionary 32 (#21), and readings and accents are assigned as the words are identified (#22). Next, breath-pause positions are set and the intonation of the whole sentence is determined, producing a phoneme sequence consisting of phoneme symbols and prosodic parameters (#23). Synthesized speech elements are then connected one after another for this phoneme sequence with reference to the synthesized speech element dictionary 33, generating synthesized speech data (#24).
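The flow of steps #21 to #24 can be shown with a toy pipeline. This is a heavily simplified sketch, not the disclosed synthesizer: the two tiny dictionaries, the whitespace tokenization (real kanji-kana text has no word spaces and needs dictionary-driven segmentation), and the use of a single gain value as the only prosodic parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Phone:
    symbol: str   # phoneme (here: syllable) symbol
    gain: float   # stand-in for a prosodic power parameter


# Hypothetical text analysis dictionary: word -> (reading, accented syllable index).
ANALYSIS_DICT: Dict[str, Tuple[List[str], int]] = {
    "hello": (["he", "lo"], 0),
    "world": (["wa", "ld"], 0),
}

# Hypothetical synthesized speech element dictionary: symbol -> waveform samples.
ELEMENT_DICT: Dict[str, List[float]] = {
    "he": [1.0, 1.0], "lo": [2.0], "wa": [3.0], "ld": [4.0, 4.0],
}


def analyze(text: str, analysis_dict: Dict[str, Tuple[List[str], int]]) -> List[Phone]:
    """Steps #21 to #23: identify words, assign readings and accents, build the sequence."""
    phones: List[Phone] = []
    for word in text.lower().split():
        reading, accent = analysis_dict[word]
        for index, symbol in enumerate(reading):
            phones.append(Phone(symbol, 1.5 if index == accent else 1.0))
    return phones


def synthesize(phones: List[Phone], element_dict: Dict[str, List[float]]) -> List[float]:
    """Step #24: connect the speech elements in order, applying each phone's gain."""
    samples: List[float] = []
    for phone in phones:
        samples.extend(value * phone.gain for value in element_dict[phone.symbol])
    return samples
```

Switching `ELEMENT_DICT` for another dictionary corresponds to the choice of a different source voice mentioned earlier in the description.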

[0029] If voice quality transformation has been requested for this synthesized speech data (YES branch at #25), the voice quality transformation unit 34 applies linear frequency transformation or the like (#26). If it has not been requested (NO branch at #25), the synthesized speech data is sent unchanged to the code conversion unit 40.

[0030] The synthesized speech data is first sent to the voice data compression encoding unit 41 and compressed, and is then converted into an optically readable voice code image by the voice code image generation unit 42 (#31). The size (outer dimensions) of this voice code image is calculated by the precode image generation unit 43 (#32), and the size data is temporarily stored in the voice code image storage unit 52 together with the voice code image data (#33).

[0031] The image data stored in the image data storage unit 51 and the voice code image stored in the voice code image storage unit 52 are each loaded into the image and voice code image layout editing unit 61 of the image and voice composition processing unit 60, where the layout of the image and the voice code image is edited (#40). In practice, a layout editing screen is displayed on the monitor 4, and the image and the dummy box area standing in for the voice code image are laid out by cursor operations. It is also possible to use a pre-selected template into which the image and the voice code image are flowed automatically. In that case, re-editing is performed as needed. For example, if the length of the voice code image exceeds the printable length, it is split in two and arranged in two rows.
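The re-editing rule at the end of paragraph [0031] can be sketched as a small function that splits an over-long code strip into rows. The source only mentions splitting into two rows; generalizing to any number of rows, balancing the row lengths, and the name `split_code_strip` are assumptions of this sketch. Lengths are in arbitrary integer units such as dot columns.

```python
import math
from typing import List, Tuple


def split_code_strip(total_length: int, printable_length: int) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, one per printed row.

    A strip that fits is returned as a single row; a longer strip is cut
    into the smallest number of roughly equal rows that each fit.
    """
    if printable_length <= 0:
        raise ValueError("printable_length must be positive")
    if total_length <= 0:
        return []
    rows = math.ceil(total_length / printable_length)
    per_row = math.ceil(total_length / rows)
    return [(start, min(start + per_row, total_length))
            for start in range(0, total_length, per_row)]
```

Balancing the rows keeps the printed block rectangular, which is a layout preference of this sketch rather than something the source requires.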

【００３２】On receiving the layout information from the image/voice code image layout editing unit 61, the image/voice code image synthesis processing unit 62 sends request signals to the image data storage unit 51 and the voice code image storage unit 52 and receives the corresponding image data and voice code image data. The received picture image data and voice code image data are combined according to the layout information to generate print data (#41). This print data is sent to the printer 3, where the picture image and the voice code image are exposed onto photographic paper; developing the exposed paper yields a photograph with voice 2 as shown in FIG. 3 (#50).

【００３３】[Another Embodiment] The functional block diagram of another embodiment of the present invention, shown in FIG. 5, differs from that of the preceding embodiment shown in FIG. 2 in that the speech synthesis unit 30 is built around a speech editing/synthesis unit 35 instead of the text-to-speech synthesis unit 31.

【００３４】This method, called edit synthesis, generates sentence speech from fragments of real voice covering a limited vocabulary and set of phrases. Generating synthesized speech data in this way requires a registered voice element dictionary 36 that stores real-voice data for vocabulary and phrases registered in advance. The speech editing/synthesis unit 35 breaks the text data sent from the text input processing unit 23 into fragments and, using the registered voice element dictionary 36, replaces each fragment with the corresponding fragment of voice data.
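A minimal Python sketch of edit synthesis follows, for illustration only. The description does not say how the text is fragmented, so the greedy longest-match strategy, the function name `edit_synthesize`, and the representation of each voice element as a `bytes` object are all assumptions.

```python
def edit_synthesize(text: str, elements: dict[str, bytes]) -> bytes:
    """Concatenate recorded voice fragments that together cover `text`.

    `elements` plays the role of the registered voice element dictionary:
    it maps a word or phrase to its recorded audio. At each position the
    longest registered fragment is chosen (an assumed strategy).
    """
    max_len = max(map(len, elements), default=0)
    out = bytearray()
    pos = 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            fragment = text[pos:pos + size]
            if fragment in elements:
                out += elements[fragment]
                pos += size
                break
        else:
            # No registered fragment starts here, so the text cannot be voiced.
            raise KeyError(f"no registered voice element covers {text[pos:]!r}")
    return bytes(out)
```

Because the vocabulary is limited, the sketch raises an error for text that cannot be covered by registered fragments; a real system would more likely restrict what the customer can enter in the first place.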

【００３５】In this embodiment, the registered voice element dictionary 36 can also store real-voice data registered for each individual. That is, a customer of image sheets with voice registers and stores in advance, in his or her own voice, the minimum set of voice elements needed for edit synthesis. When that customer orders an image sheet with voice, the synthesized speech data is created by edit synthesis using the customer's own registered voice elements. For customers who have not registered, the voice elements provided as standard are used.

【００３６】The voice/image processing apparatus of this embodiment has a box-shaped exterior, as shown in FIG. 6, like an ID photo booth or a Purikura (registered trademark) machine. A customer who wants to create a photograph with voice 2 inserts the fee and then, following the instruction messages displayed on the monitor 4, photographs himself or herself with the built-in digital camera and enters the message to be turned into voice, either by typing the text on the built-in touch-panel keyboard 8a or by speaking into the microphone 6a. A sublimation-type thermal transfer printer is used as the printing unit 3.

【００３７】A typical procedure for creating a photograph with voice 2 with the voice/image processing apparatus of this alternative embodiment is described with reference to the flowchart of FIG. 7. Here, the image source for the photograph with voice 2 is image data captured by the digital camera, and the voice source is text data entered directly from the built-in keyboard 8a.

【００３８】A customer who wants to create a photograph with voice 2 inserts the specified coins into the coin slot (#101) and, following the menu displayed on the monitor 4, first photographs himself or herself with the digital camera built into the apparatus, in the same way as with an ID photo booth or a Purikura machine (#110). Because this digital camera is connected directly to the I/O interface 10, the image data it acquires is transferred immediately to the image input processing unit 21 (#111). Using the image editing unit 21a, the customer can apply editing operations such as trimming and scaling to the transferred image data while checking the image on the monitor 4 (#112). The edited image data is temporarily stored in the image data storage unit 51 (#113).

【００３９】Next, because keyboard-entered text data has been selected here as the source data for the voice code image, the customer types the voice message to be included in the photograph with voice as a sentence on the keyboard 8a (#114). The text editing unit 23a functions as a text editor: it composes the text from the data entered through the keyboard 8a, finally converts this text data into a format suited to edit synthesis, and sends it to the speech synthesis unit 30 (#115).

【００４０】In the edit synthesis process, the apparatus first checks whether the customer has registered his or her voice in advance (#121). If so, the customer's registered voice element file is loaded (#122). The registered voice element file can be loaded in various ways; two representative ones are introduced here.

【００４１】In the first, the customer registers the required vocabulary and phrases in his or her own voice beforehand using a voice element registration device, which files them in a format suited to edit synthesis; the resulting voice element file is recorded on a memory card. When the apparatus checks for voice registration, inserting this memory card into the card reader 7a loads the registered voice element file into the registered voice element dictionary 36 of the speech synthesis unit 30. In the second, the voice element file created beforehand by the voice element registration device is stored in the registered voice element dictionary 36 keyed by customer ID. When the apparatus checks for voice registration, entering the customer ID sets that customer's registered voice element file to be used as the registered voice element dictionary 36 in the subsequent edit synthesis. The registered voice element dictionary 36 is preferably located on a server connected via a communication line rather than built into the voice/image processing apparatus. In that configuration, entering the customer ID causes the customer's registered voice element file to be loaded over the communication line into the registered voice element dictionary 36 of the speech synthesis unit 30.

【００４２】If the customer has not registered a voice, the standard voice element file stored in the registered voice element dictionary 36 is set to be used as the registered voice element dictionary 36 in the subsequent edit synthesis (#123).
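The choice between a customer's own voice element file and the standard one (steps #121 to #123) can be sketched as follows. This is an illustrative sketch, not the apparatus's actual logic: the in-memory `store` keyed by customer ID, the `"standard"` key, and the function name `load_element_file` are hypothetical stand-ins for the dictionary 36, the memory card, or the server described above.

```python
from typing import Optional

STANDARD_KEY = "standard"  # hypothetical key for the standard voice element file


def load_element_file(
    customer_id: Optional[str],
    store: dict[str, dict[str, bytes]],
) -> dict[str, bytes]:
    """Return the voice element file to use for edit synthesis.

    #121: check whether the customer has registered a voice.
    #122: if so, use that customer's registered voice element file.
    #123: otherwise fall back to the standard voice element file.
    """
    if customer_id is not None and customer_id in store:
        return store[customer_id]
    return store[STANDARD_KEY]
```

Keeping the lookup behind one function means the rest of the synthesis step does not need to know whether the file came from a card, a local dictionary, or a remote server.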

【００４３】In either case, the edit synthesis process first breaks the message sentence represented by the text data into vocabulary items and phrases (#124), then assigns to each the fragmentary voice element extracted from the voice element file serving as the registered voice element dictionary 36, thereby generating the synthesized speech data (#125).

【００４４】If voice quality modification has been requested for this synthesized speech data (YES branch at #25), the voice quality modification unit 34 applies linear frequency conversion or a similar process (#26). If it has not been requested (NO branch at #25), the synthesized speech data is sent unchanged to the code conversion unit 40. In steps #31 to #33 the synthesized speech data is then converted into a voice code image as described above, and the generated voice code image is temporarily stored in the voice code image storage unit 52.

【００４５】The image data stored in the image data storage unit 51 and the voice code image stored in the voice code image storage unit 52 are laid out by the image/voice code image layout editing unit 61 using a template selected in advance (#40).

【００４６】The image/voice code image synthesis processing unit 62 combines the picture image data and the voice code image data according to the layout information to generate print data (#41). This print data is sent to the printer 3, which prints the picture image and the voice code image on a dedicated sheet; the result is delivered to the print outlet on the front of the apparatus as a photograph with voice 2 like the one shown in FIG. 3 (#50).

【００４７】In the embodiments described above, the image data and the voice code image are combined by the image/voice synthesis processing unit 60 and printed out by the printer 3. However, the image/voice synthesis processing unit 60 may be omitted and the image data and the voice code image printed out by separate printers. In that case, it is advisable to use a sticker printer for the voice code image, so that the sticker bearing the voice code image can be affixed to the sheet bearing the image, for example a photo print.

【００４８】Furthermore, in all the embodiments described above, the input text data is first converted into synthesized speech data by the speech synthesis unit 30, and this synthesized speech data is then converted into a voice code image. It is also possible, however, to convert the text data processed by the text input processing unit 23 directly into a voice code image. As shown in FIG. 8, such a voice/image processing apparatus omits the speech synthesis unit 30; instead, the code conversion unit 40 includes a text/voice code image replacement unit 44, which assigns the corresponding voice code image in turn to each text element obtained by breaking the text data into predetermined elements, and a voice code image dictionary 45, in which the voice code images corresponding to the text elements are registered. In other words, the final voice code image is produced by looking up the voice code image for each vocabulary item or phrase in the text and joining these images together in order.
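The direct conversion from text elements to a voice code image can be illustrated with a small Python sketch. The representation is an assumption made for this example: each registered code image is a tile given as a list of equal-length strings (one string per dot row), and tiles are joined side by side. The function name `join_code_images` and the tile format are hypothetical; the description does not specify how the code images are encoded or joined.

```python
def join_code_images(elements: list[str], tiles: dict[str, list[str]]) -> list[str]:
    """Look up the code image tile for each text element and join them in order.

    `tiles` plays the role of the voice code image dictionary. Each tile is
    a list of rows; all tiles must have the same number of rows so that
    they can be placed side by side.
    """
    rows = None
    for element in elements:
        tile = tiles[element]  # KeyError if the element is not registered
        if rows is None:
            rows = list(tile)
        elif len(tile) != len(rows):
            raise ValueError("all code image tiles must have the same height")
        else:
            rows = [left + right for left, right in zip(rows, tile)]
    return rows or []
```

With this arrangement no audio needs to be synthesized or compressed when an order is placed, since each tile can be prepared in advance from the recorded voice for its text element.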

[Brief description of the drawings]

FIG. 1 is an external view showing one embodiment of the voice/image processing apparatus according to the present invention.

FIG. 2 is a functional block diagram of the voice/image processing apparatus of FIG. 1.

FIG. 3 is an explanatory diagram showing how voice is reproduced from a photograph with voice created by the voice/image processing apparatus.

FIG. 4 is a flowchart showing the procedure for creating a photograph with voice using the voice/image processing apparatus shown in FIG. 2.

FIG. 5 is a functional block diagram showing another embodiment of the voice/image processing apparatus according to the present invention.

FIG. 6 is an external view of the voice/image processing apparatus of FIG. 5.

FIG. 7 is a flowchart showing the procedure for creating a photograph with voice using the voice/image processing apparatus shown in FIG. 5.

FIG. 8 is a functional block diagram showing yet another embodiment of the voice/image processing apparatus according to the present invention.

[Explanation of symbols]

2 Image sheet with voice (photograph with voice)
3 Printing unit (silver halide photographic printer, sublimation-type thermal transfer printer)
21 Image input unit
22 Voice input unit
23 Text input processing unit
24 Character recognition unit
30 Speech synthesis unit
31 Text-to-speech synthesis unit
32 Text analysis dictionary
33 Synthesized speech element dictionary
34 Voice quality modification unit
35 Speech editing/synthesis unit
36 Registered voice element dictionary
60 Image/voice synthesis processing unit

Continuation of front page. F-terms (reference): 5C052 AA11 DD06 EE08 FA02 FA03 FE01 GA02 GA05 GB07 GD03 GE08 5C062 AA05 AB17 AC02 AC29 AE02 AE07 AE08 AE11 5D045 AA20 BA01

Claims

1. A voice/image processing apparatus comprising a code conversion unit that converts voice data into a voice code image coded so as to be optically readable, and a printing unit that prints the voice code image and a picture image based on image data in order to create an image sheet with voice, wherein the apparatus further comprises a text input processing unit that processes input text data, and generates, based on the text data processed by the text input processing unit, a voice code image for reproducing the voice of that text data.

2. The voice/image processing apparatus according to claim 1, further comprising a speech synthesis unit that generates synthesized speech data based on the text data processed by the text input processing unit, wherein the code conversion unit uses the synthesized speech data generated by the speech synthesis unit as the source voice data for the voice code image.

3. The voice/image processing apparatus according to claim 2, wherein the speech synthesis unit comprises a text-to-speech synthesis unit that analyzes the input text data using a text analysis dictionary to determine its reading, sets accent and prosody, and generates synthesized speech data from the resulting phoneme sequence using a synthesized speech element dictionary.

4. The voice/image processing apparatus according to claim 2, wherein the synthesized speech elements stored in the synthesized speech element dictionary are created based on real-voice data registered for each individual.

5. The voice/image processing apparatus according to claim 2, wherein the speech synthesis unit comprises a speech editing/synthesis unit that converts the input text data, fragment by fragment, into synthesized speech data using a registered voice element dictionary storing real-voice data of vocabulary and phrases registered in advance.

6. The voice/image processing apparatus according to claim 5, wherein the registered voice element dictionary stores real-voice data registered for each individual.

7. The voice/image processing apparatus according to claim 2, wherein the speech synthesis unit includes a voice quality modification unit that changes the voice quality of the synthesized speech data.

8. The voice/image processing apparatus according to claim 1, further comprising a character recognition device, wherein text data output by the character recognition device is used for conversion into the voice code image.