JP2004158913A

JP2004158913A - Audiovisual processor

Info

Publication number: JP2004158913A
Application number: JP2002320122A
Authority: JP
Inventors: Yasunori Ohora; 恭則大洞
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2002-11-01
Filing date: 2002-11-01
Publication date: 2004-06-03

Abstract

PROBLEM TO BE SOLVED: To reproduce images and sounds from audiovisual data satisfactorily, even if the images and sounds in the data have been edited with separate tools or the files have been converted to other formats.

SOLUTION: Time information embedding means (a video watermark inserter 6 and an audio watermark inserter 11) embed time information as watermarks in the video data and the audio data, respectively. During reproduction, the time information is extracted from the video data and the audio data, and the two are aligned on the basis of the extracted time information, thereby synchronizing the reproduced images with the reproduced sounds.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、音声画像処理装置に関し、特に、音声データと動画像データとを別々のツールで編集した際に行う同期処理のために用いて好適なものである。
【０００２】
【従来の技術】
動画像データの符号化方式には、フレーム内符号化方式、またはフレーム間予測符号化方式等が知られている。
前記フレーム内符号化方式としては、例えば、ＭｏｔｉｏｎＪＰＥＧ（ＪｏｉｎｔＰｈｏｔｏｇｒａｐｈｉｃＣｏｄｉｎｇＥｘｐｅｒｔｓＧｒｏｕｐ）や、ＤｉｇｉｔａｌＶｉｄｅｏ等の符号化方式が挙げられる。
【０００３】
一方、前記フレーム間予測符号化としては、例えば、Ｈ．２６１、Ｈ．２６３、ＭＰＥＧ（ＭｏｖｉｎｇＰｉｃｔｕｒｅＣｏｄｉｎｇＥｘｐｅｒｔｓＧｒｏｕｐ）−１、ＭＰＥＧ−２、及びＭＰＥＧ−４等の符号化方式が挙げられる。
【０００４】
前記２つの符号化方式の規格は、ＩＳＯ（ＩｎｔｅｒｎａｔｉｏｎａｌＯｒｇａｎｉｚａｔｉｏｎｆｏｒＳｔａｎｄａｒｄｉｚａｔｉｏｎ：国際標準化機構）やＩＴＵ（ＩｎｔｅｒｎａｔｉｏｎａｌＴｅｌｅｃｏｍｍｕｎｉｃａｔｉｏｎＵｎｉｏｎ：国際電気通信連合）によって国際標準化されている。
【０００５】
このようなディジタル符号化規格の普及に伴い、前記符号化方式をコンピュータ上で統一的に扱うためのファイルフォーマットが策定されている。例えば、ＭＰＥＧ−４の規格には、そのファイルフォーマットが定義されている。また、コンピュータのＯＳやネットワークの構成に依存して定義されたファイルフォーマットが、多数普及している。
【０００６】
このような状況の中で、コンテンツ業界においては、著作権保護に関する問題提起を大きくするようになってきた。そして、この問題提起を契機にして、セキュリティに関する情報を暗号化するための電子透かし技術が開発されてきている。
【０００７】
この電子透かし技術とは、画像データを再生する時に、前記画像データが変化しないか、または前記画像データの変化を人間が知覚できないレベルで、著作権を保護するために必要なデータを前記画像データに埋め込む技術である。
【０００８】
画像データに対して前記電子透かしデータを埋め込むための技術としては、例えば、特許文献１の「動画像エンコードプログラムを記録した記録媒体及び動画像エンコード装置」、または特許文献２の「電子透かし埋め込み装置および抽出装置」などが提案されている。
【０００９】
また、オーディオデータに関しては、前記電子透かしデータを埋め込むための技術として、例えば、特許文献３の「音声データに透かし情報を埋め込む方法、透かし情報埋め込み装置、透かし情報検出装置、透かし情報が埋め込まれた記録媒体、及び透かし情報を埋め込む方法を記録した記録媒体」、または特許文献４の「電子透かし埋め込み装置、オーディオ符号化装置および記録媒体」などが提案されている。
【００１０】
【特許文献１】
特開平１０−２４３３９８号公報
【特許文献２】
特開平１１−３４１４５０号公報
【特許文献３】
特開２００１−２０２０８９号公報
【特許文献４】
特開平１１−３１６５９９号公報
【００１１】
【発明が解決しようとする課題】
画像データに関するフレーム同期及びフレーム制御については各種各様の方式がある。前述した符号化方式をコンピュータ上で統一的に扱うための従来のファイルフォーマットでは、それぞれのファイルフォーマットごとに、異なるフレーム同期（フレーム制御）方式を採用している。
【００１２】
例えば、同じＭｏｔｉｏｎＪＰＥＧのファイルである、ＡＶＩファイルまたはＱｕｉｃｋＴｉｍｅファイルでも、フレーム同期のためのタイムスタンプを付与する方法はそれぞれ異なっている。
【００１３】
このため、前記タイムスタンプの付与方法が同じ画像ファイル同士でファイルフォーマット変換した場合、変換後の画像ファイルにおいて、画像ファイルに記録されている画像と音を同期させることは比較的容易に行なうことができるが、前記タイムスタンプの付与方法が異なる画像ファイル同士でファイルフォーマット変換した場合、変換前と変換後の画像ファイルで対応する時間情報が消失してしまうことから、画像ファイルに記録されている画像と音を同期させることは非常に困難であるという問題があった。
【００１４】
そこで、本発明は前記問題点にかんがみ、フォーマットが異なる複数の画像データファイルのあいだで、互いにファイル変換を行っても、画像と音声の同期再生を良好に行なうことができるようにすることを目的とする。
【００１５】
【課題を解決するための手段】
本発明の音声画像処理装置は、画像データ及び前記音声データに時間情報を透かしとしてそれぞれ埋め込む時間情報埋め込み手段を有することを特徴としている。
【００１６】
すなわち、本発明の音声画像処理装置は、画像データや音声データに時間情報を透かしデータとして埋め込み、埋め込んだ時間情報を基に、画像と音声との同期をするようになされている。
【００１７】
【発明の実施の形態】
以下、本発明の音声画像処理装置の実施の形態の一例を、図面を用いて詳細に説明する。
（第１の実施の形態）
図１は、本発明の第１の実施の形態における音声画像処理装置１の構成を示すブロック図である。
【００１８】
図１において、１は本実施の形態における音声画像処理装置である。２は被写体を撮像することにより得られる画像信号を、フレーム単位で連続して取得するためのカメラである。８は音声信号をフレーム単位で連続して取得するためのマイクである。
【００１９】
３はカメラ２が取得した画像信号をフレーム単位で符号化して画像符号化データを生成するための画像符号化器である。９はマイク８が取得した音声信号をフレーム単位で符号化して音声符号化データを生成するための音声符号化器である。
【００２０】
４は前述した画像符号化データと音声符号化データとを同期させるための時間情報を発生する同期情報発生器である。５は同期情報発生器４が発生する時間情報から、画像透かしデータを生成するための画像透かし生成器である。
【００２１】
６は画像透かし生成器５で生成した画像透かしデータを、画像符号化器３が生成した画像符号化データに埋め込むための画像透かし挿入器である。７は画像透かし挿入器６が生成した画像符号化データを記録するための画像記録装置である。
【００２２】
１０は同期情報発生器４が出力する時間情報から、音声透かしデータを生成するための音声透かし生成器である。１１は音声透かし生成器１０で生成した音声透かしデータを、音声符号化器９が生成した音声符号化データに埋め込むための音声透かし挿入器である。１２は音声透かし挿入器１１が生成した音声符号化データを記録するための音声記録装置である。なお、画像透かし挿入器６及び音声透かし挿入器１１により、時間情報埋め込み手段がそれぞれ構成されているようにしたが、画像透かし挿入器６及び音声透かし挿入器１１をまとめた１つの透かし挿入器により、時間情報埋め込み手段が構成されているようにしてもよい。
【００２３】
前記のように構成された本実施の形態の音声画像処理装置１が音声画像データを処理する動作を以下に説明する。
先ず、カメラ２で被写体を撮像することによって得られた画像信号は、音声画像処理装置１に設けられている画像符号化器３に、１フレームずつ入力される。
【００２４】
画像符号化器３は、カメラ２より入力された画像信号に対して所定のアルゴリズムに基づく符号化処理を施して画像符号化データを生成し、前記生成した画像符号化データを画像透かし挿入器６へ出力する。
【００２５】
次に、画像透かし挿入器６は、画像透かし生成器５より入力された画像透かしデータを、前述した画像符号化器３より入力された画像符号化データに埋め込んだデータを生成する。そして、画像記録装置７は、画像透かし挿入器６により生成された画像符号化データ（画像透かしデータが埋め込まれた画像符号化データ）を記録する。
【００２６】
なお、画像記録装置７に記録される画像符号化データのファイルフォーマットに、画像データと音声データとを同期させるための時間情報を保存する機能があれば、前述した画像透かしデータを埋め込む処理を省略しても良い。
【００２７】
前述した画像信号と同様に、マイク８で収録された音声信号は、音声符号化器９に１フレームずつ入力される。音声符号化器９は、マイク８より入力される音声信号に対して所定のアルゴリズムに基づく符号化処理を施して、音声符号化データを生成し、前記生成した音声符号化データを音声透かし挿入器１１へ出力する。
【００２８】
音声透かし生成器１０は、同期情報発生器４から画像データと音声データとを同期させるための時間情報が入力されると、前記時間情報に基づいて音声用の透かしデータを生成し、これを音声透かし挿入器１１へ出力する。
【００２９】
音声透かし挿入器１１は、音声透かし生成器１０より入力された音声透かしデータを、前述した音声符号化器９より入力された音声符号化データに埋め込んだデータを生成する。そして、音声透かし挿入器１１で生成された音声符号化データ（音声透かしデータが埋め込まれた音声符号化データ）を音声記録装置１２に記録する。
【００３０】
なお、一般に、音声データ専用のエディタやファイルフォーマットには、時間情報を保存するための機能、または時間情報を保存するための記録領域を有していないことが多い。このため、前述した音声符号化データに対しては、音声透かしデータを埋め込むような処理をする必要がある。
【００３１】
画像用の透かしデータが埋め込まれた符号化データ、及び音声用の透かしデータが埋め込まれた符号化データは、画像記録装置７及び音声記録装置１２の所定の記録領域にそれぞれ格納される。
【００３２】
本実施の形態の音声画像処理装置１によれば、前記一連のデータ生成動作により、画像記録装置７に記録された画像データと、音声記録装置１２に記録された音声データとを、フレーム単位で読み出して前記画像データ及び音声データを合体させて再生する際に、データの合体に必要な同期情報を、透かしデータとして元の画像データ及び音声データに埋め込んでいるため、再生側でのフレーム（同期）操作を容易に行なうことができる。
【００３３】
（第２実施例）
図２は、本発明の第２の実施の形態における音声画像処理装置の構成を示すブロック図である。
【００３４】
図２において、１３は本実施の形態の音声画像処理装置である。７は画像記録装置であり、第１の実施の形態で示した画像符号化データ（画像透かしデータが埋め込まれた画像符号化データ）を記録している。
また、１２は音声記録装置であり、第１の実施の形態で示した音声符号化データ（音声透かしデータが埋め込まれた音声符号化データ）を記録している。
【００３５】
１５は画像記録装置７に記録された画像符号化データから、画像透かしデータを抽出するための画像透かし抽出器である。１８は画像符号化データをフレーム単位で復号するための画像復号器である。２０は音声記録装置１２に記録された音声符号化データから、音声透かしデータを抽出するための音声透かし抽出器である。２１は音声符号化データをフレーム単位で復号するための音声復号器である。
【００３６】
１６は音声符号化データ及び画像符号化データにそれぞれ記録されている時間情報に基づいて、機器の動作を制御するための制御情報を発生して出力する制御情報発生器である。
【００３７】
１７は制御情報発生器１６から出力される制御情報に基づいて、画像記録装置７及び音声記録装置１２から所定のデータを抽出するように動作制御するための制御器である。
１９は画像復号器１８によって復号されて再生された画像データを表示するモニタである。２２は音声復号器２１によって復号されて再生された音声データを発音するスピーカである。
【００３８】
前記のように構成された音声画像処理装置１３が音声画像データを処理する動作を以下に説明する。
【００３９】
先ず、画像透かし抽出器１５は、画像記録装置７に記録された画像符号化データを読み出して、そこに埋め込まれている画像透かしデータを抽出する。同様に、音声画像透かし抽出器２０は、音声記録装置１２に記録された音声符号化データを読み出して、そこに埋め込まれている音声透かしデータを抽出する。
【００４０】
次に、画像透かし抽出器１５及び音声画像透かし抽出器２０は、抽出した透かしデータを制御情報発生器１６にそれぞれ出力する。前記透かしデータが入力された制御情報発生器１６は、抽出された画像透かしデータに含まれている時間情報と、音声透かしデータに含まれている時間情報とを比較する。そして、この比較結果に基づいて、次のデータを抽出するのに必要な制御情報を制御器１７へ出力する。
【００４１】
制御器１７は、前記制御情報発生器１６から入力される制御情報に基づいて、画像記録装置７及び音声記録装置１２にアクセスして、対応するビットストリーム（画像符号化データ及び音声符号化データ）を抽出する。
【００４２】
次に、画像透かし抽出器１５及び音声透かし抽出器２０は、抽出されたビットストリームから、音声透かしデータ及び画像透かしデータをそれぞれ抽出して、同期用の時間情報を再度得ることを繰り返し行なう。
【００４３】
前述した制御情報発生器１６によって画像符号化データの時間情報と音声符号化データの時間情報とを比較する際に、例えば、画像符号化データの時間情報の方が進んでいる場合には、画像用のビットストリームの抽出を一時的に停止するような情報を生成して、これを制御器１７に出力する。このようにして、画像符号化データ及び音声符号化データの読み出しタイミングの同期が図られている。
【００４４】
なお、画像記録装置７及び音声記録装置１２の記録フォーマットが時間情報を含むフォーマットである場合は、前述した透かしデータの抽出を行わずに、前記時間情報を入手するようにしてもよい。
【００４５】
画像復号器１８は、画像透かし抽出器１５より画像データのみのビットストリームを受けて動画像データに変換し、前記変換した動画像データがモニタ１９上に再生表示される。
音声復号器２１は、音声透かし抽出器２０より入力される音声のみのビットストリームを音声データに変換し、前記変換した音声データがスピーカ２２から発音される。
【００４６】
（第３実施例）図３は、本発明の第３の実施の形態における音声画像処理装置の構成を示すブロック図である。なお、前記第１の実施の形態で示した図１と同様の構成要素については同一の番号を付してあり、その詳細な説明を省略している。
【００４７】
図３において、２３は本実施の形態の音声画像処理装置である。音声画像処理装置２３に配設されている２４は、画像符号化データと音声符号化データとを多重化するための多重化器である。また、２５は多重化器２４で１本に多重化されたストリームを保存するための記録装置である。
【００４８】
画像透かし挿入器６、及び音声透かし挿入器１１によって、透かしデータが画像符号化データ及び音声符号化データにそれぞれ埋め込まれるまでの動作は、第１の実施の形態における音声画像処理装置１の動作と同様に行われる。
【００４９】
なお、多重化器２４が画像符号化データと音声符号化データとを１本のストリームに多重化する際に、連続した時間情報をビットストリームに入れる場合もある。その場合、画像符号化データまたは音声符号化データに透かしとして入れた時間情報と、ビットストリームに入れた時間情報とが異なっていてもよい。何故ならば、透かしに入れる時間情報は、音声データと画像データとを同期するために用いる情報であって、経過時間を知るための情報とは異なるので、厳密に連続する時間情報である必要はないためである。
【００５０】
本実施の形態の音声画像処理装置によれば、前記一連のデータ生成動作により、記録媒体に記録された画像フレーム及び音声フレームを読み出して再生する際に、画像データと音声データとを同期するために必要な符号長に関する情報を、１本に多重化したビットストリームに透かしとして埋め込むため、再生側でのフレーム操作（同期操作）を容易に行なうようにすることができる。
【００５１】
なお、本発明の目的は、第１〜第３の実施の形態の音声画像処理装置の各部または全部の機能を実現するソフトウェアのプログラムコードを記憶した記録媒体を、システム或いは装置に供給し、そのシステム或いは装置のコンピュータ（ＣＰＵ等）が記録媒体に格納されたプログラムコードを読みだして実行することによっても達成される。
【００５２】
この場合、記録媒体から読み出されたプログラムコード自体が第１〜第３の実施の形態の機能を実現することとなり、そのプログラムコードを記録した記録媒体及び前記プログラムコードは本発明を構成することとなる。
プログラムコードを供給するための記録媒体としては、ＲＯＭ、フレキシブルディスク、ハードディスク、光ディスク、光磁気ディスク、ＣＤ−ＲＯＭ、ＣＤ−Ｒ、磁気テープ、不揮発性のメモリカード等を用いることができる。
また、コンピュータが読みだしたプログラムコードを実行することにより、第１〜第３の実施の形態の機能が実現されるだけでなく、そのプログラムコードの指示に基づき、コンピュータ上で稼動しているＯＳ等が実際の処理の一部又は全部を行い、その処理によって第１〜第３の実施の形態の機能が実現される場合も含まれることは言うまでもない。
さらに、記録媒体から読み出されたプログラムコードが、コンピュータに挿入された拡張機能ボードやコンピュータに接続された機能拡張ユニットに備わるメモリに書き込まれた後、そのプログラムコードの指示に基づき、その機能拡張ボードや機能拡張ユニットに備わるＣＰＵなどが実際の処理の一部又は全部を行い、その処理によって第１〜第３の実施の形態の機能が実現される場合も含まれることは言うまでもない。
図４は、前記コンピュータシステム３１０の一例を示したものである。
図４において、メモリ３０１には、音声画像処理装置全体を制御し、各種ソフトウェアを動作させるためのＯＳ、音声画像を符号化する符号化ソフトウェア、及び透かしデータの生成と埋め込みを行なうための透かし埋め込みソフトウェア等が格納されている。
【００５３】
また、符号化データ生成の際に画像データを格納する画像エリア、生成された符号や透かしが埋め込まれた符号化データを格納する符号エリア、及び各種演算のパラメータ等を格納するワーキングエリアが存在する。
【００５４】
このような構成において、端末３０３から指示されたマイク３０５、及びカメラ３０６を介して、画像（動画像）信号及び音声信号が入力される。そして、符号化された画像データ及び音声データは、記録部（記録媒体等）３０４に格納される。また、前記画像データ及び音声データの再生時には、記録部３０４に格納された画像データ及び音声データが読み出されて復号化される。この復号化された画像データはモニタ３０７に出力され、復号化された音声データはスピーカ３０８に出力される。
【００５５】
本発明の実施態様の例を以下に列挙する。
〔実施態様１〕画像データ及び前記音声データに時間情報を透かしとしてそれぞれ埋め込む時間情報埋め込み手段を有することを特徴とする音声画像処理装置。
【００５６】
〔実施態様２〕画像データを入力する画像入力手段と、音声データを入力する音声入力手段と、前記画像入力手段により入力された画像データのフレーム、及び前記音声入力手段により入力された音声データに、それぞれ、時間情報を透かしとして埋め込む時間情報埋め込み手段とを有することを特徴とする音声画像処理装置。
【００５７】
〔実施態様３〕前記時間情報が透かしとして埋め込まれた画像データと音声データとを多重化するための多重化手段を有し、前記多重化手段は、前記画像データ及び前記音声データのそれぞれに埋め込まれた時間情報を基にして前記画像データ及び前記音声データを一本のビットストリームとして生成することを特徴とする実施態様２に記載の音声画像処理装置。
【００５８】
〔実施態様４〕時間情報が透かしデータとして埋め込まれた画像データ及び音声データを再生するための音声画像処理装置であって、前記画像データを記録した第１の記録媒体、及び前記音声データを記録した第２の記録媒体にアクセスして、前記画像データの時間情報及び前記音声データの時間情報を対応させて、前記画像データ及び前記音声データを選択して一連のビットストリームを生成するようにことを特徴とする音声画像処理装置。
【００５９】
〔実施態様５〕時間情報が透かしデータとして埋め込まれた画像データを入力する画像データ入力手段と、時間情報が透かしデータとして埋め込まれた音声データを入力する音声データ入力手段と、前記画像データ入力手段により入力された画像データから第１の時間情報を抽出する画像透かし抽出手段と、前記音声データ入力手段により入力された音声データから第２の時間情報を抽出する音声透かし抽出手段とを有することを特徴とする音声画像処理装置。
【００６０】
〔実施態様６〕前記画像透かし抽出手段により抽出された第１の時間情報と、前記音声透かし抽出手段により抽出された第２の時間情報とを比較する時間情報比較手段と、前記時間情報比較手段による比較結果に基づいて、記録媒体から画像データ及び音声データを読み出す際の制御情報を生成する制御情報発生手段と、前記制御情報発生手段により生成された制御情報に基づいて、前記記録媒体から画像データ及び音声データを読み出して、前記画像データ及び前記音声データよりなる一連のビットストリームを生成するビットストリーム生成手段とを有することを特徴とする実施態様５に記載の音声画像処理装置。
【００６１】
〔実施態様７〕画像データ及び前記音声データに時間情報を透かしとしてそれぞれ埋め込む時間情報埋め込み手順を有することを特徴とする音声画像処理方法。
【００６２】
〔実施態様８〕画像データを入力する画像入力手順と、音声データを入力する音声入力手順と、前記画像入力手順により入力された画像データのフレーム、及び前記音声入力手順により入力された音声データに、時間情報を透かしとしてそれぞれ埋め込む時間情報埋め込み手順とを有することを特徴とする音声画像処理方法。
【００６３】
〔実施態様９〕時間情報が透かしデータとして埋め込まれた画像データ及び音声データを再生するための音声画像処理方法であって、前記画像データを記録した第１の記録媒体、及び前記音声データを記録した第２の記録媒体にアクセスして、前記画像データの時間情報及び前記音声データの時間情報を対応させて、前記画像データ及び前記音声データを選択して一連のビットストリームを生成するようにすることを特徴とする音声画像処理方法。
【００６４】
〔実施態様１０〕時間情報が透かしデータとして埋め込まれた画像データを入力する画像データ入力手順と、時間情報が透かしデータとして埋め込まれた音声データを入力する音声データ入力手順と、前記画像データ入力手順により入力された画像データから第１の時間情報を抽出する画像透かし抽出手順と、前記音声データ入力手順により入力された音声データから第２の時間情報を抽出する音声透かし抽出手順とを有することを特徴とする音声画像処理方法。
【００６５】
〔実施態様１１〕コンピュータに画像データ及び前記音声データに時間情報を透かしとしてそれぞれ埋め込む時間情報埋め込み手順を実行させるためのコンピュータプログラム。
【００６６】
〔実施態様１２〕コンピュータに、画像データを入力する画像入力手順と、音声データを入力する音声入力手順と、前記画像入力手順により入力された画像データのフレーム、及び前記音声入力手順により入力された音声データに、時間情報を透かしとしてそれぞれ埋め込む時間情報埋め込み手順とを実行させるためのコンピュータプログラム。
【００６７】
〔実施態様１３〕時間情報が透かしデータとして埋め込まれた画像データ及び音声データを再生するための音声画像処理方法を実行させるためのコンピュータプログラムであって、前記画像データを記録した第１の記録媒体、及び前記音声データを記録した第２の記録媒体にアクセスして、前記画像データの時間情報及び前記音声データの時間情報を対応させて、前記画像データ及び前記音声データを選択して一連のビットストリームを生成するようにすることを特徴とするプログラム。
【００６８】
〔実施態様１４〕時間情報が透かしデータとして埋め込まれた画像データを入力する画像データ入力手順と、時間情報が透かしデータとして埋め込まれた音声データを入力する音声データ入力手順と、前記画像データ入力手順により入力された画像データから第１の時間情報を抽出する画像透かし抽出手順と、前記音声データ入力手順により入力された音声データから第２の時間情報を抽出する音声透かし抽出手順とを実行させるためのコンピュータプログラム。
【００６９】
〔実施態様１５〕上記実施態様１１〜１４の何れか１項に記載のコンピュータプログラムを記憶したことを特徴とするコンピュータ読み取り可能な記憶媒体。
【００７０】
【発明の効果】
以上説明したように、本発明によれば、画像データ及び音声データに時間情報を透かしとしてそれぞれ埋め込むようにしたので、ファイルフォーマット形式に依存することなく画像データ及び音声データを再生することが可能となり、画像と音声との同期を容易かつ迅速に行なうようにすることができる。
【００７１】
また、音声と画像をそれぞれ別のツールで編集したり、異なるファイルフォーマット間でデータ変換を複数回行なったりする場合でも、時間情報が消失することを防止して、画像と音声の同期が適切になされた再生処理を行なうようにすることができる。また、音声データと画像データを、異なるファイルフォーマット形式によってそれぞれ記録した場合でも、それぞれの時間情報に基づいて同期のとれた再生を行なうようにすることができる。
【図面の簡単な説明】
【図１】本発明の第１の実施の形態の音声画像処理装置の構成を示すブロック図である。
【図２】本発明の第２の実施の形態の音声画像処理装置の構成を示すブロック図である。
【図３】本発明の第３の実施の形態の音声画像処理装置の構成を示すブロック図である。
【図４】音声画像処理装置の機能をコンピュータに実現させるためのプログラムをコンピュータ読み取り可能な記録媒体から読み出して実行する場合のシステム構成を示すブロック図である。
【符号の説明】
１音声画像処理装置
２カメラ
３画像符号化器
４同期情報発生器
５画像透かし生成器
６画像透かし挿入器
７画像記録装置
８マイク
９音声符号化器
１０音声透かし生成器
１１音声透かし挿入器
１２音声記録装置[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an audio/video processing apparatus, and is particularly suitable for the synchronization processing performed when audio data and moving image data have been edited with different tools.
[0002]
[Prior art]
Known encoding methods for moving image data include intra-frame encoding and inter-frame predictive encoding.
Examples of intra-frame encoding include Motion JPEG (JPEG: Joint Photographic Experts Group) and Digital Video.
[0003]
Examples of inter-frame predictive encoding include H.261, H.263, MPEG (Moving Picture Experts Group)-1, MPEG-2, and MPEG-4.
[0004]
Both types of encoding have been standardized internationally by ISO (International Organization for Standardization) and the ITU (International Telecommunication Union).
[0005]
With the spread of these digital encoding standards, file formats for handling the encoding methods uniformly on computers have been established. For example, the MPEG-4 standard defines its own file format. In addition, many file formats defined according to the computer's OS or the network configuration are in wide use.
[0006]
Against this background, copyright protection has become a growing concern in the content industry, and this has prompted the development of digital watermarking technology for concealing security-related information within content.
[0007]
Digital watermarking is a technique for embedding the data needed for copyright protection into image data in such a way that, when the image data is reproduced, the image either does not change or changes only at a level humans cannot perceive.
[0008]
Techniques proposed for embedding digital watermark data in image data include, for example, the “Recording medium recording a moving image encoding program, and moving image encoding device” of Patent Document 1 and the “Digital watermark embedding device and extraction device” of Patent Document 2.
[0009]
For audio data, techniques proposed for embedding digital watermark data include, for example, the “Method of embedding watermark information in audio data, watermark information embedding device, watermark information detecting device, recording medium in which watermark information is embedded, and recording medium recording a method of embedding watermark information” of Patent Document 3 and the “Digital watermark embedding device, audio encoding device, and recording medium” of Patent Document 4.
[0010]
[Patent Document 1]
JP-A-10-243398
[Patent Document 2]
JP-A-11-341450
[Patent Document 3]
JP-A-2001-202089
[Patent Document 4]
JP-A-11-316599
[0011]
[Problems to be solved by the invention]
Many different schemes exist for frame synchronization and frame control of image data. Conventional file formats for handling the above encoding methods uniformly on a computer each adopt their own frame synchronization (frame control) scheme.
[0012]
For example, AVI files and QuickTime files attach time stamps for frame synchronization in different ways, even when both contain Motion JPEG data.
[0013]
Consequently, when an image file is converted to another format that attaches time stamps in the same way, the images and sound recorded in the converted file can be synchronized relatively easily. When the conversion is between formats that attach time stamps in different ways, however, the correspondence of time information between the files before and after conversion is lost, which makes it very difficult to synchronize the images and sound recorded in the image file.
[0014]
In view of the above problem, an object of the present invention is to enable satisfactory synchronized reproduction of images and sound even when image data files of different formats are converted from one to another.
[0015]
[Means for Solving the Problems]
The audio/video processing apparatus according to the present invention is characterized by having a time information embedding means for embedding time information as a watermark in each of image data and audio data.
[0016]
That is, the audio/video processing apparatus of the present invention embeds time information as watermark data in image data and audio data, and synchronizes images and sound on the basis of the embedded time information.
[0017]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, an embodiment of the audio/video processing apparatus of the present invention will be described in detail with reference to the drawings.
(First Embodiment)
FIG. 1 is a block diagram showing the configuration of an audio/video processing apparatus 1 according to the first embodiment of the present invention.
[0018]
In FIG. 1, reference numeral 1 denotes the audio/video processing apparatus according to the present embodiment. Reference numeral 2 denotes a camera that images a subject and continuously acquires the resulting image signal frame by frame. Reference numeral 8 denotes a microphone that continuously acquires an audio signal frame by frame.
[0019]
Reference numeral 3 denotes an image encoder that encodes the image signal acquired by the camera 2 frame by frame to generate encoded image data. Reference numeral 9 denotes an audio encoder that encodes the audio signal acquired by the microphone 8 frame by frame to generate encoded audio data.
[0020]
Reference numeral 4 denotes a synchronization information generator that generates time information for synchronizing the above-described encoded image data and encoded audio data. Reference numeral 5 denotes an image watermark generator for generating image watermark data from time information generated by the synchronization information generator 4.
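The patent does not say how the synchronization information generator 4 represents time. One minimal sketch, assuming a shared 90 kHz tick clock (a convention borrowed from MPEG systems streams, not from this document) and hypothetical frame parameters, computes a time value for each video frame and each audio frame so that both streams can later be compared on one axis:

```python
from fractions import Fraction

# Hypothetical common clock: 90 kHz ticks (borrowed from the MPEG systems
# convention; the patent does not specify a time base).
CLOCK_HZ = 90_000


def video_frame_time(index: int, fps: Fraction) -> int:
    """Time of video frame `index` in clock ticks, rounded down."""
    # int / Fraction stays an exact Fraction; int() then floors it
    return int(index * CLOCK_HZ / fps)


def audio_frame_time(index: int, samples_per_frame: int, sample_rate: int) -> int:
    """Time of audio frame `index` in clock ticks, rounded down."""
    return index * samples_per_frame * CLOCK_HZ // sample_rate


if __name__ == "__main__":
    print([video_frame_time(i, Fraction(30)) for i in range(3)])
    print([audio_frame_time(i, 1024, 48_000) for i in range(3)])
```

Exact integer and `Fraction` arithmetic keeps NTSC-style rates such as 30000/1001 free of drift; with `Fraction(30000, 1001)`, frame 1 lands on tick 3003.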
[0021]
Reference numeral 6 denotes an image watermark inserter that embeds the image watermark data generated by the image watermark generator 5 into the encoded image data generated by the image encoder 3. Reference numeral 7 denotes an image recording device that records the encoded image data output by the image watermark inserter 6.
[0022]
Reference numeral 10 denotes an audio watermark generator that generates audio watermark data from the time information output by the synchronization information generator 4. Reference numeral 11 denotes an audio watermark inserter that embeds the audio watermark data generated by the audio watermark generator 10 into the encoded audio data generated by the audio encoder 9. Reference numeral 12 denotes an audio recording device that records the encoded audio data output by the audio watermark inserter 11. In this embodiment, the image watermark inserter 6 and the audio watermark inserter 11 each constitute a time information embedding means; alternatively, a single watermark inserter combining the two may constitute the time information embedding means.
[0023]
The operation by which the audio/video processing apparatus 1 configured as described above processes audio and image data is described below.
First, the image signal obtained by imaging a subject with the camera 2 is input, one frame at a time, to the image encoder 3 provided in the audio/video processing apparatus 1.
[0024]
The image encoder 3 encodes the image signal input from the camera 2 using a predetermined algorithm to generate encoded image data, and outputs the encoded image data to the image watermark inserter 6.
[0025]
Next, the image watermark inserter 6 embeds the image watermark data input from the image watermark generator 5 into the encoded image data input from the image encoder 3. The image recording device 7 then records the encoded image data generated by the image watermark inserter 6 (that is, the encoded image data in which the image watermark data is embedded).
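The document leaves the watermarking algorithm itself to prior art such as Patent Documents 1 and 2, and it inserts the watermark into encoded data. As a deliberately simple stand-in, the sketch below writes a 32-bit time value into the least significant bits of the first pixels of a raw 8-bit frame and reads it back. The LSB scheme, the 32-bit width and the function names are assumptions for illustration only; plain LSB marks do not survive lossy re-encoding, so a practical system would need a more robust method.

```python
TS_BITS = 32  # hypothetical timestamp width in bits


def embed_timestamp_lsb(pixels: list[int], timestamp: int) -> list[int]:
    """Return a copy of `pixels` (8-bit values) with `timestamp` written
    MSB-first into the least significant bits of the first TS_BITS pixels."""
    if len(pixels) < TS_BITS:
        raise ValueError("frame too small to hold the timestamp")
    if not 0 <= timestamp < 2 ** TS_BITS:
        raise ValueError("timestamp out of range")
    out = list(pixels)
    for i in range(TS_BITS):
        bit = (timestamp >> (TS_BITS - 1 - i)) & 1
        # Clearing and setting only the LSB changes a pixel by at most 1 level
        out[i] = (out[i] & ~1) | bit
    return out


def extract_timestamp_lsb(pixels: list[int]) -> int:
    """Read back the MSB-first timestamp from the first TS_BITS pixels."""
    value = 0
    for i in range(TS_BITS):
        value = (value << 1) | (pixels[i] & 1)
    return value


if __name__ == "__main__":
    frame = [128] * 64
    marked = embed_timestamp_lsb(frame, 6000)
    print(extract_timestamp_lsb(marked))
```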
[0026]
If the file format of the encoded image data recorded in the image recording device 7 has a function for storing time information for synchronizing image data and audio data, the above process of embedding the image watermark data may be omitted.
[0027]
As with the image signal described above, the audio signal captured by the microphone 8 is input to the audio encoder 9 one frame at a time. The audio encoder 9 encodes the audio signal input from the microphone 8 using a predetermined algorithm to generate encoded audio data, and outputs the encoded audio data to the audio watermark inserter 11.
[0028]
When the audio watermark generator 10 receives, from the synchronization information generator 4, time information for synchronizing image data and audio data, it generates audio watermark data based on that time information and outputs it to the audio watermark inserter 11.
[0029]
The audio watermark inserter 11 embeds the audio watermark data input from the audio watermark generator 10 into the encoded audio data input from the audio encoder 9. The encoded audio data generated by the audio watermark inserter 11 (that is, the encoded audio data in which the audio watermark data is embedded) is then recorded in the audio recording device 12.
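For audio, the same idea can be sketched on 16-bit PCM samples. To show one simple way of tolerating occasional bit errors, the hypothetical sketch below repeats each timestamp bit in several consecutive samples and recovers it by majority vote. This repetition code is a swapped-in illustration, not the method of Patent Documents 3 or 4 or of this patent, and it operates on raw samples rather than on encoded audio data.

```python
TS_BITS = 32  # hypothetical timestamp width in bits
REPEAT = 5    # hypothetical: each bit is written into 5 consecutive samples


def embed_timestamp_audio(samples: list[int], timestamp: int) -> list[int]:
    """Write `timestamp` MSB-first into the LSBs of 16-bit PCM samples,
    repeating each bit REPEAT times so isolated bit flips can be outvoted."""
    if len(samples) < TS_BITS * REPEAT:
        raise ValueError("audio frame too short to hold the timestamp")
    out = list(samples)
    for i in range(TS_BITS):
        bit = (timestamp >> (TS_BITS - 1 - i)) & 1
        for r in range(REPEAT):
            k = i * REPEAT + r
            # Python's & and | use two's-complement semantics, so negative
            # samples also change by at most 1 and stay within 16-bit range
            out[k] = (out[k] & ~1) | bit
    return out


def extract_timestamp_audio(samples: list[int]) -> int:
    """Recover each bit by majority vote over its REPEAT samples."""
    value = 0
    for i in range(TS_BITS):
        ones = sum(samples[i * REPEAT + r] & 1 for r in range(REPEAT))
        value = (value << 1) | (1 if ones * 2 > REPEAT else 0)
    return value
```

With `REPEAT = 5`, up to two flipped LSBs per bit are still decoded correctly; the cost is 160 samples per 32-bit timestamp.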
[0030]
In general, editors and file formats dedicated to audio data often have neither a function nor a recording area for storing time information. It is therefore necessary to embed audio watermark data in the encoded audio data described above.
[0031]
The encoded data in which the watermark data for the image is embedded and the encoded data in which the watermark data for the audio are embedded are stored in predetermined recording areas of the image recording device 7 and the audio recording device 12, respectively.
[0032]
With the audio/video processing apparatus 1 of the present embodiment, the series of data generation operations above embeds the synchronization information needed to combine the data, as watermark data, in the original image data and audio data. As a result, when the image data recorded in the image recording device 7 and the audio data recorded in the audio recording device 12 are read out frame by frame, combined, and reproduced, frame (synchronization) operations on the reproduction side can be performed easily.
[0033]
(Second Embodiment)
FIG. 2 is a block diagram showing the configuration of an audio/video processing apparatus according to the second embodiment of the present invention.
[0034]
In FIG. 2, reference numeral 13 denotes the audio/video processing apparatus according to the present embodiment. Reference numeral 7 denotes an image recording device that holds the encoded image data described in the first embodiment (encoded image data in which image watermark data is embedded).
Reference numeral 12 denotes an audio recording device that holds the encoded audio data described in the first embodiment (encoded audio data in which audio watermark data is embedded).
[0035]
Reference numeral 15 denotes an image watermark extractor that extracts the image watermark data from the encoded image data recorded in the image recording device 7. Reference numeral 18 denotes an image decoder that decodes encoded image data frame by frame. Reference numeral 20 denotes an audio watermark extractor that extracts the audio watermark data from the encoded audio data recorded in the audio recording device 12. Reference numeral 21 denotes an audio decoder that decodes encoded audio data frame by frame.
[0036]
Reference numeral 16 denotes a control information generator that generates and outputs control information for controlling the operation of the apparatus, based on the time information recorded in the encoded audio data and the encoded image data.
[0037]
Reference numeral 17 denotes a controller that, based on the control information output from the control information generator 16, controls the reading of the required data from the image recording device 7 and the audio recording device 12.
Reference numeral 19 denotes a monitor that displays the image data decoded and reproduced by the image decoder 18. Reference numeral 22 denotes a speaker that outputs, as sound, the audio data decoded and reproduced by the audio decoder 21.
[0038]
The operation by which the audio/video processing apparatus 13 configured as described above processes audio and image data is described below.
[0039]
First, the image watermark extractor 15 reads the encoded image data recorded in the image recording device 7 and extracts the image watermark data embedded in it. Similarly, the audio watermark extractor 20 reads the encoded audio data recorded in the audio recording device 12 and extracts the audio watermark data embedded in it.
[0040]
Next, the image watermark extractor 15 and the audio watermark extractor 20 each output the extracted watermark data to the control information generator 16. The control information generator 16 compares the time information contained in the extracted image watermark data with the time information contained in the extracted audio watermark data and, based on the result of the comparison, outputs to the controller 17 the control information needed to extract the next data.
[0041]
Based on the control information input from the control information generator 16, the controller 17 accesses the image recording device 7 and the audio recording device 12 and extracts the corresponding bit streams (encoded image data and encoded audio data).
[0042]
The image watermark extractor 15 and the audio watermark extractor 20 then extract the image watermark data and the audio watermark data, respectively, from the extracted bit streams, and this process of obtaining time information for synchronization is repeated.
[0043]
When the control information generator 16 compares the time information of the encoded image data with that of the encoded audio data and finds, for example, that the time information of the encoded image data is ahead, it generates information that temporarily stops extraction of the image bit stream and outputs this information to the controller 17. In this way, the read timings of the encoded image data and the encoded audio data are synchronized.
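The comparison rule of the control information generator 16 can be sketched as a merge of two streams whose time values have already been extracted. In this illustration the two recording devices are modeled as Python lists, the `Frame` type and `synchronize` function are hypothetical, and ties are resolved in favor of video (an arbitrary choice); whenever the next video frame's time is ahead of the next audio frame's, video reading is held back and audio is read instead.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    stream: str      # "video" or "audio"
    timestamp: int   # time information recovered from the watermark
    payload: bytes   # still-encoded frame data


def synchronize(video: list[Frame], audio: list[Frame]) -> list[Frame]:
    """Interleave two already-extracted streams by watermark time."""
    out: list[Frame] = []
    v = a = 0
    while v < len(video) or a < len(audio):
        if a >= len(audio):
            take_video = True
        elif v >= len(video):
            take_video = False
        else:
            # Pause video reading while its time information is ahead of audio's
            take_video = video[v].timestamp <= audio[a].timestamp
        if take_video:
            out.append(video[v])
            v += 1
        else:
            out.append(audio[a])
            a += 1
    return out
```

For video frames at ticks 0, 3000, 6000 and audio frames at 0, 1920, 3840, 5760, the output order is video 0, audio 0, audio 1920, video 3000, audio 3840, audio 5760, video 6000.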
[0044]
If the recording format of the image recording device 7 and the audio recording device 12 is a format including time information, the time information may be obtained without extracting the watermark data.
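The fallback in the preceding paragraph amounts to a simple precedence rule. In this hypothetical helper, `container_ts` stands for a time stamp supplied by the recording format (or `None` when the format stores none), and `extract_watermark` stands for whatever watermark extractor is in use; the extractor is called only when the format provides no time stamp.

```python
from typing import Callable, Optional


def frame_time(container_ts: Optional[int], payload: bytes,
               extract_watermark: Callable[[bytes], int]) -> int:
    """Prefer a time stamp stored by the recording format; fall back to
    extracting the watermark only when the format carries none."""
    if container_ts is not None:  # 0 is a valid time stamp, so test for None
        return container_ts
    return extract_watermark(payload)
```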
[0045]
The image decoder 18 receives the image-only bit stream from the image watermark extractor 15 and converts it into moving image data, which is reproduced and displayed on the monitor 19.
The audio decoder 21 converts the audio-only bit stream input from the audio watermark extractor 20 into audio data, which is output as sound from the speaker 22.
[0046]
(Third Embodiment)
FIG. 3 is a block diagram showing the configuration of an audio/video processing apparatus according to the third embodiment of the present invention. Components identical to those in FIG. 1 of the first embodiment are denoted by the same reference numerals, and their detailed description is omitted.
[0047]
In FIG. 3, reference numeral 23 denotes the audio/video processing apparatus according to the present embodiment. Reference numeral 24, provided in the audio/video processing apparatus 23, denotes a multiplexer that multiplexes the encoded image data and the encoded audio data. Reference numeral 25 denotes a recording device that stores the single stream produced by the multiplexer 24.
[0048]
The operation up to the point where the image watermark inserter 6 and the audio watermark inserter 11 embed watermark data into the encoded image data and the encoded audio data, respectively, is the same as in the audio/video processing apparatus 1 of the first embodiment.
[0049]
When the multiplexer 24 multiplexes the encoded image data and the encoded audio data into one stream, it may also insert continuous time information into the bit stream. In that case, the time information embedded as a watermark in the encoded image data or encoded audio data may differ from the time information inserted into the bit stream. This is because the time information in the watermark is used to synchronize the audio data and the image data, not to indicate elapsed time, so it need not be strictly continuous.
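The independence of the two kinds of time information can be illustrated with a toy multiplexer. In the sketch below (all names hypothetical), units from the two streams are interleaved in order of their watermark time using the standard library's `heapq.merge`, while the container time stamp is simply a running packet counter; only the watermark time decides the interleaving. Each input stream is assumed to be already sorted by watermark time.

```python
import heapq
from typing import Iterable, NamedTuple


class Unit(NamedTuple):
    stream: str     # "video" or "audio"
    wm_time: int    # time information carried in the watermark
    payload: bytes


class Packet(NamedTuple):
    seq_time: int   # container time stamp: here just a running packet count
    unit: Unit


def multiplex(video: Iterable[Unit], audio: Iterable[Unit]) -> list[Packet]:
    """Merge two time-ordered streams into one, ordered by watermark time."""
    # On equal wm_time, heapq.merge yields the item from the earlier iterable first
    merged = heapq.merge(video, audio, key=lambda u: u.wm_time)
    return [Packet(seq_time=i, unit=u) for i, u in enumerate(merged)]
```

Here `seq_time` counts 0, 1, 2, ... regardless of the watermark values, mirroring the point that the container's time information need not match the watermark's.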
[0050]
With the audio/video processing apparatus of the present embodiment, the series of data generation operations above embeds, as a watermark in the single multiplexed bit stream, the code-length information needed to synchronize image data and audio data when the image frames and audio frames recorded on the recording medium are read and reproduced. Frame operations (synchronization operations) on the reproduction side can therefore be performed easily.
[0051]
The object of the present invention can also be achieved by supplying a system or apparatus with a recording medium storing program code of software that implements some or all of the functions of the audio/video processing apparatuses according to the first to third embodiments, and having a computer (such as a CPU) of the system or apparatus read and execute the program code stored in the recording medium.
[0052]
In this case, the program code itself read from the recording medium implements the functions of the first to third embodiments, and both the recording medium storing the program code and the program code itself constitute the present invention.
As a recording medium for supplying the program code, a ROM, a flexible disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a magnetic tape, a nonvolatile memory card, or the like can be used.
Needless to say, the invention covers not only the case where the functions of the first to third embodiments are realized by the computer executing the program code it has read, but also the case where an OS or the like running on the computer performs some or all of the actual processing according to the instructions of the program code, and that processing realizes the functions of the first to third embodiments.
Needless to say, it also covers the case where the program code read from the recording medium is written into memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like on the function expansion board or unit performs some or all of the actual processing according to the instructions of the program code, and that processing realizes the functions of the first to third embodiments.
FIG. 4 shows an example of such a computer system 310.
In FIG. 4, the memory 301 stores an OS for controlling the entire audio/video processing apparatus and running various software, encoding software for encoding audio and images, watermark embedding software for generating and embedding watermark data, and the like.
[0053]
The memory 301 also contains an image area for storing image data while encoded data is being generated, a code area for storing the generated code in which the watermarks are embedded, and a working area for storing parameters of various operations and the like.
[0054]
In this configuration, a moving-image signal and an audio signal are input through the camera 306 and the microphone 305, respectively, as directed from the terminal 303. The encoded image data and audio data are then stored in the recording unit (recording medium or the like) 304. During reproduction, the image data and audio data stored in the recording unit 304 are read and decoded; the decoded image data is output to the monitor 307, and the decoded audio data is output to the speaker 308.
[0055]
Examples of embodiments of the present invention are listed below.
[Embodiment 1] An audio-video processing apparatus comprising time information embedding means for embedding time information as watermarks in image data and audio data, respectively.
[0056]
[Embodiment 2] An audio-video processing apparatus comprising: image input means for inputting image data; audio input means for inputting audio data; and time information embedding means for embedding time information as a watermark in each frame of the image data input by the image input means and in the audio data input by the audio input means.
[0057]
[Embodiment 3] The audio-video processing apparatus according to Embodiment 2, further comprising multiplexing means for multiplexing the image data and the audio data in which the time information has been embedded as watermarks, wherein the multiplexing means generates the image data and the audio data as a single bit stream based on the time information embedded in each of them.
[0058]
[Embodiment 4] An audio-video processing apparatus for reproducing image data and audio data in which time information is embedded as watermark data, characterized in that it accesses a first recording medium on which the image data is recorded and a second recording medium on which the audio data is recorded, associates the time information of the image data with the time information of the audio data, selects the image data and the audio data accordingly, and generates a continuous bit stream.
[0059]
[Embodiment 5] An audio-video processing apparatus comprising: image data input means for inputting image data in which time information is embedded as watermark data; audio data input means for inputting audio data in which time information is embedded as watermark data; image watermark extracting means for extracting first time information from the image data input by the image data input means; and audio watermark extracting means for extracting second time information from the audio data input by the audio data input means.
[0060]
[Embodiment 6] The audio-video processing apparatus according to Embodiment 5, further comprising: time information comparing means for comparing the first time information extracted by the image watermark extracting means with the second time information extracted by the audio watermark extracting means; control information generating means for generating, based on the comparison result of the time information comparing means, control information for reading image data and audio data from a recording medium; and bit stream generating means for reading the image data and the audio data from the recording medium based on the control information generated by the control information generating means and generating a continuous bit stream containing the image data and the audio data.
[0061]
[Embodiment 7] An audio-video processing method comprising a time information embedding procedure of embedding time information as watermarks in image data and audio data, respectively.
[0062]
[Embodiment 8] An audio-video processing method comprising: an image input procedure of inputting image data; an audio input procedure of inputting audio data; and a time information embedding procedure of embedding time information as a watermark in each frame of the image data input by the image input procedure and in the audio data input by the audio input procedure.
[0063]
[Embodiment 9] An audio-video processing method for reproducing image data and audio data in which time information is embedded as watermark data, characterized by accessing a first recording medium on which the image data is recorded and a second recording medium on which the audio data is recorded, associating the time information of the image data with the time information of the audio data, selecting the image data and the audio data accordingly, and generating a continuous bit stream.
[0064]
[Embodiment 10] An audio-video processing method comprising: an image data input procedure of inputting image data in which time information is embedded as watermark data; an audio data input procedure of inputting audio data in which time information is embedded as watermark data; an image watermark extraction procedure of extracting first time information from the image data input by the image data input procedure; and an audio watermark extraction procedure of extracting second time information from the audio data input by the audio data input procedure.
[0065]
[Embodiment 11] A computer program for causing a computer to execute a time information embedding procedure for embedding time information as a watermark in image data and audio data, respectively.
[0066]
[Embodiment 12] A computer program for causing a computer to execute: an image input procedure of inputting image data; an audio input procedure of inputting audio data; and a time information embedding procedure of embedding time information as a watermark in each frame of the image data input by the image input procedure and in the audio data input by the audio input procedure.
[0067]
[Embodiment 13] A computer program for causing a computer to execute an audio-video processing method for reproducing image data and audio data in which time information is embedded as watermark data, the method comprising accessing a first recording medium on which the image data is recorded and a second recording medium on which the audio data is recorded, selecting the image data and the audio data according to the time information of the image data and the time information of the audio data, and generating a continuous bit stream.
[0068]
[Embodiment 14] A computer program for causing a computer to execute: an image data input procedure of inputting image data in which time information is embedded as watermark data; an audio data input procedure of inputting audio data in which time information is embedded as watermark data; an image watermark extraction procedure of extracting first time information from the image data input by the image data input procedure; and an audio watermark extraction procedure of extracting second time information from the audio data input by the audio data input procedure.
[0069]
[Embodiment 15] A computer-readable storage medium storing the computer program according to any one of Embodiments 11 to 14.
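Embodiments 1, 2 and 5 leave the watermarking technique itself open. As an illustration only, the Python sketch here shows the embed/extract round trip for the time information, assuming a hypothetical 32-bit millisecond timestamp written into the least significant bits (LSBs) of the first 32 samples of each video frame or audio block. LSB embedding is a stand-in chosen for brevity: it would not survive lossy re-encoding, so a robust watermarking scheme would be needed for the format-conversion case this document targets. The names `TS_BITS`, `embed_time` and `extract_time` are illustrative, not taken from the patent.

```python
# Hypothetical sketch: carry a 32-bit timestamp (ms) in the least significant
# bits (LSBs) of the first 32 samples of a video frame or audio block.
# LSB embedding is an illustrative stand-in, not the scheme of the patent.

TS_BITS = 32  # assumed timestamp width in bits


def embed_time(samples: list[int], time_ms: int) -> list[int]:
    """Return a copy of `samples` with `time_ms` written into the LSBs
    of the first TS_BITS samples, most significant bit first."""
    if len(samples) < TS_BITS:
        raise ValueError("block too short to carry a timestamp")
    if not 0 <= time_ms < 2 ** TS_BITS:
        raise ValueError("timestamp out of range")
    out = list(samples)
    for i in range(TS_BITS):
        bit = (time_ms >> (TS_BITS - 1 - i)) & 1
        out[i] = (out[i] & ~1) | bit  # clear the LSB, then set it to `bit`
    return out


def extract_time(samples: list[int]) -> int:
    """Read the timestamp back from the LSBs of the first TS_BITS samples."""
    if len(samples) < TS_BITS:
        raise ValueError("block too short to carry a timestamp")
    value = 0
    for i in range(TS_BITS):
        value = (value << 1) | (samples[i] & 1)
    return value


# Usage: toy 8-bit pixels and 16-bit signed audio samples.
frame = [128] * 64
audio = [-3, 5, 0, 1000] * 16
assert extract_time(embed_time(frame, 40)) == 40
assert extract_time(embed_time(audio, 1_234_567)) == 1_234_567
```

Each marked sample changes by at most one unit, which is the sense in which the change is meant to be imperceptible. How `time_ms` is derived is also an assumption here: for video it could come from the frame index and frame rate, and for audio from the index of the block's first sample and the sampling rate.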
[0070]
[Effects of the Invention]
As described above, according to the present invention, time information is embedded as watermarks in both the image data and the audio data, so the image data and audio data can be reproduced independently of the file format, and images and sound can be synchronized easily and quickly.
[0071]
Moreover, even when audio and images are edited with different tools, or the data is converted several times between different file formats, loss of the time information is prevented, so reproduction with images and audio properly synchronized can be performed. Even when the audio data and image data are recorded in different file formats, synchronized reproduction can be performed based on their respective time information.
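On the reproduction side (Embodiments 4, 6, 9 and 13), the time information extracted from each stream is compared and the result decides which units are read and in what order. The Python sketch here shows one possible arrangement, assuming the timestamps have already been extracted from the watermarks. The rule it applies (keep only the time span covered by both streams and interleave units by timestamp) and all names (`Unit`, `make_read_plan`, `build_stream`) are assumptions for illustration; the returned list stands in for the multiplexed bit stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One encoded video frame or audio block (hypothetical container)."""
    kind: str        # "video" or "audio"
    time_ms: int     # time information extracted from its watermark
    payload: bytes   # encoded data, treated as opaque here


def make_read_plan(video_times: list[int], audio_times: list[int]) -> list[tuple[str, int]]:
    """Compare the two sets of extracted times and return control
    information: (source, index) read instructions in presentation order,
    limited to the time span that both streams cover."""
    if not video_times or not audio_times:
        return []
    start = max(min(video_times), min(audio_times))
    end = min(max(video_times), max(audio_times))
    entries = [(t, 0, "video", i) for i, t in enumerate(video_times) if start <= t <= end]
    entries += [(t, 1, "audio", i) for i, t in enumerate(audio_times) if start <= t <= end]
    entries.sort()  # by time; on a tie, the video unit is read first
    return [(src, idx) for _, _, src, idx in entries]


def build_stream(video: list[Unit], audio: list[Unit]) -> list[Unit]:
    """Read units according to the plan; the returned list stands in for
    the multiplexed bit stream."""
    plan = make_read_plan([u.time_ms for u in video], [u.time_ms for u in audio])
    sources = {"video": video, "audio": audio}
    return [sources[src][idx] for src, idx in plan]
```

Because the order is derived only from the embedded timestamps and not from container metadata, the same plan results even if the two streams were edited with different tools, reordered, or stored in different file formats. What to do with units outside the shared span (drop them, as here, or pad with silence or repeated frames) is a design choice the text leaves open.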
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of an audio-video processing device according to a first embodiment of the present invention.
FIG. 2 is a block diagram illustrating a configuration of an audio-video processing device according to a second embodiment of the present invention.
FIG. 3 is a block diagram illustrating a configuration of an audio-video processing device according to a third embodiment of the present invention.
FIG. 4 is a block diagram showing a system configuration in a case where a program for causing a computer to realize the functions of the audio-video processing apparatus is read from a computer-readable recording medium and executed.
[Explanation of symbols]
1 Audio-video processing device
2 Camera
3 Image encoder
4 Synchronization information generator
5 Image watermark generator
6 Image watermark inserter
7 Image recording device
8 Microphone
9 Audio encoder
10 Audio watermark generator
11 Audio watermark inserter
12 Audio recording device

Claims

An audio-video processing apparatus comprising time information embedding means for embedding time information as watermarks in image data and audio data, respectively.