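Returning to FIG. 8, the file “INDEX.XML” is an index file that manages the material information stored under the directory PAV. In this example, the file “INDEX.XML” is written in XML (Extensible Markup Language) format. The file “INDEX.XML” manages each of the clips and edit lists described above: for example, a conversion table between file names and UMIDs, length information (Duration), and the playback order of each material when the entire optical disc 1 is played back. It also manages the video data, audio data, auxiliary AV data and so on that belong to each clip, as well as the clip information managed as files in the clip directories.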
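Because a bit rate switching point becomes a clip division position, the start position of the clip relative to the actual head of the file must be adjusted, for example by using “clip Begin”, a command for specifying the start position of a clip.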
The spindle motor 12 rotates the optical disc 1 at CLV (Constant Linear Velocity) or CAV (Constant Angular Velocity) based on a spindle motor drive signal from the servo control unit 15.
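The pickup unit 13 controls the laser output based on the recording signal supplied from the signal processing unit 16 and records that signal on the optical disc 1. The pickup unit 13 also focuses laser light onto the optical disc 1, photoelectrically converts the light reflected from the optical disc 1 into a current signal, and supplies it to the RF (Radio Frequency) amplifier 14. The irradiation position of the laser beam is held at a predetermined position by a servo signal supplied from the servo control unit 15 to the pickup unit 13.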
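The control unit 20 consists of a CPU (Central Processing Unit), memory such as ROM (Read Only Memory) and RAM (Random Access Memory), a bus connecting them, and so on, and controls the entire disc recording/reproducing apparatus 10. The ROM stores in advance an initial program that is read when the CPU starts up, programs for controlling the disc recording/reproducing apparatus 10, and the like. The RAM is used as the CPU's work memory. The control unit 20 also controls the video camera unit.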
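When the audio signal supplied from the data amount detection unit 42 is not linear PCM audio data, the audio signal conversion unit 44 converts it into linear PCM audio data in accordance with an instruction from the control unit 20. Alternatively, the audio signal conversion unit 44 can compression-encode the audio signal using an MPEG-based method such as MP3 (MPEG-1 Audio Layer 3) or AAC (Advanced Audio Coding). The compression encoding method for audio data is not limited to these, and other methods may be used. The data sequence of audio data output from the audio signal conversion unit 44 is supplied to the memory controller 17.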
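Note that when audio data is transmitted in a format conforming to the AES/EBU (Audio Engineering Society/European Broadcasting Union) standard, which is commonly used in broadcasting stations and elsewhere, bit resolution information is stored at a predetermined position in the header, so the resolution of the audio data can be determined by extracting that information. Linear PCM audio data and non-audio data can likewise be distinguished from the header information and the like.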
As described above, according to the present invention, when audio data encoded by linear PCM is being recorded on a disc-shaped recording medium and the input changes to audio data encoded by a method other than linear PCM, the non-linear-PCM audio data is replaced with linear PCM audio data representing silence, and recording continues. Therefore, even if recording continues while audio data encoded by a method other than linear PCM is being input, that audio data is not reproduced following the linear PCM audio data, which prevents noise at nearly the maximum level from being output just after the point where the encoding method changes.
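The substitution described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: the `(is_linear_pcm, samples)` block representation and the function name `substitute_silence` are hypothetical, and it assumes the linear PCM flag has already been obtained (for example from header information) and that a non-PCM block occupies the same number of sample slots as the PCM it replaces.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical block representation: (is_linear_pcm, samples).
# The flag is assumed to have been read from the stream header beforehand.
AudioBlock = Tuple[bool, List[int]]


def substitute_silence(blocks: Iterable[AudioBlock]) -> Iterator[List[int]]:
    """Yield the samples to record for each incoming block.

    Linear PCM blocks pass through unchanged. Any block that is not linear PCM
    is replaced by the same number of zero-valued samples, i.e. linear PCM
    silence, so recording continues without writing the non-PCM payload.
    """
    for is_linear_pcm, samples in blocks:
        if is_linear_pcm:
            yield samples
        else:
            yield [0] * len(samples)


if __name__ == "__main__":
    incoming = [(True, [100, -200]), (False, [32767, -32768]), (True, [5, 6])]
    print(list(substitute_silence(incoming)))  # [[100, -200], [0, 0], [5, 6]]
```

Keeping the silent run the same length as the dropped block keeps the audio timeline aligned with the video.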
Hereinafter, an embodiment of the present invention will be described. According to the present invention, audio data and video data (hereinafter abbreviated as AV data) of a plurality of signal types (formats) are mixed and recorded continuously on one disc-shaped recording medium (hereinafter abbreviated as a disc) so that AV data of the plurality of signal types can be reproduced continuously.
In the following, to avoid complexity, “recording AV data of a plurality of signal types mixed on one disc-shaped recording medium so that they can be reproduced continuously” is referred to, where appropriate, simply as “can be mixed on one disc”.
First, examples of the signal types (formats) of data that can be mixed on one disc in the present invention are described.
As for encoding methods, the MPEG2 (Moving Picture Experts Group 2) method, for example, includes an encoding method in which video data is composed only of I pictures produced by intra-frame encoding, and an encoding method in which video data is composed of I pictures together with P pictures and B pictures produced by predictive encoding; these encoding methods can be mixed on one disc. Of course, encoding methods other than MPEG2 can also be mixed.
Note that, in the above-described encoding method in which video data is composed only of I pictures, a GOP (Group Of Pictures), which is the unit of random access, consists of a single I picture. For convenience, this method is hereinafter referred to as the “single GOP method”. In the embodiment of the present invention, the MPEG2 4:2:2 profile is applied to the single GOP method. In the encoding method in which video data is composed of I, P and B pictures, a GOP is a self-contained unit built on an I picture and includes one or more P and B pictures. For convenience, this method, in which a GOP is composed of a plurality of frames, is hereinafter referred to as the “long GOP method”.
For video data, for example, video data in the 30 Mbps (megabits per second), 40 Mbps and 50 Mbps bit rate modes of the single GOP method described above can be mixed on one disc, and video data in the 25 Mbps bit rate mode of the long GOP method can additionally be mixed on the same disc. Other bit rate modes may also be mixed in the single GOP or long GOP method.
A bit rate mode is a mode in which video data is compression-coded so that the bit rate value indicated by the mode is the maximum. For example, video data in the 50 Mbps bit rate mode actually contains, depending on the complexity of the images and the like, data at 50 Mbps or less in the transmitted data. For a frame whose data amount is less than that corresponding to the bit rate of the mode, the difference is filled with predetermined padding data, so that the apparent bit rate equals the bit rate indicated by the mode.
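As a worked example of the padding rule, assume a constant per-frame budget derived from the mode bit rate and an integer frame rate (both simplifications; the name `padding_bytes` is hypothetical). In the 50 Mbps mode at 25 frames per second, each frame's budget is 50,000,000 / 8 / 25 = 250,000 bytes, so a frame that codes to 200,000 bytes would receive 50,000 bytes of padding.

```python
def padding_bytes(frame_bytes: int, mode_bps: int, frames_per_second: int) -> int:
    """Bytes of padding needed to bring one coded frame up to the per-frame
    size implied by the bit rate mode.

    Assumes a constant per-frame budget of mode_bps / 8 / frames_per_second
    bytes and an integer frame rate (a simplification for illustration).
    """
    budget = mode_bps // 8 // frames_per_second
    if frame_bytes > budget:
        # The mode's bit rate is a maximum, so a larger frame is an encoder error.
        raise ValueError("coded frame exceeds the bit rate mode's maximum")
    return budget - frame_bytes


if __name__ == "__main__":
    # 50 Mbps mode at 25 frames/s gives a budget of 250,000 bytes per frame.
    print(padding_bytes(200_000, 50_000_000, 25))  # 50000
```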
For video data, interlaced and progressive data can be mixed on one disc, and within each scanning system, data at a plurality of frame rates can be mixed on one disc. For screen size, data with aspect ratios of 4:3 and 16:9 can be mixed and recorded on one disc; for example, with an aspect ratio of 4:3, standard-definition (SD: Standard Definition) data of 640 pixels × 480 lines and higher-definition (HD: High Definition) data of 1440 pixels × 1088 lines can be mixed on one disc. Likewise, when the aspect ratio is 16:9, data of a plurality of image sizes can be mixed on one disc.
Furthermore, the color profile is not limited to 4:2:2, and other formats such as 4:2:0 can also be mixed.
For audio data, audio data encoded by linear PCM (Pulse Code Modulation) (hereinafter abbreviated as linear PCM audio data) and audio data encoded by a method other than linear PCM (for example, audio data obtained by further compression-encoding linear PCM audio data) can be mixed on one disc. Audio data supports a plurality of bit resolutions, for example 16 bits and 24 bits, and a plurality of channel combinations, such as 4 ch and 8 ch, can be mixed on one disc.
In the embodiment of the present invention, in addition to the main-line data described above, that is, the AV data actually to be broadcast or edited, auxiliary AV data and metadata corresponding to the main-line AV data are also recorded on the same disc.
Auxiliary AV data is lower-bit-rate audio/video data based on the main-line AV data. It is generated by compression-encoding the main-line AV data so that the bit rate is reduced to, for example, several Mbps. There are several encoding schemes for generating auxiliary AV data, such as MPEG4, and in the embodiment of the present invention, auxiliary AV data encoded by a plurality of different encoding schemes can be mixed on one disc. Auxiliary AV data encoded by the same scheme with different encoding parameters can also be mixed on one disc.
Metadata is higher-level data about certain data and functions as an index representing the content of various data. There are two types of metadata: time-series metadata, generated along the time series of the main-line AV data described above, and non-time-series metadata, generated for predetermined sections such as each scene in the main-line AV data.
The time-series metadata includes, for example, a time code, a UMID (Unique Material Identifier) and an essence mark as essential data. Camera meta information, such as the iris and zoom settings of the video camera at the time of shooting, can also be included in the time-series metadata. Furthermore, information defined by ARIB (Association of Radio Industries and Businesses) can be included in the time-series metadata. Because ARIB-based data and camera meta information are relatively large, it is preferable to include only one of them at a time. The camera meta information and ARIB data can also be included in the time-series metadata by time-division multiplexing at a reduced time resolution.
The non-time-series metadata includes change point information for the time code and UMID, information on essence marks, user bits, and the like.
The UMID is outlined below. The UMID is an identifier, standardized in SMPTE 330M, that is uniquely determined to identify video data, audio data and other material data.
FIG. 1 shows the data structure of the UMID. The UMID consists of a basic UMID, which is ID information identifying material data, and signature metadata identifying each content item within the material data. The basic UMID and the signature metadata each occupy a data area 32 bytes long. The 64-byte area formed by appending the signature metadata to the basic UMID is called an extended UMID.
The basic UMID consists of an area Universal Label with a data length of 12 bytes, an area Length Value with a data length of 1 byte, an area Instance Number with a data length of 3 bytes, and an area Material Number with a data length of 16 bytes.
The area Universal Label stores a code identifying the data string that immediately follows as a UMID. The area Length Value indicates the length of the UMID. Because the code lengths of the basic UMID and the extended UMID differ, the area Length Value holds the value [13h] for a basic UMID and [33h] for an extended UMID. In the bracketed [] notation, the “h” after a number indicates that it is hexadecimal. The area Instance Number indicates whether the material data has been overwritten or edited.
The area Material Number consists of three areas: an area Time Snap with a data length of 4 bytes, an area Rnd with a data length of 8 bytes, and an area Machine node with a data length of 4 bytes. The area Time Snap indicates the number of snap clock samples per day, so that the creation time of the material data and the like is expressed in clock units. The area Rnd holds a random number that prevents duplicate numbers from being assigned when an incorrect time is set or when the network address of a device, as defined by, for example, the IEEE (Institute of Electrical and Electronics Engineers), changes.
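The basic UMID layout above maps directly onto a fixed-size binary record. The following sketch uses Python's `struct` module to split the 32 bytes into their fields; the class and function names are hypothetical, and the zero-filled Universal Label in the usage lines is a placeholder, not a real SMPTE label. Note that [13h] = 19 equals Instance Number (3) + Material Number (16), and [33h] = 51 equals 19 + 32 bytes of signature metadata, which is consistent with the Length Value counting the bytes that follow it.

```python
import struct
from dataclasses import dataclass

# Big-endian, no alignment padding:
# 12s Universal Label, B Length Value, 3s Instance Number,
# then the 16-byte Material Number split into 4s Time Snap, 8s Rnd, 4s Machine node.
BASIC_UMID_FORMAT = ">12sB3s4s8s4s"
BASIC_UMID_SIZE = struct.calcsize(BASIC_UMID_FORMAT)  # 32


@dataclass
class BasicUmid:
    universal_label: bytes
    length_value: int
    instance_number: bytes
    time_snap: bytes
    rnd: bytes
    machine_node: bytes

    @property
    def is_extended(self) -> bool:
        # [13h] marks a basic UMID, [33h] an extended UMID.
        return self.length_value == 0x33


def parse_basic_umid(data: bytes) -> BasicUmid:
    """Split the first 32 bytes of a UMID into its basic-UMID fields."""
    if len(data) < BASIC_UMID_SIZE:
        raise ValueError("a basic UMID needs 32 bytes")
    umid = BasicUmid(*struct.unpack_from(BASIC_UMID_FORMAT, data, 0))
    if umid.length_value not in (0x13, 0x33):
        raise ValueError(f"unexpected Length Value {umid.length_value:#04x}")
    return umid


if __name__ == "__main__":
    raw = bytes(12) + bytes([0x13]) + bytes(3) + b"TSNP" + b"RANDOM!!" + b"NODE"
    umid = parse_basic_umid(raw)
    print(umid.rnd, umid.is_extended)  # b'RANDOM!!' False
```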
The signature metadata consists of an area Time/Date with a data length of 8 bytes, an area Spatial Co-ordinated with a data length of 12 bytes, and areas Country, Organization and User, each with a data length of 4 bytes.
The area Time/Date indicates the time and date when the material was generated. The area Spatial Co-ordinated indicates correction information (time difference information) for the time when the material was generated, and position information expressed as latitude, longitude and altitude. The position information can be obtained, for example, by providing the video camera with a GPS (Global Positioning System) function. The areas Country, Organization and User indicate the country name, organization name and user name, respectively, using abbreviated alphabetic characters, symbols and the like.
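Signature metadata is likewise a fixed 32-byte layout, and an extended UMID is the basic UMID followed by it. This sketch (hypothetical names; field contents such as `b"JP  "` are placeholders) splits the signature into its five areas and builds a 64-byte extended UMID, updating the Length Value byte from [13h] to [33h]. It does not attempt to decode the internal encoding of Time/Date or Spatial Co-ordinated.

```python
from typing import Dict, Tuple

# (field name, size in bytes) in the order given for the signature metadata.
SIGNATURE_LAYOUT: Tuple[Tuple[str, int], ...] = (
    ("time_date", 8),
    ("spatial_co_ordinated", 12),
    ("country", 4),
    ("organization", 4),
    ("user", 4),
)
SIGNATURE_SIZE = sum(size for _, size in SIGNATURE_LAYOUT)  # 32
LENGTH_VALUE_OFFSET = 12  # Length Value follows the 12-byte Universal Label


def split_signature(signature: bytes) -> Dict[str, bytes]:
    """Cut 32 bytes of signature metadata into its named areas."""
    if len(signature) != SIGNATURE_SIZE:
        raise ValueError("signature metadata must be 32 bytes")
    fields: Dict[str, bytes] = {}
    offset = 0
    for name, size in SIGNATURE_LAYOUT:
        fields[name] = signature[offset:offset + size]
        offset += size
    return fields


def make_extended_umid(basic: bytes, signature: bytes) -> bytes:
    """Append signature metadata to a basic UMID and mark it as extended."""
    if len(basic) != 32 or len(signature) != SIGNATURE_SIZE:
        raise ValueError("expected a 32-byte basic UMID and 32-byte signature")
    extended = bytearray(basic + signature)
    extended[LENGTH_VALUE_OFFSET] = 0x33  # basic [13h] -> extended [33h]
    return bytes(extended)


if __name__ == "__main__":
    sig = bytes(8) + bytes(12) + b"JP  " + b"ORG1" + b"USR1"
    print(split_signature(sig)["country"])                       # b'JP  '
    print(len(make_extended_umid(bytes(12) + b"\x13" + bytes(19), sig)))  # 64
```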
As described above, an extended UMID is 64 bytes long, which is relatively large for data recorded sequentially in time series. Therefore, when a UMID is embedded in the time-series metadata, it is preferable to compress it by a predetermined method.
As long as the UMID is used for the applications of this embodiment, bytes 10 to 13 from its head have fixed values. Therefore, in the embodiment of the present invention, bytes 10 through 13 of the UMID can be omitted. When the UMID is stored in the time-series metadata, it can also be encoded by a predetermined method; Base64 is preferable because its output is ASCII and can easily be embedded in, for example, an XML document. Using only differences is also conceivable: data generated at the same time in the same directory is given UMIDs that are partly common, so recording only the UMID difference reduces the data amount.
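The byte omission and Base64 encoding described above can be sketched as follows. It assumes that “bytes 10 to 13 from the head” means 1-based positions 10 through 13 inclusive and that the decoder already knows their fixed value; `compress_umid` and `expand_umid` are hypothetical names. Dropping 4 of 32 bytes shortens the Base64 text from 44 to 40 characters. The difference-only approach is not shown.

```python
import base64

# Assumption: "bytes 10 to 13 from the head" means 1-based positions 10..13
# inclusive, i.e. the 0-based slice [9:13] (4 bytes).
FIXED_START, FIXED_END = 9, 13


def compress_umid(umid: bytes) -> str:
    """Drop the bytes that are fixed for this application, then Base64-encode
    the rest so it can be embedded as ASCII text (e.g. in XML)."""
    reduced = umid[:FIXED_START] + umid[FIXED_END:]
    return base64.b64encode(reduced).decode("ascii")


def expand_umid(text: str, fixed_bytes: bytes) -> bytes:
    """Invert compress_umid, re-inserting the known fixed bytes."""
    if len(fixed_bytes) != FIXED_END - FIXED_START:
        raise ValueError("fixed_bytes must be 4 bytes long")
    reduced = base64.b64decode(text)
    return reduced[:FIXED_START] + fixed_bytes + reduced[FIXED_START:]


if __name__ == "__main__":
    umid = bytes(range(32))  # stand-in for a 32-byte basic UMID
    text = compress_umid(umid)
    print(len(text))  # 40 (Base64 of the full 32 bytes would be 44)
```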
The essence mark is outlined below. An essence mark represents an index associated with video scene data, that is, a video scene (or cut) formed in the video data, for example at the time of shooting. Using essence marks, one can tell after shooting what kind of scene a segment contains without reproducing the video scene data.
In the embodiment of the present invention, essence marks are defined in advance as reserved words. Common control can therefore be performed across the interfaces of, for example, an imaging device, a playback device and an editing device, without converting essence marks to suit the other device.
FIG. 2 shows an example of the reserved words used to define essence marks. FIG. 2 is only an example, and other essence marks can be defined in addition. “_RecStart” is a shooting start mark indicating the recording start position. “_RecEnd” is a shooting end mark indicating the recording end position. “_ShotMark1” and “_ShotMark2” are shot marks indicating an arbitrary position, such as a point of interest. “_Cut” is a cut mark indicating a cut position. “_Flash” is a flash mark indicating the position at which a flash was detected. “_FilterChange” is a filter change mark indicating the position at which the lens filter of the imaging device was changed. “_ShutterSpeedChange” is a shutter speed change mark indicating the position at which the shutter speed of the imaging device was changed. “_GainChange” is a gain change mark indicating the position at which the gain of a filter or the like was changed. “_WhiteBalanceChange” is a white balance change mark indicating the position at which the white balance was changed. “_OverBrightness” is a mark indicating the position at which the output level of the video signal exceeded the limit value. “_OverAudioLimiter” is a loud volume mark indicating the position at which the output level of the audio signal exceeded the limit value. Each of these marks is recorded, for example, in units of video frames.
“_In-XXX” is an edit start mark indicating the start position of a cut or material excerpt. “_Out-XXX” is an edit end mark indicating the end position of a cut or material excerpt. Each time an edit start point (IN point) or edit end point (OUT point) is added, the “XXX” portion of the edit start or end mark is numbered sequentially using digits and letters, for example “_In-001”, “_In-002”, and so on.
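Because essence marks are reserved words, a receiving device can recognize them by simple lookup. The following sketch checks a name against the FIG. 2 words and the “_In-XXX”/“_Out-XXX” pattern, and numbers new edit marks sequentially. The names are hypothetical, the three-character alphanumeric form of “XXX” is an assumption, and the numberer uses digits only although the text also allows letters.

```python
import re

# Reserved words listed in FIG. 2 (other marks may be defined in addition).
RESERVED_MARKS = frozenset({
    "_RecStart", "_RecEnd", "_ShotMark1", "_ShotMark2", "_Cut", "_Flash",
    "_FilterChange", "_ShutterSpeedChange", "_GainChange",
    "_WhiteBalanceChange", "_OverBrightness", "_OverAudioLimiter",
})

# Edit marks "_In-XXX" / "_Out-XXX"; XXX assumed to be 3 alphanumeric characters.
EDIT_MARK = re.compile(r"_(?:In|Out)-[0-9A-Za-z]{3}")


def is_known_mark(name: str) -> bool:
    """True for a FIG. 2 reserved word or a well-formed edit mark."""
    return name in RESERVED_MARKS or EDIT_MARK.fullmatch(name) is not None


class EditMarkNumberer:
    """Hands out sequential edit marks; this sketch numbers with digits only."""

    def __init__(self) -> None:
        self._next = {"In": 1, "Out": 1}

    def add(self, kind: str) -> str:
        if kind not in self._next:
            raise ValueError("kind must be 'In' or 'Out'")
        number = self._next[kind]
        if number > 999:
            raise ValueError("three-digit numbering exhausted")
        self._next[kind] = number + 1
        return f"_{kind}-{number:03d}"


if __name__ == "__main__":
    marks = EditMarkNumberer()
    print(marks.add("In"), marks.add("Out"), marks.add("In"))  # _In-001 _Out-001 _In-002
```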
By using the essence mark defined as above as index information at the time of the rough editing process, it is possible to efficiently select a target video scene.
FIG. 3 shows the data structure of an example essence mark. As described with reference to FIG. 2, an essence mark is metadata that expresses characteristics of a video scene and the like as text data and is associated with the video content data (main-line AV data). The essence mark is KLV (Key Length Value) encoded for recording and transmission. FIG. 3 shows the format of the KLV-encoded essence mark, which conforms to the metadata dictionary of SMPTE 335M/RP210A.
The KLV-encoded essence mark consists of a “Key” part with a data length of 16 bytes, an “L (length)” part with a data length of 1 byte, and a “Value” part with a data length of up to 32 bytes. The “Key” part is an identifier, conforming to SMPTE 335M/RP210A, that indicates the KLV-encoded data item; in this example it holds a value indicating an essence mark. The “L” part gives the length, in bytes, of the data that follows it and can express a length of up to 32 bytes. The “Value” part is a text data area in which the essence mark is stored.
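A KLV packet for an essence mark is a 16-byte Key, a 1-byte L, and a Value of at most 32 bytes. The sketch below encodes and decodes that layout. The key is a deliberate placeholder, not the registered SMPTE 335M/RP210A key; treating the mark text as ASCII is an assumption, and the function names are hypothetical.

```python
from typing import Tuple

# Placeholder only: this is NOT the registered SMPTE key for essence marks.
HYPOTHETICAL_KEY = bytes(range(16))
KEY_SIZE = 16
MAX_VALUE_SIZE = 32


def encode_essence_mark(text: str, key: bytes = HYPOTHETICAL_KEY) -> bytes:
    """Build Key (16 bytes) + L (1 byte) + Value (at most 32 bytes of text)."""
    value = text.encode("ascii")  # assumption: ASCII mark text
    if len(key) != KEY_SIZE:
        raise ValueError("Key must be 16 bytes")
    if len(value) > MAX_VALUE_SIZE:
        raise ValueError("Value may be at most 32 bytes")
    return key + bytes([len(value)]) + value


def decode_essence_mark(packet: bytes) -> Tuple[bytes, str]:
    """Split a KLV packet back into (key, mark text)."""
    if len(packet) < KEY_SIZE + 1:
        raise ValueError("packet too short for Key and L")
    key = packet[:KEY_SIZE]
    length = packet[KEY_SIZE]
    value = packet[KEY_SIZE + 1:KEY_SIZE + 1 + length]
    if len(value) != length:
        raise ValueError("packet shorter than its L field")
    return key, value.decode("ascii")


if __name__ == "__main__":
    print(decode_essence_mark(encode_essence_mark("_Cut"))[1])  # _Cut
```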
Next, the arrangement of data on a disc according to an embodiment of the present invention will be described. In the embodiment of the present invention, data is recorded so that annual rings are formed on the disc. Annual ring data is data recorded on the disc in units of a data amount determined by the playback time of the data. For example, considering only the main-line audio data and video data, the audio data and video data corresponding to a given playback time zone are recorded alternately, for each predetermined playback time unit whose data size is one or more turns of a track. Recording in this way layers the sets of audio and video data for each playback time zone in chronological order, forming annual rings.
In practice, in this embodiment, the auxiliary AV data and time-series metadata corresponding to a playback time zone are recorded together with the audio data and video data of that time zone as one set, and these sets form the annual rings in which data is recorded on the optical disc 1.
The data forming the annual rings is referred to as annual ring data. The annual ring data has a data amount that is an integral multiple of a sector, which is the smallest recording unit on the disk. Also, the annual rings are recorded such that their boundaries coincide with the boundaries of the sectors of the disk.
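Since each unit of annual ring data must be a whole number of sectors, its size can be computed by rounding the stream's byte count for one ring up to a sector boundary. This sketch assumes a 2,048-byte sector purely for illustration (the sector size is not stated here) and uses hypothetical names. For a 50 Mbps stream and a 2-second ring, the raw size is 12,500,000 bytes, which rounds up to 6,104 sectors, or 12,500,992 bytes; how the remainder is filled is not modeled.

```python
SECTOR_BYTES = 2048  # assumed sector size, for illustration only


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def ring_bytes(bit_rate_bps: int, duration_ms: int,
               sector_bytes: int = SECTOR_BYTES) -> int:
    """Size of one annual ring of a stream, rounded up to whole sectors."""
    raw = ceil_div(bit_rate_bps * duration_ms, 8 * 1000)  # bits x ms -> bytes
    return ceil_div(raw, sector_bytes) * sector_bytes


if __name__ == "__main__":
    print(ring_bytes(50_000_000, 2000))  # 12500992
```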
FIG. 4 shows an example in which annual ring data is formed on the optical disc 1. In the example of FIG. 4, audio annual ring data #1, video annual ring data #1, audio annual ring data #2, video annual ring data #2, auxiliary AV annual ring data #1 and time-series meta annual ring data #1 are recorded in this order, and the annual ring data is handled in this cycle. On the outer circumference side of time-series meta annual ring data #1, part of the annual ring data of the next cycle is shown as audio annual ring data #3 and video annual ring data #3.
In the example of FIG. 4, the playback time zone of one unit of time-series meta annual ring data corresponds to that of one unit of auxiliary AV annual ring data, and also to the playback time zone of two cycles of audio annual ring data. Similarly, the playback time zone of one unit of time-series meta annual ring data corresponds to that of two cycles of video annual ring data. This association between playback time zones and the cycles of each kind of annual ring data is set based on, for example, the respective data rates. Empirically, the playback time of one unit of video or audio annual ring data is preferably about 1.5 to 2 seconds.
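The recording order in FIG. 4 can be generated mechanically. In this sketch (hypothetical function names and labels), each cycle contains a fixed number of audio/video ring pairs, two in FIG. 4 because one meta ring's playback time covers two audio and two video rings, followed by one auxiliary AV ring and one time-series meta ring.

```python
from typing import List


def cycle_layout(cycle: int, av_rings_per_cycle: int = 2) -> List[str]:
    """Labels of the annual ring data written in one cycle, in FIG. 4 order.

    Each cycle holds av_rings_per_cycle audio/video ring pairs, followed by one
    auxiliary AV ring and one time-series meta ring.
    """
    first = (cycle - 1) * av_rings_per_cycle + 1
    labels: List[str] = []
    for n in range(first, first + av_rings_per_cycle):
        labels += [f"audio#{n}", f"video#{n}"]
    labels += [f"aux#{cycle}", f"meta#{cycle}"]
    return labels


def disc_layout(cycles: int, av_rings_per_cycle: int = 2) -> List[str]:
    """Concatenate successive cycles, as they would be laid out outward."""
    layout: List[str] = []
    for c in range(1, cycles + 1):
        layout += cycle_layout(c, av_rings_per_cycle)
    return layout


if __name__ == "__main__":
    print(cycle_layout(1))
    # ['audio#1', 'video#1', 'audio#2', 'video#2', 'aux#1', 'meta#1']
```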
FIG. 5 shows an example in which data is read from and written to the optical disc 1 on which annual rings are formed as in FIG. 4. If the optical disc 1 has a contiguous free area of sufficient size with no defects, the audio annual ring data, video annual ring data, auxiliary AV annual ring data and time-series meta annual ring data, generated from the respective data sequences of audio data, video data, auxiliary AV data and time-series metadata based on the playback time zone, are written into the free area of the optical disc 1 in a single continuous pass, as shown in the example of FIG. 5A. At this time, every piece of data is written so that its boundaries coincide with sector boundaries of the optical disc 1. Data is read from the optical disc 1 in the same manner as it is written.
On the other hand, when only a specific data sequence is read from the optical disc 1, the operation of seeking to the recording position of that sequence and reading the data is repeated. FIG. 5B shows how the auxiliary AV data sequence is read selectively in this way. For example, referring also to FIG. 4, after auxiliary AV annual ring data #1 is read, the subsequently recorded time-series meta annual ring data #1, audio annual ring data #3, video annual ring data #3, audio annual ring data #4 and video annual ring data #4 (not shown) are skipped by a seek, and auxiliary AV annual ring data #2 of the next cycle is read.
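Selective reading of one data sequence, as in FIG. 5B, amounts to reading every ring of the wanted kind and seeking past everything between them. The sketch below (hypothetical names; ring labels written as strings like “aux#2”) computes that plan from a recorded layout; with the FIG. 4 order it reproduces the skip described above, from time-series meta annual ring data #1 through video annual ring data #4.

```python
from typing import List, Tuple


def plan_selective_read(layout: List[str], kind: str) -> List[Tuple[str, List[str]]]:
    """For each ring of the wanted kind, return (ring, rings skipped by the
    seek before it). Labels are assumed to look like "aux#2"."""
    plan: List[Tuple[str, List[str]]] = []
    skipped: List[str] = []
    for label in layout:
        if label.split("#", 1)[0] == kind:
            plan.append((label, skipped))
            skipped = []
        else:
            skipped.append(label)
    return plan


if __name__ == "__main__":
    layout = ["audio#1", "video#1", "audio#2", "video#2", "aux#1", "meta#1",
              "audio#3", "video#3", "audio#4", "video#4", "aux#2", "meta#2"]
    for ring, skipped in plan_selective_read(layout, "aux"):
        print(f"seek past {skipped}, read {ring}")
```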
As described above, by periodically recording data on the optical disc 1 as annual ring data corresponding to playback time zones, in units of playback time, the audio annual ring data and video annual ring data of the same playback time zone are placed close to each other on the optical disc 1, so the audio data and video data for a given playback time can be read quickly from the optical disc 1 and reproduced. Because the data is recorded so that annual ring boundaries coincide with sector boundaries, audio data alone or video data alone can be read from the optical disc 1, which allows quick editing of audio data alone or video data alone. Furthermore, as described above, the audio annual ring data, video annual ring data, auxiliary AV annual ring data and time-series meta annual ring data each have a data amount that is an integral multiple of the sector size of the optical disc 1 and are recorded so that their boundaries coincide with sector boundaries. Therefore, when only one of these kinds of annual ring data is required, only the required data can be read, without reading the other data.
To take advantage of the annual ring data arrangement described above, data must be recorded on the optical disc 1 so that the continuity of the annual rings is ensured. This is described with reference to FIG. 6. Suppose, for example, that only the auxiliary AV annual ring data (shown as “LR” in FIG. 6) is read.
For example, if a sufficiently large contiguous free area is secured at recording time, a plurality of annual rings can be recorded contiguously. In this case, as shown in FIG. 6A, temporally continuous auxiliary AV annual ring data can be read with a minimum of track jumps. That is, after one unit of auxiliary AV annual ring data is read, the auxiliary AV annual ring data in the next cycle's annual ring can be read, and this is repeated, so the distance the pickup jumps is minimized.
On the other hand, if, for example, no contiguous free area can be secured at recording time and temporally continuous auxiliary AV data is recorded in scattered areas of the optical disc 1, then, as shown in FIG. 6B, after one unit of auxiliary AV annual ring data is read, the pickup must jump a distance corresponding to, for example, several annual ring cycles before reading the next unit. Because this operation is repeated, the auxiliary AV annual ring data is read more slowly than in the case of FIG. 6A. Likewise, as shown in FIG. 6C, reproduction of unedited main-line AV data (an AV clip) may be delayed.
Therefore, in the embodiment of the present invention, to guarantee the continuity of the annual rings, an allocation unit with a length corresponding to a plurality of annual ring cycles is defined, and a contiguous free area longer than this allocation unit length is secured.
This is described more specifically with reference to FIG. 7. The allocation unit length is set in advance to a multiple of the total playback time of the data recorded in one annual ring cycle. For example, if the playback time corresponding to one annual ring cycle is 2 seconds, the allocation unit length is set to 10 seconds. This allocation unit length is used as a ruler for measuring the length of free areas on the optical disc 1 (see the upper right of FIG. 7). In the initial state, as shown in the example of FIG. 7A, three used areas are assumed to be arranged at intervals on the optical disc 1, with the portions between them being free areas.
When AV data of a certain length and the corresponding auxiliary AV data are recorded on the optical disc 1, the allocation unit length is first compared with the lengths of the free areas, and a free area at least as long as the allocation unit length is secured as a reserved area (FIG. 7B). In the example of FIG. 7, the right-hand one of the two free areas is longer than the allocation unit length and is secured as the reserved area. Next, annual ring data is recorded sequentially and contiguously from the head of the reserved area (FIG. 7C). When, as annual ring data is recorded in this way, the remaining free length of the reserved area becomes less than the length of one cycle of the annual ring data to be recorded next (FIG. 7D), the reserved area is released and, as in FIG. 7A, the allocation unit length is applied to other free areas on the optical disc 1 to search for a free area that can become the next reserved area.
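The reserve, record and release procedure of FIG. 7 can be simulated as below. This is a simplified sketch with hypothetical names: free areas are `(start, length)` pairs measured in one common unit, the first sufficiently long free area is chosen (the text does not specify a selection order), and defects and the individual data kinds inside each cycle are ignored.

```python
from typing import List, Tuple


def record_rings(free_areas: List[Tuple[int, int]], cycles: int,
                 cycle_len: int, unit_len: int) -> List[int]:
    """Return the start position of each recorded annual ring cycle.

    free_areas holds (start, length) pairs; positions and lengths share one
    unit, here taken as seconds of playback time for simplicity.
    """
    if unit_len < cycle_len:
        raise ValueError("allocation unit must span at least one ring cycle")
    free = sorted(free_areas)
    starts: List[int] = []
    while len(starts) < cycles:
        # Measure free areas against the allocation unit and reserve the
        # first one that is long enough (FIG. 7B).
        index = next((i for i, (_, length) in enumerate(free)
                      if length >= unit_len), None)
        if index is None:
            raise RuntimeError("no free area as long as the allocation unit")
        position, remaining = free.pop(index)
        # Record ring cycles contiguously from the head of the reserved area (FIG. 7C).
        while remaining >= cycle_len and len(starts) < cycles:
            starts.append(position)
            position += cycle_len
            remaining -= cycle_len
        # Release the reserved area; any unused tail returns to the free list (FIG. 7D).
        if remaining > 0:
            free.append((position, remaining))
            free.sort()
    return starts
```

For example, `record_rings([(0, 6), (20, 13), (40, 30)], cycles=8, cycle_len=2, unit_len=10)` passes over the 6-unit area even though it could hold three cycles, fills the 13-unit area with six cycles, and continues in the 30-unit area, returning `[20, 22, 24, 26, 28, 30, 40, 42]`.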
By searching in this way for a free area in which several cycles of annual rings can be recorded and recording the annual rings there, a certain degree of annual ring continuity is guaranteed and the annual ring data can be reproduced smoothly. Although the allocation unit length is set to 10 seconds in the above description, it is not limited to this value, and a length corresponding to a longer playback time can be set. In practice, the allocation unit length is preferably set between 10 and 30 seconds.
Next, the data management structure according to an embodiment of the present invention will be described with reference to FIGS. 8, 9 and 10. In the embodiment of the present invention, data is managed in a directory structure. For example, UDF (Universal Disk Format) is used as the file system, and, as shown in the example of FIG. 8, a directory PAV is provided immediately below the root directory (root). This embodiment defines the directory PAV and everything below it.
That is, the mixed recording of audio data and video data of a plurality of signal types on one disc is defined under the directory PAV. Recording data outside the directory PAV is beyond the scope of data management in the embodiment of the present invention and is optional.
Immediately below the directory PAV, four files (INDEX.XML, INDEX.RSV, DISCINFO.XML and DISCINFO.RSV) are placed, and two directories (CLPR and EDTR) are provided.
The directory CLPR manages clip data. A clip here is, for example, the set of data from the start of shooting until it stops. For example, in video camera operation, the data from when the operation start button is pressed until the operation stop button is pressed (the operation start button is released) is regarded as one clip.
This set of data consists of the main-line audio data and video data described above, the auxiliary AV data generated from that audio and video data, and the time-series metadata and non-time-series metadata corresponding to that audio and video data. The directories “C0001”, “C0002”, ... provided immediately below the directory CLPR each store the set of data that makes up one clip.
FIG. 9 shows an example structure of the directory “C0001”, which is provided immediately below the directory CLPR and corresponds to the clip “C0001”. Hereinafter, a directory immediately below the directory CLPR that corresponds to one clip is referred to, where appropriate, as a clip directory. In the clip directory “C0001”, each item of the data set described above is stored and distinguished by its file name. In the example of FIG. 9, a file name consists of 12 characters: of the 8 characters before the delimiter “.”, the first 5 identify the clip and the 3 immediately before the delimiter indicate the data type, such as audio data, video data or auxiliary AV data. The 3 characters after the delimiter are an extension indicating the data format.
More specifically, in the example of FIG. 9, a file “C0001C01.SMI” indicating clip information, a main line video data file “C0001V01.MXF”, and a main line 8ch audio data files “C0001A01.MXF” to “C0001A08.MXF”, auxiliary AV data file “C0001S01.MXF”, non-time-series metadata file “C0001M01.XML”, and time-series metadata file “C0001R01.BIM” ”And the pointer information file“ C0001I01.PPF ”are stored in the clip directory“ C0001 ”.
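The 12-character naming rule lends itself to a small parser. The sketch below is hypothetical: the letter-to-type table is inferred only from the FIG. 9 example above (C, V, A, S, M, R, I), and the assumption that the last two characters of the type field are a running index (for example the audio channel in A01 to A08) is likewise read from that example rather than from a stated rule.

```python
import re
from typing import NamedTuple

# Hypothetical mapping, inferred from the FIG. 9 example only.
CLIP_DATA_TYPES = {
    "C": "clip information",
    "V": "main line video data",
    "A": "main line audio data",
    "S": "auxiliary AV data",
    "M": "non-time-series metadata",
    "R": "time-series metadata",
    "I": "pointer information",
}

# 5-character clip ID, 3-character data-type field, ".", 3-character extension.
CLIP_FILE_RE = re.compile(r"^([A-Z]\d{4})([A-Z])(\d{2})\.([A-Z]{3})$")


class ClipFileName(NamedTuple):
    clip_id: str     # e.g. "C0001"
    data_type: str   # human-readable data type
    index: int       # assumed running index, e.g. audio channel 1..8
    extension: str   # data format, e.g. "MXF"


def parse_clip_file_name(name: str) -> ClipFileName:
    """Split a clip-directory file name into its fields."""
    m = CLIP_FILE_RE.match(name)
    if m is None:
        raise ValueError(f"not a 12-character clip file name: {name!r}")
    clip_id, letter, index, ext = m.groups()
    if letter not in CLIP_DATA_TYPES:
        raise ValueError(f"unknown data type letter {letter!r} in {name!r}")
    return ClipFileName(clip_id, CLIP_DATA_TYPES[letter], int(index), ext)
```

For example, `parse_clip_file_name("C0001A08.MXF")` yields clip ID "C0001", main line audio data, index 8, extension "MXF".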
In an embodiment of the present invention, the above data signal types are allowed to be mixed between clip directories in the directory CLPR. For example, for a signal type of main line video data, a single GOP and a video data with a bit rate of 50 Mbps are stored in the clip directory “C0001”, and a video data with a long GOP and a bit rate of 25 Mbps are stored in the clip directory “C0002”. Is possible. On the other hand, mixing of data signal types in each data in the clip directory is not permitted. For example, in video data, a video data file recorded in a bit rate mode of 50 Mbps from the beginning to a certain point in time and recorded in a bit rate mode of 25 Mbps from the point in time to the end cannot be stored. .
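The mixing rule (any signal type may differ between clips, but never within one clip's data) can be expressed as a simple consistency check. This is a hypothetical sketch: `VideoSignalType` models only GOP structure and bit rate, and representing a file as a list of per-segment signal types is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoSignalType:
    gop: str             # "single" or "long" (hypothetical encoding)
    bit_rate_mbps: int


def clip_is_uniform(segments: list[VideoSignalType]) -> bool:
    """True if every recorded segment of one clip's video file has the same signal type."""
    return len(set(segments)) <= 1


def clips_violating_rule(clips: dict[str, list[VideoSignalType]]) -> list[str]:
    """Names of clips whose data mixes signal types internally.
    Differences *between* clips are allowed and are not reported."""
    return [name for name, segs in clips.items() if not clip_is_uniform(segs)]
```

With `C0001` holding single GOP 50 Mbps data and `C0002` holding long GOP 25 Mbps data, no clip is reported; a single clip containing both would be.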
Returning to FIG. 8, the directory EDTR manages editing information. In one embodiment of the present invention, an editing result is recorded as an edit list or a playlist. The directories "E0001", "E0002", ... provided immediately below the directory EDTR each store the set of data constituting one editing result.
The edit list is a list describing the edit points (IN points, OUT points, etc.) and the reproduction order of clips, and comprises the non-destructive editing result of the clips and a playlist described later. When the non-destructive editing result in the edit list is played, the files stored in the clip directories are referred to according to the description in the list, and a continuous reproduced video is obtained from a plurality of clips as if a single edited stream were being played. However, because the non-destructive editing result refers to the files in the list regardless of where those files are located on the optical disc 1, continuity during reproduction is not guaranteed.
When it is determined from the editing result that the files, or parts of files, referred to by the list are difficult to reproduce continuously, the playlist guarantees continuity during reproduction of the edit list by relocating those files or parts of files to a predetermined area.
Based on the edit list created by the editing operation, the management information of the files used in the edit (for example, the index file "INDEX.XML" described later) is consulted to estimate whether continuous reproduction is possible non-destructively, that is, with the referenced files left in their respective clip directories. If continuous reproduction is determined to be difficult, the corresponding files are copied to a predetermined area of the optical disc 1. A file relocated to this predetermined area is called a bridge essence file, and a list in which the bridge essence files are reflected in the editing result is called a playlist.
For example, if the editing result refers to clips in a complicated way, the pickup's seek may not be completed in time when playback moves from one clip to the next. In such a case, a playlist is created and the bridge essence files are recorded in a predetermined area of the optical disc 1.
FIG. 10 shows an example structure of the directory "E0002", which is provided immediately below the directory EDTR and corresponds to one editing result "E0002". Hereinafter, a directory immediately below the directory EDTR that corresponds to one editing result is referred to as an edit directory where appropriate. In the edit directory "E0002", the data generated as a result of the above-described editing is stored as files distinguished by their file names. A file name consists of 12 characters: of the 8 characters before the delimiter ".", the first 5 identify the editing operation, and the 3 immediately before the delimiter indicate the data type. The 3 characters after the delimiter are an extension indicating the data format.
More specifically, in the example of FIG. 10, the edit directory "E0002" stores, as the files constituting the editing result "E0002", the edit list file "E0002E01.SMI", a file "E0002M01.XML" describing time-series and non-time-series metadata information, the playlist file "E0002P01.SMI", the bridge essence files "E0002V01.BMX" and "E0002A01.BMX" to "E0002A04.BMX" based on main line data, the bridge essence file "E0002S01.BMX" based on auxiliary AV data, and the bridge essence file "E0002R01.BMX" based on time-series and non-time-series metadata.
Of the files stored in the edit directory "E0002", the shaded ones, that is, the bridge essence files "E0002V01.BMX" and "E0002A01.BMX" to "E0002A04.BMX" based on main line data, the bridge essence file "E0002S01.BMX" based on auxiliary AV data, and the bridge essence file "E0002R01.BMX" based on time-series and non-time-series metadata, are the files belonging to the playlist.
As described above, the edit list refers to, for example, video data stored in the clip directories. Because different data signal types can be mixed between clip directories, different data signal types can consequently be mixed within an edit list.
Returning to FIG. 8, the file "INDEX.XML" is an index file that manages the material information stored under the directory PAV. In this example, the file "INDEX.XML" is written in XML (Extensible Markup Language). The clips and edit lists described above are managed by the file "INDEX.XML". For example, it manages a conversion table between file names and UMIDs, length information (Duration), and the reproduction order of the materials when the entire optical disc 1 is reproduced. It also manages the video data, audio data, auxiliary AV data, and so on belonging to each clip, as well as the clip information managed by the files in each clip directory.
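The paragraphs above leave the continuity estimate abstract. The sketch below shows one way such a check could work, assuming a simple linear seek-time model and a fixed amount of buffered playback time: whenever the seek into the next edit-list segment would take longer than the buffer can cover, that segment becomes a candidate for copying into a bridge essence file. The `Segment` type, the sector addresses, and all three constants are hypothetical placeholders, not values from this embodiment.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    clip: str
    start_lba: int   # first sector of the referenced data (hypothetical unit)
    end_lba: int     # sector just past the referenced data


# Hypothetical drive model: placeholders, not measured values.
SEEK_BASE_S = 0.05          # fixed cost of any jump
SEEK_PER_SECTOR_S = 1e-7    # extra cost proportional to jump distance
BUFFERED_PLAYBACK_S = 0.5   # playback time held in memory when a jump starts


def seek_time(from_lba: int, to_lba: int) -> float:
    """Estimated time for the pickup to move between two disc positions."""
    if from_lba == to_lba:
        return 0.0
    return SEEK_BASE_S + SEEK_PER_SECTOR_S * abs(to_lba - from_lba)


def segments_needing_bridge(edit_list: list[Segment]) -> list[int]:
    """Indices of segments whose jump-in cannot be hidden by the buffer;
    these are the candidates to relocate into a bridge essence file."""
    needs = []
    for i in range(1, len(edit_list)):
        prev, cur = edit_list[i - 1], edit_list[i]
        if seek_time(prev.end_lba, cur.start_lba) > BUFFERED_PLAYBACK_S:
            needs.append(i)
    return needs
```

A real drive would model seek time and buffer occupancy far more carefully; the point is only that the decision reduces to comparing a predicted gap against the playback time already buffered.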
The file “DISCINFO.XML” manages information about the disc. Reproduction position information and the like are also stored in this file “DISCINFO.XML”.
In one embodiment of the present invention, when a predetermined change is detected in the set of data constituting a clip between the start and the stop of shooting, the clip is divided at a position corresponding to the position where the change was detected, and the part after the division position becomes a new clip. A new directory corresponding to the new clip is automatically created in the directory CLPR, and the set of data constituting the new clip is stored in that directory.
Clip division is performed when a change in signal type (format) is detected in at least one of the video data and the audio data constituting the clip. More specifically, the following conditions for division can be considered. First, for video data:
(1) Change in bit rate
(2) Change in frame rate
(3) Change in image size
(4) Change in image aspect ratio
(5) Change in coding method
For audio data:
(1) Change in bit resolution
(2) Change in sampling frequency
(3) Change in the number of input channels
(4) Change in encoding method
When a change is detected in any one of these, the clip is automatically divided at a position corresponding to the timing when the change is detected. At this time, if a change is detected in certain data, other data belonging to the same clip as the data is also divided at the same timing.
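The division conditions listed above amount to comparing successive format snapshots of the video and audio streams and cutting every stream of the clip wherever any attribute changes. The sketch below assumes, hypothetically, that the format in effect is known per frame; the `VideoFormat` and `AudioFormat` fields mirror the listed conditions, but the class and function names are illustrative.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VideoFormat:
    bit_rate_mbps: int
    frame_rate: float
    image_size: tuple[int, int]
    aspect_ratio: str
    coding: str


@dataclass(frozen=True)
class AudioFormat:
    bit_resolution: int
    sampling_hz: int
    channels: int
    coding: str


def changed_attributes(old, new) -> list[str]:
    """Names of the format attributes that differ between two snapshots."""
    return [f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)]


def division_frames(video: list[VideoFormat], audio: list[AudioFormat]) -> list[int]:
    """Frames at which the clip is divided. video[i] and audio[i] are the formats
    in effect at frame i; a change in either stream divides all data there."""
    cuts = []
    for i in range(1, len(video)):
        if changed_attributes(video[i - 1], video[i]) or changed_attributes(audio[i - 1], audio[i]):
            cuts.append(i)
    return cuts


def clip_ranges(n_frames: int, cuts: list[int]) -> list[tuple[int, int]]:
    """Half-open [start, end) frame ranges of the resulting clips."""
    bounds = [0, *cuts, n_frames]
    return list(zip(bounds[:-1], bounds[1:]))
```

Because every stream shares the same cut list, the other data belonging to the clip is divided at the same timing, as the paragraph above requires.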
Of course, clip division is not limited to this, and may be performed according to a change in still another attribute of video data and audio data. The clip division may be performed by detecting a predetermined change in not only video data and audio data but also auxiliary AV data and time-series metadata.
For example, the auxiliary AV data can be divided into clips when its bit rate mode or encoding method changes. For the time-series metadata, when ARIB metadata and camera data are recorded exclusively of each other, the clip can be divided when the data type switches between ARIB data and camera data. The clip can also be divided when the data rate initially set for transmitting the time-series metadata changes.
Furthermore, when a clip is divided because of a change in the main line video data, the main line audio data and the time-series metadata need not be divided. This suppresses the increase in the number of files caused by clip division. Even in this case, the auxiliary AV data is divided according to the change in the main line video data.
At the time of clip division, it is preferable to make the division boundary coincide with a GOP boundary of the auxiliary AV data, because the relationship between the time axis and the byte offset within the clip then becomes simple and processing becomes easier. For example, when the above-described change is detected in the video data or the audio data, this is achieved, as shown in the example of FIG. 11A, either by waiting until the next GOP boundary of the auxiliary AV data (division position B) and dividing the clip there, or by going back to the previous GOP boundary (division position A) and dividing the clip there. In practice, dividing the clip at division position B is preferable.
However, the present invention is not limited to this. If the division boundary does not coincide with a GOP boundary of the auxiliary AV data, the surplus portion of the auxiliary AV data GOP may be filled with stuffing bytes so that the data amount of the auxiliary AV data matches that of other data such as the main line video data. That is, as shown in FIG. 11B, in the auxiliary AV data, for example, the GOP immediately before the position at which a change is detected in the video data is made the last GOP of the clip, and stuffing bytes are filled from the rear boundary of that last GOP up to the change detection position (hatched in FIG. 11B).
If the main line video data is single GOP, the clip can be divided at any frame position. If the main line video data is long GOP, on the other hand, the frame at the clip division position may be a predictively coded P picture or B picture. Therefore, when long GOP video data is divided, the GOP is first closed at the clip division position. This can be done, for example, by converting the frame immediately before the division position into a P picture or an I picture if that frame is a B picture.
At the time of clip division, the original clip and the clip newly generated by the division may overlap. For example, the clip is divided with a time margin around the change so that the change point of the signal type is included in the original clip and/or the new clip.
As an example, a case where the bit rate of the main line video data is switched from an initial 50 Mbps to 30 Mbps will be described with reference to FIG. 12. As shown in FIG. 12, the 50 Mbps video data continues to be recorded at 50 Mbps for a predetermined time (the diagonally hatched portion) after the position at which the bit rate switch is instructed. Conversely, the 30 Mbps video data is recorded at 30 Mbps starting a predetermined time before the position at which the switch is instructed (the hatched portion in the figure).
Because the bit rate switching point becomes the clip division position, the clip start position must be adjusted relative to the actual start of the file, for example with "clipBegin", a command that specifies the clip start position.
For such recording, as an example, the portions of the baseband video data before compression encoding that correspond to the hatched portions in FIG. 12 are each buffered and compression-encoded at the corresponding bit rate. For the 50 Mbps video data, for example, the hatched portion can be appended to the video data file preceding the bit rate switching point. Instead of actually appending it, this may be described in the edit list described above or in the file "C0001C01.SMI" indicating clip information in the clip directory.
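Finding division positions A and B is integer arithmetic on GOP-aligned frame numbers. The sketch below assumes the 10-frame auxiliary AV GOP described for this embodiment (one I picture and nine P pictures) and, hypothetically, that the clip starts on a GOP boundary, so boundaries fall at multiples of the GOP length.

```python
AUX_GOP_FRAMES = 10  # auxiliary AV GOP in this embodiment: 1 I + 9 P pictures


def division_candidates(change_frame: int, gop_frames: int = AUX_GOP_FRAMES) -> tuple[int, int]:
    """Return (A, B): the auxiliary AV GOP boundaries at or before, and at or
    after, the frame where the change was detected. Frames are counted from the
    start of the clip, assumed to lie on a GOP boundary."""
    a = (change_frame // gop_frames) * gop_frames
    b = a if a == change_frame else a + gop_frames
    return a, b
```

A change detected at frame 23 gives A = 20 and B = 30; a change that already falls on a boundary, such as frame 40, gives A = B = 40. Because each division point is then a whole number of GOPs from the clip start, mapping a time position to a GOP in the clip stays simple, which is the benefit the paragraph above describes.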
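The GOP-closing step can be illustrated on a sequence of picture types in display order. In this hypothetical representation, each frame is the letter 'I', 'P', or 'B'. A B picture references the next anchor picture in display order, so if the last frame before the cut is a B picture, re-encoding it as a P picture gives every preceding B picture in that run an anchor that lies inside the old clip.

```python
def close_gop_before_division(picture_types: list[str], division: int) -> list[str]:
    """Picture types of the frames before `division` after closing the GOP.
    picture_types: 'I', 'P' or 'B' per frame in display order (hypothetical model).
    If the last frame before the cut is a B picture, it is re-encoded as a
    P picture (an I picture would also work), so no frame before the cut
    depends on a reference picture after it."""
    types = list(picture_types)
    last = division - 1
    if last >= 0 and types[last] == "B":
        types[last] = "P"
    return types[:division]
```

For the display-order pattern `IBBPBBPBB` cut at frame 6, the frames kept in the original clip become `IBBPBP`; a cut right after a P picture leaves the pattern unchanged.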
The naming rule of the clip directory name and the file name of each file in the clip directory is not limited to the above example. For example, the UMID described above may be used as a file name or a clip directory name. As described above, considering the extended UMID, the UMID has a data length of 64 bytes and is long for use in a file name or the like. Therefore, it is preferable to use only a part of the UMID. For example, a portion in the UMID where a different value is obtained for each clip is used for a file name or the like.
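Shortening a UMID for use in a file name can rely on the basic UMID layout: a 12-byte Universal Label, a 1-byte Length Value (13h for a basic UMID, 33h for an extended UMID), a 3-byte Instance Number, and a 16-byte Material Number, with the extended UMID appending 32 further bytes for 64 in total. The sketch below hypothetically takes the last few bytes of the Material Number as the part that differs from clip to clip; the choice of field and of 4 bytes is an assumption for illustration.

```python
UL_LEN, LENGTH_LEN, INSTANCE_LEN, MATERIAL_LEN = 12, 1, 3, 16
BASIC_UMID_LEN = UL_LEN + LENGTH_LEN + INSTANCE_LEN + MATERIAL_LEN  # 32 bytes
EXTENDED_UMID_LEN = 64
LENGTH_BASIC, LENGTH_EXTENDED = 0x13, 0x33  # bytes following the Length field


def material_number(umid: bytes) -> bytes:
    """Extract the 16-byte Material Number from a basic or extended UMID."""
    if len(umid) == BASIC_UMID_LEN:
        expected = LENGTH_BASIC
    elif len(umid) == EXTENDED_UMID_LEN:
        expected = LENGTH_EXTENDED
    else:
        raise ValueError(f"UMID must be 32 or 64 bytes, got {len(umid)}")
    if umid[UL_LEN] != expected:
        raise ValueError("Length Value does not match the UMID size")
    start = UL_LEN + LENGTH_LEN + INSTANCE_LEN
    return umid[start:start + MATERIAL_LEN]


def short_name_from_umid(umid: bytes, n_bytes: int = 4) -> str:
    """Hypothetical naming rule: hex of the last n_bytes of the Material Number."""
    return material_number(umid)[-n_bytes:].hex().upper()
```

Note that 13h (19) equals the 3-byte Instance Number plus the 16-byte Material Number, and 33h (51) adds the 32 extension bytes, which is why the Length Value check works as written.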
When a clip is divided, it is preferable, from the viewpoint of clip management, to name the clip directory or files so as to reflect the reason for the division. In this case, the clip is named so that it can at least be determined whether it was divided explicitly by the user or automatically by the apparatus.
FIG. 13 shows an example configuration of a disc recording / reproducing apparatus 10 applicable to an embodiment of the present invention. Here, the disk recording / reproducing device 10 is assumed to be a recording / reproducing unit built into a video camera (not shown). A video signal based on the image signal captured by the video camera and an audio signal recorded along with the image capture are input to the signal input / output unit 31 and supplied to the disk recording / reproducing device 10. The video signal and the audio signal output from the signal input / output unit 31 are supplied to, for example, a monitor device.
Of course, this is only an example, and the disk recording / reproducing device 10 may be a standalone device. For example, it can be used in combination with a video camera that has no recording unit. Video and audio signals, predetermined control signals, and data output from the video camera are input to the disk recording / reproducing device 10 via the signal input / output unit 31. A video signal and an audio signal reproduced by another recording / reproducing device, for example, can also be input to the signal input / output unit 31. The audio signal input to the signal input / output unit 31 is not limited to one input along with the capture of the video signal; it may be, for example, an audio signal recorded after shooting for a desired section of the video signal (after-recording).
The spindle motor 12 rotates the optical disc 1 at a constant linear velocity (CLV) or a constant angular velocity (CAV) based on a spindle motor drive signal from the servo control unit 15.
The pickup unit 13 controls the output of the laser beam based on the recording signal supplied from the signal processing unit 16, and records the recording signal on the optical disc 1. The pickup unit 13 also focuses and irradiates the optical disc 1 with laser light, generates a current signal by photoelectrically converting reflected light from the optical disc 1, and supplies the current signal to an RF (Radio Frequency) amplifier 14. The irradiation position of the laser beam is controlled to a predetermined position by a servo signal supplied from the servo control unit 15 to the pickup unit 13.
The RF amplifier 14 generates a focus error signal, a tracking error signal, and a reproduction signal based on the current signal from the pickup unit 13, supplies the tracking error signal and the focus error signal to the servo control unit 15, and supplies the reproduction signal to the signal processing unit 16.
The servo control unit 15 controls the focus servo operation and the tracking servo operation. Specifically, the servo control unit 15 generates a focus servo signal and a tracking servo signal based on the focus error signal and the tracking error signal from the RF amplifier 14, respectively, and supplies them to an actuator (not shown) of the pickup unit 13. The servo control unit 15 also generates a spindle motor drive signal for driving the spindle motor 12 and controls the spindle servo operation that rotates the optical disc 1 at a predetermined speed.
Further, the servo control unit 15 performs sled control of moving the pickup unit 13 in the radial direction of the optical disc 1 to change the irradiation position of the laser light. The setting of the signal reading position of the optical disc 1 is performed by the control unit 20, and the position of the pickup unit 13 is controlled so that a signal can be read from the set reading position.
The signal processing unit 16 generates a recording signal by modulating the recording data input from the memory controller 17, and supplies the recording signal to the pickup unit 13. The signal processing unit 16 also demodulates the reproduction signal from the RF amplifier 14 to generate reproduction data, and supplies the reproduction data to the memory controller 17.
The memory controller 17 stores the recording data from the data conversion unit 19 in the memory 18 as appropriate, as described later, reads it out, and supplies it to the signal processing unit 16. The memory controller 17 also stores the reproduction data from the signal processing unit 16 in the memory 18 as appropriate, reads it out, and supplies it to the data conversion unit 19.
A video signal and an audio signal based on the images captured by the video camera are supplied to the data conversion unit 19 via the signal input / output unit 31. As described in detail later, the data conversion unit 19 compresses and encodes the supplied video signal by a compression encoding method such as MPEG2, in a mode instructed by the control unit 20, to generate main line video data. At the same time, it also performs compression encoding at a lower bit rate to generate auxiliary AV data.
Further, the data conversion unit 19 compresses and encodes the supplied audio signal in a method instructed by the control unit 20, and outputs the audio signal as main-line audio data. In the case of an audio signal, linear PCM audio data may be output without compression encoding.
The main line audio data and video data and the auxiliary AV data processed as described above by the data conversion unit 19 are supplied to the memory controller 17.
The data converter 19 also decodes the reproduced data supplied from the memory controller 17 as necessary, converts the data into an output signal in a predetermined format, and supplies the output signal to the signal input / output unit 31.
The control unit 20 includes a CPU (Central Processing Unit), memories such as a ROM (Read Only Memory) and a RAM (Random Access Memory), and a bus connecting them, and controls the entire disk recording / reproducing apparatus 10. The ROM stores in advance an initial program read when the CPU starts, a program for controlling the disk recording / reproducing apparatus 10, and the like. The RAM is used as work memory for the CPU. The control unit 20 also controls the video camera unit.
Further, the control unit 20 provides a file system for recording data on the optical disc 1 according to a program stored in the ROM in advance and reproducing the recorded data. That is, in the disc recording / reproducing apparatus 10, recording of data on the optical disc 1 and reproduction of data from the optical disc 1 are performed under the control of the control unit 20.
The operation unit 21 is operated by a user, for example, and supplies an operation signal corresponding to the operation to the control unit 20. The control unit 20 controls the servo control unit 15, the signal processing unit 16, the memory controller 17, and the data conversion unit 19 based on an operation signal from the operation unit 21 and the like, and executes a recording / reproducing process.
Further, based on an operation signal from the operation unit 21, for example, setting of a bit rate, a frame rate, an image size, an image aspect ratio, and the like for recording video data is performed. Further, ON / OFF of the compression encoding process for the recording audio data and setting of the bit resolution may be performed from the operation unit 21. Control signals based on these settings are supplied to the memory controller 17 and the data conversion unit 19.
The disk recording / reproducing apparatus 10 includes an antenna 22 for receiving GPS signals and a GPS unit 23 that analyzes the GPS signals received by the antenna 22 and outputs position information including latitude, longitude, and altitude. The position information output from the GPS unit 23 is supplied to the control unit 20. The antenna 22 and the GPS unit 23 may be provided in the video camera unit, or may be external devices attached to the disk recording / reproducing device 10.
FIG. 14 shows an example configuration of the data conversion unit 19. When data is recorded on the optical disc 1, the signal to be recorded, input from the signal input / output unit 31, is supplied to the demultiplexer 41. A video signal of a moving image and the accompanying audio signal are input from the video camera unit to the signal input / output unit 31, and camera shooting information, for example iris and zoom information, is input in real time as camera data.
The demultiplexer 41 separates a plurality of related data sequences, for example a video signal of a moving image and the accompanying audio signal, from the signal supplied from the signal input / output unit 31, and supplies them to the data amount detection unit 42. The demultiplexer 41 also separates the camera data from the signal supplied from the signal input / output unit 31 and outputs it. This camera data is supplied to the control unit 20.
The data amount detection unit 42 passes the video signal and the audio signal supplied from the demultiplexer 41 unchanged to the image signal conversion units 43A and 43B and to the audio signal conversion unit 44, respectively, and also detects the data amounts of the video signal and the audio signal and supplies them to the memory controller 17. That is, the data amount detection unit 42 detects, for example, the data amount corresponding to a predetermined reproduction time for each of the video signal and the audio signal supplied from the demultiplexer 41, and supplies it to the memory controller 17.
The image signal conversion unit 43B compresses and encodes the video signal supplied from the data amount detection unit 42, for example by the MPEG2 method, according to an instruction from the control unit 20, and supplies the resulting video data sequence to the memory controller 17. The control unit 20 sets in the image signal conversion unit 43B, for example, the maximum bit rate of the code amount generated by compression encoding. The image signal conversion unit 43B estimates the data amount of one frame after compression encoding, controls the compression encoding process based on the estimate, and performs the actual compression encoding so that the generated code amount stays within the set maximum bit rate. The difference between the set maximum bit rate and the data amount actually produced by compression encoding is filled with, for example, predetermined padding data, so that the maximum bit rate is maintained. The compression-encoded video data sequence is supplied to the memory controller 17.
On the other hand, the image signal conversion unit 43A compresses and encodes the video signal supplied from the data amount detection unit 42, for example by the MPEG4 method, according to an instruction from the control unit 20, to generate auxiliary AV data. In this embodiment, the bit rate is then fixed at several Mbps, and each GOP consists of 10 frames: one I picture and nine P pictures.
When the audio signal supplied from the data amount detection unit 42 is not linear PCM audio data, the audio signal conversion unit 44 converts it into linear PCM audio data according to an instruction from the control unit 20. Alternatively, the audio signal conversion unit 44 can compress and encode the audio signal using, for example, MP3 (Moving Pictures Experts Group 1 Audio Layer 3) or AAC (Advanced Audio Coding), which conform to the MPEG standards. The compression coding method for the audio data is not limited to these, and other methods may be used. The data sequence of the audio data output from the audio signal conversion unit 44 is supplied to the memory controller 17.
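The padding that keeps the output at the set maximum bit rate is simple per-frame arithmetic: each frame has a fixed bit budget (maximum bit rate divided by frame rate), and whatever the encoder does not use is filled with padding data. A minimal sketch, assuming a per-frame budget and truncation to whole bits; the function names are hypothetical.

```python
def frame_budget_bits(max_bit_rate_bps: float, frame_rate: float) -> int:
    """Bits available to one frame at the set maximum bit rate."""
    return int(max_bit_rate_bps / frame_rate)


def padding_bits(max_bit_rate_bps: float, frame_rate: float, encoded_bits: int) -> int:
    """Padding needed so that the frame occupies exactly its budget."""
    budget = frame_budget_bits(max_bit_rate_bps, frame_rate)
    if encoded_bits > budget:
        # Rate control is expected to keep the frame within budget; if it did
        # not, padding cannot fix it and the frame must be re-encoded.
        raise ValueError("encoded frame exceeds the maximum bit rate budget")
    return budget - encoded_bits
```

At 50 Mbps and 25 frames per second the budget is 2,000,000 bits per frame, so a frame that encodes to 1,500,000 bits is followed by 500,000 bits of padding.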
Note that the above-described configuration is an example, and the present invention is not limited to this. For example, when main line AV data, camera data, and the like are independently input to the signal input / output unit 31, the demultiplexer 41 can be omitted. When the main line audio data is linear PCM audio data, the processing in the audio signal conversion unit 44 can be omitted.
Then, the video data and the audio data supplied to the memory controller 17 are supplied to the optical disk 1 and recorded as described above.
Recording is performed so as to form annual rings on the optical disc 1, as described above. The data amount detection unit 42 of the data conversion unit 19 notifies the memory controller 17 when, for example, the audio data needed to reproduce the time corresponding to one annual ring has been detected. In response to this notification, the memory controller 17 determines whether the audio data needed to reproduce one annual ring has been stored in the memory 18 and notifies the control unit 20 of the result. Based on the result, the control unit 20 controls the memory controller 17 to read from the memory 18 the audio data corresponding to the reproduction time of one annual ring. Under this control, the memory controller 17 reads the audio data from the memory 18 and supplies it to the signal processing unit 16, and the audio data is recorded on the optical disc 1.
After the audio data corresponding to the reproduction time of one annual ring has been recorded, the same processing is performed next on, for example, the video data, and video annual ring data for one annual ring is recorded after the audio annual ring data. Similarly, auxiliary AV data corresponding to the reproduction time of one annual ring is then recorded in turn.
For the time-series metadata, for example, camera data is supplied from the demultiplexer 41 to the control unit 20, and some of the time-series metadata, such as the UMID, is generated by the control unit 20. The camera data and the data generated by the control unit 20 are combined into time-series metadata and stored in the memory 18 via the memory controller 17. In the same manner as described above, the memory controller 17 reads from the memory 18 the time-series metadata corresponding to the reproduction time of one annual ring and supplies it to the signal processing unit 16.
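Annual-ring recording interleaves, for each ring period, an amount of each stream that plays back for the same time. The sketch below computes that layout. The ring period, the stream bit rates, and the order (audio, video, auxiliary AV, then time-series metadata, following the example order described above) are hypothetical parameters; real rings would also be aligned to recording units on the disc, which this ignores.

```python
RING_ORDER = ("audio", "video", "auxiliary AV", "time-series metadata")


def annual_ring_layout(ring_seconds: float, rates_bps: dict[str, float],
                       n_rings: int) -> list[tuple[int, str, int]]:
    """(ring_index, stream, n_bytes) entries in recording order.
    Each ring holds ring_seconds of playback for every stream."""
    layout = []
    for ring in range(n_rings):
        for stream in RING_ORDER:
            n_bytes = int(rates_bps[stream] * ring_seconds / 8)
            layout.append((ring, stream, n_bytes))
    return layout
```

With a hypothetical 2-second ring, 8ch 24-bit 48 kHz linear PCM audio (9,216,000 bit/s) contributes 2,304,000 bytes per ring and 50 Mbps video contributes 12,500,000 bytes, so audio and video for the same time span sit next to each other on the disc.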
Note that the control unit 20 also generates non-time-series metadata. The non-time-series metadata is recorded in a clip directory of a clip to which the data belongs.
The data recorded on the optical disc 1 as described above is stored in files and managed in a directory structure, as already described with reference to FIGS. 8, 9 and 10. For example, when data is recorded on the optical disc 1, the control unit 20 records management information, such as the address information of each file, pointer information in the directory structure, and file name and directory name information, in a predetermined management area of the optical disc 1. The recorded file information and the like are also reflected in the index file "INDEX.XML".
When data is reproduced from the optical disc 1, on the other hand, video data, audio data, auxiliary AV data, and time-series metadata are read from the optical disc 1 as described above. At this time, the low bit rate data, such as main line audio data, auxiliary AV data, and time-series metadata, is also reproduced at the reproduction speed of the high bit rate main line video data, so the reproduction speed of the optical disc 1 does not change depending on the data being read. The video data and auxiliary AV data read from the optical disc 1 are supplied from the memory controller 17 to the image data conversion units 45B and 45A, respectively. The audio data is supplied from the memory controller 17 to the audio data converter 46.
The image data converters 45A and 45B decode the data sequences of the auxiliary AV data and the main line video data supplied from the memory controller 17, and supply the resulting video signals to the multiplexer 47. The audio data converter 46 decodes the data sequence of the audio data supplied from the memory controller 17 and supplies the resulting audio signal to the multiplexer 47.
In the image data converters 45A and 45B and the audio data converter 46, the supplied reproduction data can be supplied to the multiplexer 47 as it is without decoding, and multiplexed and output. Further, it is possible to omit the multiplexer 47 and to output each data independently.
In the disk recording / reproducing device 10 configured as described above, when the user operates the operation unit 21 to instruct data recording, the data supplied from the signal input / output unit 31 is converted to the data conversion unit 19 and the memory controller. 17, the signal is supplied to the optical disk 1 via the signal processing unit 16 and the pickup unit 13 and recorded.
During recording, the user can change the bit rate of the main line video data by operating the operation unit 21. For example, recording can initially be performed at a bit rate of 50 Mbps and, when the remaining recordable area of the optical disc 1 becomes small, the bit rate can be lowered to, for example, 30 Mbps so that the recording is not cut short.
At this time, the clip is divided according to the bit rate change timing, and the changed data is recorded on the optical disc 1 as a new clip. The detection of the change in the bit rate may be performed by detecting an operation performed on the operation unit 21, or may be performed based on the result of monitoring the bit rate of the video data by the control unit 20. For example, the memory controller 17 extracts data at a predetermined bit position where bit rate information is described in the header of main line video data supplied from the data conversion unit 19, and confirms that the bit rate has been changed. It is possible to detect.
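As one concrete case of reading the bit rate from a fixed bit position in the header, an MPEG-2 video sequence header (start code 00 00 01 B3) carries an 18-bit bit_rate_value, in units of 400 bit/s, immediately after the 12-bit horizontal size, 12-bit vertical size, 4-bit aspect ratio, and 4-bit frame rate code. The sketch below assumes the main line video is an MPEG-2 elementary stream and ignores the bit_rate_extension in the sequence extension, which is zero below about 104.8 Mbit/s. Splitting the stream into chunks is a hypothetical simplification.

```python
from typing import Optional

SEQ_HEADER_CODE = b"\x00\x00\x01\xb3"


def sequence_header_bit_rate(es: bytes) -> Optional[int]:
    """Bit rate (bit/s) from the first MPEG-2 sequence header in `es`, or None."""
    pos = es.find(SEQ_HEADER_CODE)
    if pos < 0 or len(es) < pos + 12:
        return None
    # 64 bits after the start code: horizontal_size (12), vertical_size (12),
    # aspect_ratio_information (4), frame_rate_code (4), bit_rate_value (18), ...
    bits = int.from_bytes(es[pos + 4:pos + 12], "big")
    bit_rate_value = (bits >> (64 - 32 - 18)) & 0x3FFFF
    return bit_rate_value * 400


def bit_rate_change_points(chunks: list[bytes]) -> list[int]:
    """Indices of chunks whose header bit rate differs from the last one seen."""
    changes, last = [], None
    for i, chunk in enumerate(chunks):
        rate = sequence_header_bit_rate(chunk)
        if rate is None:
            continue
        if last is not None and rate != last:
            changes.append(i)
        last = rate
    return changes
```

A 50 Mbps stream carries bit_rate_value 125000 and a 30 Mbps stream 75000, so the change from one to the other is visible as soon as the next sequence header arrives.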
When the change of the bit rate is detected, for example, the memory controller 17 is controlled by the control unit 20, and the data before the bit rate is changed is swept out of the memory 18 and recorded on the optical disc 1, and the data after the change changes to a new Annual rings are formed.
When the change of the main line video data is detected, the other data, that is, the main line audio data, the auxiliary AV data, and the time-series metadata are similarly controlled by the memory controller 17, and the clip is divided. Is At this time, as described above, main-line AV data can be divided according to the GOP boundary of the auxiliary AV data.
Further, when the bit rate of main line video data is changed, it is preferable to gradually change the actual bit rate of the video data because an unnatural change does not appear in the reproduced image.
First, the case of changing from a high bit rate to a low bit rate is described with reference to FIG. Assume that the bit rate mode is initially set to 50 Mbps. During recording, the operation unit 21 is operated at time t0 to instruct a change of the bit rate mode to 30 Mbps. On receiving this instruction, the control unit 20 instructs the image signal conversion unit 43B of the data conversion unit 19 to change the bit rate. Time-constant processing is then applied to the rate of change of the bit rate so that the bit rate decreases gradually from time t0 until time t1, a predetermined time later. Time t1 is taken as the point at which the actual bit rate changes, and clip division is performed at this point.
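The "time-constant processing" could take several forms; the sketch below assumes a first-order (exponential) approach toward the new rate, evaluated once per frame, and treats the frame at which the rate comes within a tolerance of the target as t1. The time constant, tolerance and function name are illustrative assumptions, not values from the text.

```python
import math
from typing import List


def ramp_bit_rate(start_bps: float, target_bps: float, frame_period_s: float,
                  tau_s: float, tol_bps: float = 100_000.0) -> List[float]:
    """Per-frame encoder bit rates from the change instruction (t0) onward.

    Each frame closes a fixed fraction of the remaining gap, as a first-order
    lag with time constant tau_s would. The last element equals target_bps
    and its index marks t1.
    """
    if frame_period_s <= 0 or tau_s <= 0 or tol_bps <= 0:
        raise ValueError("frame period, time constant and tolerance must be positive")
    alpha = 1.0 - math.exp(-frame_period_s / tau_s)  # share of the gap closed per frame
    rate = float(start_bps)
    rates: List[float] = []
    while abs(target_bps - rate) > tol_bps:
        rate += (target_bps - rate) * alpha
        rates.append(rate)
    rates.append(float(target_bps))  # snap to the target: this frame is t1
    return rates
```

With `ramp_bit_rate(50e6, 30e6, 1/30, 0.5)` the rate falls monotonically and reaches 30 Mbps after roughly 80 frames (under three seconds at 30 frames/s). The same function also ramps upward when `target_bps` exceeds `start_bps`.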
In this case, even though the bit rate change is instructed at time t0, the video data up to time t1 is treated as video data of the bit rate mode before the change. For example, the difference between the data amount at the bit rate specified by the bit rate mode and the code amount actually generated by compression encoding is filled with predetermined padding data.
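The padding rule can be sketched per frame: the bit rate mode fixes a byte budget for each frame, and whatever the encoder did not use is filled. This sketch assumes an integer frame rate and zero-valued stuffing bytes; the actual padding data is described only as "predetermined", so the `pad_byte` default and the function name are assumptions.

```python
def pad_to_mode_budget(encoded: bytes, mode_bps: int, frames_per_second: int,
                       pad_byte: int = 0x00) -> bytes:
    """Fill one encoded frame up to the per-frame size implied by the bit rate mode."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    budget = mode_bps // (8 * frames_per_second)  # bytes per frame at the mode's rate
    if len(encoded) > budget:
        raise ValueError("encoded frame exceeds the bit rate mode budget")
    # Padding = (data amount at the mode rate) - (code amount actually generated).
    return encoded + bytes([pad_byte]) * (budget - len(encoded))
```

For instance, a 50 Mbps mode at 25 frames/s gives a budget of 250,000 bytes per frame; while the encoder is ramping down toward 30 Mbps, each frame is shorter than that and the remainder is padded. A non-integer rate such as 29.97 Hz would need rational accounting across frames.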
When changing from a low bit rate to a high bit rate, the above process is reversed. For example, when a bit rate initially set to 30 Mbps is changed to 50 Mbps, the bit rate mode is first switched from 30 Mbps to 50 Mbps at the timing of the change instruction. The control unit 20 then applies time-constant processing to the speed of the bit rate change so that the image signal conversion unit 43B of the data conversion unit 19 increases the bit rate gradually over a predetermined time. Here too, the difference between the data amount at the bit rate specified by the bit rate mode and the code amount actually generated by compression encoding is filled with predetermined padding data, for example. Clip division is performed, for example, at the change point of the bit rate mode.
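Comparing the two directions suggests one reading: during the transition the recorded bit rate mode is always the higher of the old and new modes, so the padding difference never goes negative. The sketch below encodes that reading; it takes the per-frame encoder rates from t0 onward as input, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def plan_transition(old_mode_bps: int, new_mode_bps: int,
                    instructed_bps: Sequence[float]) -> Tuple[List[int], int]:
    """Return (bit rate mode per frame, index of the clip split).

    instructed_bps[0] is the first frame after the instruction (t0); the last
    entry is the frame at which the actual rate reaches the new value.
    """
    n = len(instructed_bps)
    if n == 0:
        raise ValueError("empty ramp")
    if new_mode_bps < old_mode_bps:
        # Decrease: keep the old (higher) mode until t1, then split the clip there.
        modes = [old_mode_bps] * (n - 1) + [new_mode_bps]
        split = n - 1
    else:
        # Increase: switch the mode at the instruction and split the clip there.
        modes = [new_mode_bps] * n
        split = 0
    # The mode must cover the encoder rate so that the padding amount is non-negative.
    if any(m < r for m, r in zip(modes, instructed_bps)):
        raise ValueError("instructed rate exceeds the bit rate mode")
    return modes, split
```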
The gradual change described above can be achieved by having the control unit 20 instruct the image signal conversion unit 43B to lower the bit rate step by step at predetermined time intervals. The image signal conversion unit 43B estimates the total code amount of each encoded frame from the bit rate value instructed at each step, and performs the encoding process according to this estimate.
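A minimal sketch of the stepwise instructions and the resulting per-frame code amount estimate follows. It assumes equal-sized steps, a fixed number of frames per instruction, and the single GOP (I-picture only) case, where dividing the rate evenly over frames is a reasonable estimate; a long GOP encoder would weight I, P and B pictures differently. Both function names are hypothetical.

```python
from typing import List


def stepped_instructions(start_bps: int, target_bps: int, steps: int) -> List[int]:
    """Bit rate values issued at fixed intervals; the last one equals the target."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    return [start_bps + (target_bps - start_bps) * k // steps for k in range(1, steps + 1)]


def frame_bit_budgets(instructions: List[int], frames_per_instruction: int,
                      frames_per_second: int) -> List[int]:
    """Estimated code amount (bits) for each frame encoded under each instruction."""
    if frames_per_instruction <= 0 or frames_per_second <= 0:
        raise ValueError("frame counts must be positive")
    budgets: List[int] = []
    for rate in instructions:
        # I-picture-only assumption: every frame gets an equal share of the rate.
        budgets.extend([rate // frames_per_second] * frames_per_instruction)
    return budgets
```

For example, stepping from 50 Mbps to 30 Mbps in four instructions gives 45, 40, 35 and 30 Mbps, and at 30 frames/s the first step allows about 1.5 Mbit per frame.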
On the other hand, regarding audio data, for example, it is possible to cope with a change in the bit resolution of main line audio data input as linear PCM audio data. When a change is detected, the clip is split at the change point, as in the case of the video data described above. Also in this case, it is possible to perform clip division in accordance with the GOP boundary of the auxiliary AV data.
For audio data, clip division caused by a change in bit resolution can also be avoided by keeping the pre-change bit resolution after the change. For example, when audio data input from outside is recorded on the optical disc 1 by the disk recording/reproducing apparatus 10 according to the embodiment of the present invention, the input audio data may initially have a bit resolution of 24 bits and then change to 16 bits at some point. In that case, the bit resolution can remain at 24 bits even after the change.
Hereinafter, regarding the audio data, “24-bit bit resolution” and “16-bit bit resolution” are appropriately abbreviated as “24 bits” and “16 bits”, respectively.
This will be described with reference to FIG. Audio data initially input at 24 bits changes to 16 bits at the bit resolution change point (FIG. 16A). In this case, as shown in the example of FIG. 16B, data representing silence (for example, the value “0”) is appended as the lower 8 bits (LSB side) of the 16-bit audio data, giving 24 bits in total. The appended 8 bits need not represent silence; dither may be added instead.
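Treating samples as signed integers, extending 16-bit audio to 24 bits is a left shift by 8 with the new low byte filled. The sketch below fills it with 0 (silence) by default; when a random generator is passed, it uses uniform random low bits as a simple stand-in for dither, which is an assumption since the text does not specify the dither type.

```python
import random
from typing import List, Optional, Sequence


def widen_16_to_24(samples16: Sequence[int],
                   rng: Optional[random.Random] = None) -> List[int]:
    """Place signed 16-bit samples in the upper bits of 24-bit words."""
    out: List[int] = []
    for s in samples16:
        if not -32768 <= s <= 32767:
            raise ValueError(f"not a 16-bit sample: {s}")
        low = rng.randrange(256) if rng is not None else 0  # 0 = silence in the added LSBs
        out.append((s << 8) | low)  # upper 16 bits unchanged, lower 8 bits appended
    return out
```

Because the upper 16 bits are untouched, the recorded 24-bit words reproduce the original 16-bit signal exactly, with only the added low byte differing.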
Conversely, when the audio data changes from 16 bits at the beginning to 24 bits, for example, the bit resolution can be kept at 16 bits even after the change.
This will be described with reference to FIG. The audio data that was initially input in 16 bits has its bit resolution changed to 24 bits at the bit resolution change point (FIG. 17A). At this time, as shown in an example in FIG. 17B, the lower 8 bits (LSB side) of the audio data input with 24 bits are discarded, and the total becomes 16 bits.
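Discarding the lower 8 bits of a signed 24-bit sample is an arithmetic right shift by 8, sketched below. Note that this plain truncation rounds toward negative infinity; the text does not mention rounding or requantization dither, so none is applied here.

```python
from typing import List, Sequence


def narrow_24_to_16(samples24: Sequence[int]) -> List[int]:
    """Drop the 8 LSBs of signed 24-bit samples, leaving 16-bit samples."""
    out: List[int] = []
    for s in samples24:
        if not -(1 << 23) <= s < (1 << 23):
            raise ValueError(f"not a 24-bit sample: {s}")
        out.append(s >> 8)  # arithmetic shift: discards the low byte, keeps the sign
    return out
```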
Further, when audio data input as linear PCM audio data changes to audio data encoded by a method other than linear PCM (hereinafter referred to as non-audio audio data), the non-audio audio data can be muted and recording can continue without dividing the clip. Muting is performed, for example, by recording audio data representing silence in place of the non-audio audio data. That is, the non-audio audio data is replaced with audio data representing silence.
When the non-audio audio data changes back to linear PCM audio data, the clip can be divided and the linear PCM audio data recorded in the new clip.
The bit resolution conversion of the audio data and the silencing performed while non-audio audio data is input, both described above, can be carried out, for example, by the audio signal conversion unit 45 on instruction from the control unit 20. However, the present invention is not limited to this; they can also be carried out when the audio data is read from the memory 18 under the control of the memory controller 17, on instruction from the control unit 20. For example, one sample of data representing silence is stored in the memory 18 and read out repeatedly.
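The silencing by repeated reading can be sketched as a generator: PCM blocks pass through unchanged, and each non-audio block is replaced by the single stored silent sample repeated once per input sample, so timing and clip continuity are preserved. The block representation, the `is_non_audio` flag per block and the silent value 0 are assumptions for illustration.

```python
from itertools import repeat
from typing import Iterable, Iterator, List, Tuple

SILENT_SAMPLE = 0  # the one stored sample representing silence (assumed to be 0)


def record_stream(blocks: Iterable[Tuple[bool, List[int]]]) -> Iterator[int]:
    """Yield the linear PCM samples to record.

    Each block is (is_non_audio, samples). Non-audio blocks are replaced by
    silence of the same length, so recording continues in the same clip.
    """
    for is_non_audio, samples in blocks:
        if is_non_audio:
            yield from repeat(SILENT_SAMPLE, len(samples))  # re-read the stored sample
        else:
            yield from samples
```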
Note that the bit resolution of the audio data can be determined, for example, when the audio data is transmitted in a format based on AES/EBU (Audio Engineering Society/European Broadcasting Union), which is widely used in broadcasting stations. In that format, bit resolution information is stored at a predetermined position in the header, so the resolution can be determined by extracting this data. Linear PCM audio data and non-audio audio data can likewise be distinguished from the header information.
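The sketch below combines header-based detection with the handling rules described above. It reads only the AES3 channel-status "non-audio" flag (byte 0, bit 1), assuming bit n of a channel-status byte is stored as `(byte >> n) & 1`; word length information lives in channel-status byte 2 and is not decoded here, so the bit resolution is passed in as an already-known value. The function names and action strings are hypothetical.

```python
def is_non_audio(channel_status: bytes) -> bool:
    """True if the AES3 channel-status block marks the data as non-audio."""
    if len(channel_status) < 1:
        raise ValueError("channel status block is empty")
    return bool((channel_status[0] >> 1) & 1)  # byte 0, bit 1: 0 = linear PCM audio


def recording_action(prev_non_audio: bool, cur_non_audio: bool,
                     prev_bits: int, cur_bits: int, keep_resolution: bool) -> str:
    """Decide how recording proceeds when the input audio format changes."""
    if cur_non_audio:
        return "mute"      # replace with silence and keep recording in the same clip
    if prev_non_audio:
        return "split"     # linear PCM resumes: record it in a new clip
    if prev_bits != cur_bits:
        # Either keep the old resolution (widen or narrow) or divide the clip.
        return "convert" if keep_resolution else "split"
    return "continue"
```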
In the above description, changes in the bit rate of the main line video data during recording have been described. However, the present invention is not limited to this example and can also handle changes in frame rate, image size and aspect ratio. In this case, during reproduction, interpolation/decimation in the time-axis direction is performed when the frame rate changes, and interpolation/decimation within the frame is performed when the image size or aspect ratio changes, so that the video data can be output at a fixed frame rate, image size and aspect ratio. This interpolation/decimation is performed, for example, by the memory controller 17 on the video data stored in the memory 18. Alternatively, it may be performed in the image signal conversion unit 43B.
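The simplest form of time-axis conversion to a fixed output frame rate maps each output frame to a source frame, dropping frames when the output rate is lower and repeating them when it is higher. The sketch below does that with exact rational arithmetic so that rates such as 30000/1001 Hz work; a real interpolator would blend neighbouring frames rather than pick one, so this nearest-frame scheme is a simplification.

```python
from fractions import Fraction
from typing import List, Union

Rate = Union[int, Fraction]


def resample_frame_indices(n_src: int, src_fps: Rate, dst_fps: Rate) -> List[int]:
    """Source frame index for each output frame at the fixed output rate."""
    src, dst = Fraction(src_fps), Fraction(dst_fps)
    if src <= 0 or dst <= 0:
        raise ValueError("frame rates must be positive")
    n_dst = (n_src * dst) // src  # whole output frames covered by the input
    # Output frame i shows the latest source frame at or before time i / dst.
    return [(i * src) // dst for i in range(n_dst)]
```

For example, 60 to 30 frames/s keeps every other frame, while 24 to 48 frames/s shows each source frame twice.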
In the above description, the encoding method of the main line video data is MPEG2, but the invention is not limited to this example, and video data encoded by other methods can also be recorded in a mixed manner. Likewise, the bit rate and the other parameters of the video data are not limited to the values described above.
Similarly, still other encoding methods can be used for non-audio audio data. The bit resolution of the audio data is not limited to 16 bits and 24 bits; audio data with other bit resolutions such as 32 bits, 8 bits and 12 bits can be recorded in a mixed manner. The sampling frequency of the audio data is typically 48 kHz, but it is not limited to this, and audio data with other sampling frequencies such as 96 kHz or 192 kHz can also be recorded in a mixed manner.
Further, the auxiliary AV data is not limited to the MPEG4 system, and video data encoded by another system can be mixedly recorded.
Furthermore, it is preferable that a list of clips recorded on the optical disc 1 can be displayed on a monitor device (not shown). For example, the index file “INDEX.XML” is read in response to a user operation on the operation unit 21, and information on all clips recorded on the optical disc 1 is obtained. Then, by referring to each clip directory, a thumbnail image is automatically created based on the auxiliary AV data. The thumbnail image is created each time, for example, by reading a frame at a predetermined position of the auxiliary AV data and reducing it to a predetermined image size.
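The reduction step can be sketched as box averaging of a decoded frame by an integer factor. This assumes the frame at the predetermined position has already been decoded from the auxiliary AV data into a 2-D array of luma values, and that an integer reduction factor is acceptable; decoding and colour handling are outside the sketch, and the function name is hypothetical.

```python
from typing import List, Sequence


def box_downscale(frame: Sequence[Sequence[int]], factor: int) -> List[List[int]]:
    """Reduce a 2-D luma frame by averaging each factor x factor block."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    height = len(frame)
    width = len(frame[0]) if height else 0
    out_h, out_w = height // factor, width // factor  # partial edge blocks are dropped
    out: List[List[int]] = []
    for by in range(out_h):
        row: List[int] = []
        for bx in range(out_w):
            total = sum(frame[by * factor + dy][bx * factor + dx]
                        for dy in range(factor) for dx in range(factor))
            row.append(total // (factor * factor))  # integer mean of the block
        out.append(row)
    return out
```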
The thumbnail image data of each clip is supplied to the memory controller 17 and stored in the memory 18. The thumbnail image data stored in the memory 18 is then read out by the memory controller 17 and supplied to a monitor device (not shown) via the data conversion unit 19 and the signal input/output unit 31, and the thumbnail images are displayed as a list on the monitor device. The display of the thumbnail images on the monitor device can be controlled by operations on the operation unit 21. In addition, by performing a predetermined operation on the operation unit 21, a desired image can be selected from the thumbnail images, and the clip corresponding to the selected thumbnail image can be reproduced.
When the thumbnail images are displayed on the monitor device, various information about the clip corresponding to each displayed thumbnail, for example the bit rate and the encoding method of the main line video data, can be displayed together with the thumbnail. This can be achieved by reading the time-series metadata and the non-time-series metadata from each clip directory.
【Effect of the Invention】
As described above, according to the present invention, when the input audio data changes to audio data encoded by a method other than linear PCM while audio data encoded by linear PCM is being recorded on a disc-shaped recording medium, the audio data encoded by the other method is replaced with linear PCM audio data representing silence, and recording continues. Therefore, even if recording continues while audio data encoded by a method other than linear PCM is being input, that data is not reproduced immediately after the linear PCM audio data. This has the effect of preventing noise from being output near the point where the encoding method changed.
