JP2003244539A

JP2003244539A - Sequential automatic caption production processing system

Info

Publication number: JP2003244539A
Application number: JP2002040540A
Authority: JP
Inventors: Eiji Sawamura; 英治沢村; Takao Monma; 隆雄門馬; Noriyoshi Uratani; 則好浦谷; Katsuhiko Shirai; 克彦白井
Original assignee: NEC Corp; Nippon Hoso Kyokai NHK; Telecommunications Advancement Organization; NHK Engineering Services Inc; Japan Broadcasting Corp
Current assignee: NEC Corp; National Institute of Information and Communications Technology; Japan Broadcasting Corp; NHK Engineering System Inc
Priority date: 2002-02-18
Filing date: 2002-02-18
Publication date: 2003-08-29
Anticipated expiration: 2022-02-18
Also published as: JP3969570B2

Abstract

(57) [Abstract] [Problem] To complete automatic subtitle production and preview/correction within a time close to the real running time of a program. [Solution] The system consists of an automatic subtitle production unit 111 and a preview/correction support unit 151. The automatic subtitle production unit 111 detects, within a designated section of the input audio, the start and end timings of the announcement speech sentence by sentence, assigns timing information to the subtitles by applying the detected timings as at least part of the start and end timings of the display unit subtitle sentences, and creates subtitle data for each predetermined processing unit. The preview/correction support unit 151 displays at least the video, audio, and subtitle data of a television program on a monitor device 155 and, when a preliminary-preview key input is made from a key input device 159 for subtitle data displayed on the monitor device 155, records at least information on the operation timing and key type in a storage device 153.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a sequential automatic subtitle production processing system that makes it possible to shorten production time.

[0002] [Summary of the Invention] The present invention relates to a sequential automatic subtitle production processing system intended to shorten production time. The system detects, for example, suitable breaks in the program audio, treats each resulting segment as a processing unit for subtitle production, and produces subtitles one processing unit at a time. Processing is made fast enough that all automatic subtitle production for a unit completes within the duration of that unit, so the system can proceed immediately to the next unit. As a result, automatic subtitle production can follow subtitle program material played back continuously from a VTR, lagging by only one processing unit, and subtitle production time is greatly shortened.

[0003]

[Prior Art] As society becomes markedly more information-oriented, hearing-impaired people face many constraints in obtaining information because of their impairment. Subtitled broadcasting is currently provided for some programs as an effective means of letting hearing-impaired people use and enjoy broadcasts in the same way as hearing people, but measured against their needs, its availability remains extremely inadequate.

[0004] At present, however, most of the subtitle production process depends on manual work, so program production requires a great deal of labor, cost, and time, which is one of the factors hindering the spread of subtitled broadcasting. To spread subtitled broadcasting further, it is essential to rationalize and streamline the program production process, for example by developing subtitle program production technology that creates subtitle data efficiently.

[0005] The processing procedure in a conventional manual subtitle production system is as follows. First, the materials used for subtitle creation are a program tape with the time code superimposed on the video, a program tape with the time code recorded on an audio channel, the program script, and the like.

[0006] These materials are handed to someone with specialist knowledge, such as a retired broadcasting professional, who prepares a subtitle manuscript by transcribing the program announcements in summarized form, laying them out as they will appear on screen (with reference to separately defined subtitle manuscript guidelines and the like), and entering their start and end time codes.

[0007] Based on this subtitle manuscript, an operator creates electronic subtitles.

[0008] These electronic subtitles are previewed and corrected in the presence of the responsible subtitle production supervisor, the manuscript author, the operator who digitized them, and others, yielding the finished subtitles.

[0009]

[Problems to be Solved by the Invention] In the conventional manual subtitle production system, however, preparing the subtitle manuscript involves using a program tape with the time code multiplexed onto the video, a script, and the like to transcribe the lines that will form each subtitle display unit and lay them out as they will appear, and then reading the time code off the screen and entering the start and end time codes. Because this work relies heavily on human intelligence and skill, it takes several tens of times the program's running time.

[0010] Preview and correction must also be done by hand and serve as the final check of the program. Given that importance, they depend on the advanced skills of experienced specialists, and they require many people and several times the program's running time.

[0011] Preview and correction work normally ties up several valuable staff for a long time under considerable tension (for example, three people for 2.7 times the program's running time). In a typical session, the subtitle program's video, audio, and time code are first played back continuously while the display unit subtitles corresponding to the time code are shown in sequence, and the preview/correction staff note the subtitle number of each subtitle that appears defective, together with as much of an outline of the defect as they can manage (provisionally called the preliminary preview). Next, each subtitle noted during the preliminary preview is examined in detail, and it is either corrected on the spot or the specific corrections needed are written down for a later batch correction.

[0012] In the preliminary preview, noting the subtitle number and an outline of the defect for each suspect subtitle means checking 7 to 10 defect-related items and writing the outline within the display time of one subtitle page (3 to 6 seconds). This must be repeated for every subtitle page in the program (for example, 450 pages for a 45-minute program), which makes it quite grueling work.

[0013] The present invention was made in view of the above circumstances. Its object is to provide a sequential automatic subtitle production system that, for programs whose subtitle text already exists, greatly shortens the time required for subtitle production by performing automatic subtitle production in which timing assignment in particular is accelerated, and by effectively supporting the preview and correction work of the staff responsible.

[0014]

[Means for Solving the Problems] To achieve the above object, the invention of claim 1 comprises detection means for detecting at high speed the start and end timings of the announcement speech, at least sentence by sentence, within a designated section of the input audio delimited at least by pauses, and means for adding information to subtitles that applies the detected timings as at least part of the page breaks and the start and end timings of the display unit subtitle sentences. It is characterized in that subtitle data is produced automatically and sequentially for each processing unit of the designated section.

[0015] The invention of claim 2 comprises an automatic subtitle production unit and a preview/correction support unit. The automatic subtitle production unit includes detection means for detecting at high speed the start and end timings of the announcement speech, at least sentence by sentence, within a designated section of the input audio delimited at least by pauses, and means for adding information to subtitles that applies the detected timings as at least part of the page breaks and the start and end timings of the display unit subtitle sentences, and it creates subtitle data for each predetermined processing unit. The preview/correction support unit consists of a monitor device that displays at least the video, audio, and subtitle data of a television program and outputs the program audio, and a subtitle correction information collection device that, when a preliminary-preview key input is made from a key input device for subtitle data displayed on the monitor device, records at least information on the operation timing and key type in a storage device. The invention is characterized in that the preview/correction support unit carries out the preliminary preview in parallel with the progress of the automatic subtitle production executed by the automatic subtitle production unit, so that subtitle program production is processed sequentially.
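The recording step in claim 2 can be pictured with a small sketch. The claim states only that the operation timing and the key type are recorded; the class and field names, the use of seconds instead of a time code, and the lookup of which subtitle page was on screen are all hypothetical and added purely for illustration.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class KeyEvent:
    timecode: float  # operation timing in seconds from start (assumed unit)
    key_type: str    # which key was pressed, e.g. a defect category (assumed)


@dataclass
class PreviewLog:
    # Start times of the display unit subtitles, sorted ascending (assumed input).
    page_starts: list
    events: list = field(default_factory=list)

    def record(self, timecode, key_type):
        """Store the operation timing and key type of one key press."""
        self.events.append(KeyEvent(timecode, key_type))

    def flagged_pages(self):
        """Map each key press to the page that was on screen at that time."""
        result = []
        for ev in self.events:
            idx = bisect.bisect_right(self.page_starts, ev.timecode) - 1
            if idx >= 0:
                result.append((idx, ev.key_type))
        return result


log = PreviewLog(page_starts=[0.0, 4.0, 9.0])
log.record(5.2, "timing")
log.record(9.0, "text")
print(log.flagged_pages())
```

Keeping the raw events separate from the page lookup means the stored record stays limited to what the claim names (timing and key type), while any later interpretation remains a separate step.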

[0016] The invention of claim 3 is the sequential automatic subtitle production processing system of claim 1 or 2, characterized in that the detection means detects the start and end timings of speech by detecting pause sections in the audio using the block cepstrum flux method.

[0017]

[Embodiments of the Invention] <Principle and background of the invention> Before the embodiment is described, the principle of the invention is explained.

[0018] Among the subtitled programs currently on air, there are some for which an announcement script appears to be written in advance and used as the broadcast subtitles almost unchanged. In these programs the announcement speech and the subtitle content are nearly identical, so it can be inferred that essentially the same script is used for both announcing and subtitling.

[0019] The inventors therefore considered programs in which the announcement speech and the subtitle content are very similar, a common script is used for both, and that script is available in electronic form. For such programs they arrived at a sequential automatic subtitle production processing system that automatically divides the subtitle text at suitable points according to a predetermined display format, automatically assigns to each resulting display unit subtitle high-precision timing information corresponding to its division points, and thereby creates subtitle data automatically and sequentially for each predetermined processing unit. If necessary, the subtitle data created for each processing unit is also previewed and corrected sequentially, which greatly shortens subtitle program production time.

[0020] FIG. 2 is an explanatory diagram showing the principle of the invention. As described later, an example of the sequential automatic subtitle production processing system according to the invention consists of an automatic subtitle production unit and a preview/correction support unit. It takes, for example, suitable breaks in the program audio as the boundaries of processing units for subtitle production and performs automatic subtitle production unit by unit, while also previewing and correcting the subtitle data unit by unit if necessary.

[0021] FIG. 2(A) shows the VTR playback output, FIG. 2(B) the processing in the automatic subtitle production unit, and FIG. 2(C) the processing in the preview/correction support unit. As shown in (B), the automatic subtitle production unit first executes creation process 1, one of the creation processes divided by predetermined processing unit. Once creation process 1 has been executed, the preview/correction support unit executes preview/correction process 1 in turn, as shown in (C). In this way, as soon as automatic subtitle creation for a unit finishes, preview and correction of the subtitle data just created is carried out in pipeline fashion. If automatic subtitle production for a unit can be completed within the duration of that unit, the system can proceed immediately to the next unit. If preview and correction can likewise be completed within one unit's duration, then automatic subtitle production and preview/correction can both follow the subtitle program material played back continuously from the VTR, each lagging by one processing unit. The work up to this point can thus be finished in a time equal to the production time plus twice the processing unit time, measured from the start of VTR playback.
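The pipeline timing described above can be checked with a short simulation. This is a minimal sketch under stated assumptions: each of the two stages (creation, then preview/correction) is taken to need exactly one unit's duration, and "production time" is read as the total playback duration of the material. The function name and the use of seconds are hypothetical.

```python
def pipeline_finish_time(unit_durations):
    """Return when the last preview/correction stage ends, in seconds.

    Assumptions (not from the source): creation of a unit can start only
    after that unit has been fully played back, and each stage takes
    exactly as long as the unit itself.
    """
    playback_end = 0
    creation_end = 0
    preview_end = 0
    for duration in unit_durations:
        playback_end += duration
        # Creation waits for playback of this unit and for the previous creation.
        creation_end = max(playback_end, creation_end) + duration
        # Preview waits for creation of this unit and for the previous preview.
        preview_end = max(creation_end, preview_end) + duration
    return preview_end


units = [60] * 45                   # a 45-minute program in 60-second units
print(pipeline_finish_time(units))  # total playback time plus two units
```

With equal units the result is the total playback time plus two unit durations, which matches the "plus twice the processing unit time" figure in the text; with unequal units the lag depends on the individual unit lengths.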

[0022] Sequential subtitle production depends on three elements: (1) processing fast enough that all automatic subtitle production for a processing unit completes within the duration of that unit, (2) suitable setting of the subtitle processing unit, and (3) suitable preview/correction processing that can be completed within the set time.

[0023] (1) As for high-speed processing, the current automatic subtitle production system requires a processing time of three times or more, so this is not achievable as things stand. Within the system, the synchronization subsystem accounts for most of that time, so either this part must be accelerated or a different, faster method must be applied.

[0024] The method adopted is timing detection that uses features such as audio level, typified by the block cepstrum flux method, which makes high-speed processing possible. For programs in which timing assignment can be completed by this method alone, processing takes no more than a fraction of the program's audio duration.

[0025] (2) The subtitle production processing unit is basically set so that it does not affect the generation of display unit subtitles. Specifically, it is appropriate to place unit boundaries at non-speech portions of the program audio that last at least a certain length of time. With this method alone, however, a unit can occasionally become much too long. A simple countermeasure is to place a boundary at a time corresponding to 10 pages of standard display unit subtitles; experience shows this has almost no adverse effect.
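The unit-setting rule above (cut at sufficiently long non-speech portions, with a forced cut when a unit would grow too long) can be sketched as follows. The function name, the choice of the pause midpoint as the cut position, and the numeric defaults are assumptions; in particular, 50 seconds merely stands in for "the time corresponding to 10 pages", which the text does not quantify.

```python
def split_into_units(total, pauses, min_pause=2.0, max_unit=50.0):
    """Return boundary times (seconds) between processing units.

    total:     duration of the program audio
    pauses:    time-sorted (start, end) non-speech intervals
    min_pause: shortest non-speech span accepted as a boundary (assumed value)
    max_unit:  longest allowed unit, standing in for "10 pages" (assumed value)
    """
    boundaries = []
    last = 0.0
    for start, end in pauses:
        if end - start < min_pause:
            continue  # too short to count as a break
        cut = (start + end) / 2  # cut in the middle of the pause (assumption)
        # Fallback: force cuts while the unit would exceed the maximum length.
        while cut - last > max_unit:
            last += max_unit
            boundaries.append(last)
        boundaries.append(cut)
        last = cut
    # Apply the same fallback to the tail of the program.
    while total - last > max_unit:
        last += max_unit
        boundaries.append(last)
    return boundaries


print(split_into_units(120.0, [(10.0, 10.5), (30.0, 33.0), (100.0, 104.0)]))
```

In this example the half-second pause is ignored, and one forced boundary is inserted in the long stretch between the two accepted pauses.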

[0026] (3) A suitable preliminary preview is carried out in the preview/correction support unit in parallel with the progress of automatic subtitle production in the automatic subtitle production unit. Processing the whole of subtitle program production sequentially in this way speeds it up.

[0027] <Description of the embodiment> FIG. 1 is a block diagram showing a configuration example of the sequential automatic subtitle production processing system according to the invention.

[0028] In the following description, the entire set of subtitle sentences to be displayed is called the "subtitle sentence text"; a subset of the subtitle sentence text forming one block suitably delimited as a subtitle is called a "unit subtitle sentence"; a subtitle that forms one unit of display on the display screen is called a "display unit subtitle"; each individual line within a display unit subtitle is called a "display unit subtitle line"; and any character within a display unit subtitle line is called a "subtitle character".

[0029] As shown in FIG. 1, the sequential automatic subtitle program production system 101 comprises an automatic subtitle production unit 111 and a preview/correction support unit 151. The automatic subtitle production unit 111 includes an electronic manuscript recording medium 113, a synchronization detection device 115, an integration device 117, a morphological analysis unit 119, a division rule storage unit 121, and a program material VTR, for example a digital video tape recorder (hereinafter "D-VTR") 123. The preview/correction support unit 151 includes a storage device 153, a monitor device 155, a delay device 157, and a keyboard 159.

[0030] <<Configuration and operation of the automatic subtitle production unit 111>> The electronic manuscript recording medium 113 consists of, for example, a hard disk storage device or a floppy (registered trademark) disk device, and stores the subtitle sentence text representing the entire set of subtitles to be displayed. Because this embodiment assumes that essentially the same electronic manuscript is used for both announcing and subtitling, the content of the subtitle sentence text stored on the electronic manuscript recording medium 113 is assumed to match not only the subtitles to be displayed but also the announcement speech recorded on the material VTR.

[0031] The synchronization detection device 115 has, among other functions, the function of detecting time synchronization between a subtitle sentence with synchronization detection points and the announcement speech reading it aloud. The synchronization detection points are normally designated at the beginning and end of each sentence of the subtitle sentence text. In more detail, the synchronization detection device 115 receives the subtitle sentence with synchronization detection points assigned by the integration device 117, together with the corresponding announcement speech and its time code taken from the program material VTR. It has a function for detecting pause points in the announcement speech and verifying their reliability, and a function for detecting, by a speech recognition method, the timing information (that is, the time code) of designated synchronization detection points that could not be verified. It also has a function for sending the time codes and pause section data detected by these functions to the integration device 117.

[0032] Synchronization detection between announcement speech and subtitle sentence text, including speech recognition of the announcement speech, can be achieved with high accuracy, albeit slowly, by applying technology the inventors have already researched and developed.

[0033] The pause time detection function of the embodiment does not perform speech recognition as described above. Instead, from the level and duration of the audio supplied by the material VTR and its time code, it detects, for example, the start and end time codes of a span in which the level stays at or below a designated level continuously for a predetermined time. It is carried out by a method such as the block cepstrum flux method described later.
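The level-and-duration criterion in this paragraph can be illustrated with a simple threshold detector. This sketch is not the block cepstrum flux method named in the text; it is a plain level threshold substituted for illustration. The per-frame level input, the frame length, and the use of seconds in place of time codes are assumptions.

```python
def detect_pauses(levels, frame_sec, threshold, min_pause_sec):
    """Find spans where the level stays at or below `threshold`.

    levels:        one audio level value per frame (assumed input format)
    frame_sec:     duration of one frame in seconds
    min_pause_sec: shortest span reported as a pause
    Returns a list of (start_sec, end_sec) pairs.
    """
    pauses = []
    run_start = None  # index of the first quiet frame of the current run
    for i, level in enumerate(levels):
        if level <= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * frame_sec >= min_pause_sec:
                pauses.append((run_start * frame_sec, i * frame_sec))
            run_start = None
    # Close a quiet run that lasts until the end of the input.
    if run_start is not None and (len(levels) - run_start) * frame_sec >= min_pause_sec:
        pauses.append((run_start * frame_sec, len(levels) * frame_sec))
    return pauses


levels = [5, 5, 0, 0, 0, 5, 0, 5, 0, 0, 0, 0]
print(detect_pauses(levels, frame_sec=0.5, threshold=1, min_pause_sec=1.5))
```

A single pass over the frames is enough here, which is consistent with the text's point that this kind of detection is far cheaper than speech recognition.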

[0034] The integration device 117 has three functions: a unit subtitle sentence extraction function that sequentially extracts unit subtitle sentences from the subtitle sentence text read from the electronic manuscript recording medium 113, starting at the beginning of a sentence and using full stops, a required character count range, and the like as guides; a display unit subtitle conversion function that converts the extracted unit subtitle sentences into display unit subtitles conforming to the desired display format; and a timing information assignment function that assigns timing information to the converted display unit subtitles using the time codes and pause points sent from the synchronization detection device 115, supplemented by suitable interpolation.

[0035] The morphological analysis unit 119 operates on unit subtitle sentences written in mixed kanji and kana. It has a division function that splits a sentence into morphemes; an additional information assignment function that attaches to each morpheme additional information such as its surface form, part of speech, reading, and standard form; and an information element sequence acquisition function that groups the morphemes into bunsetsu (phrases) or clauses to obtain several information element sequences. A unit subtitle sentence is thereby represented as a surface element sequence, a symbol element sequence (part-of-speech sequence), a standard element sequence, and an information element sequence.

[0036] The division rule storage unit 121 stores the division rules referred to when optimizing the line break and page break positions used to convert unit subtitle sentences into display unit subtitles.

[0037] The D-VTR 123 plays back and outputs video, audio, and their time codes from the program material VTR tape on which the program material is recorded.

[0038] Next, the internal configuration of the integration device 117, which plays a central role in the automatic subtitle production unit 111, is described.

[0039] The integration device 117 includes a unit subtitle sentence extraction unit 133, a display unit subtitle conversion unit 135, and a timing information assignment unit 137.

[0040] The unit subtitle sentence extraction unit 133 sequentially extracts text to serve as a processing unit from the subtitle sentence text read from the electronic manuscript recording medium 113, in which unit subtitle sentences are arranged in display-time order. It uses, for example, full stops and a length of roughly 70 to 90 subtitle characters as guides, and makes use of attached information on permissible break positions. As that information, the morphological analysis data with bunsetsu data obtained by the morphological analysis unit 119 and the division rules (line break and page break data) stored in the division rule storage unit 121 can also be used. The recommended line break and page break positions defined by the division rules are, first, after a full stop; second, after a comma; third, between bunsetsu; and fourth, between morphemes (parts of speech). When applying the division rules, it is preferable to apply them in priority order, starting from the first listed.
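The priority order of the division rules can be expressed as a small selection function. This is a sketch under assumptions: the candidate break positions are taken as already known (for instance from morphological analysis), the kind labels are hypothetical names for the four listed categories, and preferring the latest position among equally ranked candidates is an added tie-break the text does not specify.

```python
# Lower number means higher priority, following the order given in the text.
PRIORITY = {"period": 0, "comma": 1, "bunsetsu": 2, "morpheme": 3}


def choose_break(candidates, max_len):
    """Pick a break position no later than `max_len` characters.

    candidates: (position, kind) pairs, where position is the number of
                characters before the break and kind is a key of PRIORITY
    Returns the chosen position, or None if no candidate fits.
    """
    feasible = [
        (PRIORITY[kind], -pos)  # negative position: later breaks sort first
        for pos, kind in candidates
        if 0 < pos <= max_len
    ]
    if not feasible:
        return None
    _, neg_pos = min(feasible)
    return -neg_pos


candidates = [(4, "morpheme"), (7, "comma"), (12, "bunsetsu"),
              (14, "morpheme"), (18, "period")]
print(choose_break(candidates, max_len=15))  # the comma break wins over later, lower-ranked ones
print(choose_break(candidates, max_len=20))  # the full stop is now within reach
```

Ranking by category first and position second means a break after a comma is chosen even when a morpheme boundary would fill the line more completely.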

[0041] The display unit subtitle conversion unit 135 converts each unit subtitle sentence extracted by the unit subtitle sentence extraction unit 133 into one or more display unit subtitles conforming to the desired display format, based on the unit subtitle sentence itself, the permissible break position information attached to it, information from the synchronization detection device 115, and the like. The timing information assignment unit 137 assigns timing information to the display unit subtitles produced by the display unit subtitle conversion unit 135, using the per-sentence pause information for the subtitle sentence text and the time codes serving as synchronization detection point information sent from the synchronization detection device 115, supplemented by a suitable timing interpolation method.
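One way to picture timing interpolation is to spread a sentence's detected start and end times over its display unit subtitles in proportion to their character counts. The text says only that a suitable interpolation method is used, so the linear, character-proportional rule below is an assumption, as are the function name and the use of seconds in place of time codes.

```python
def interpolate_timings(start, end, char_counts):
    """Assign (start, end) times to the display units of one sentence.

    start, end:  detected start and end time of the sentence, in seconds
    char_counts: number of subtitle characters in each display unit
    Each unit receives a share of the interval proportional to its length
    (an assumed rule, not the source's stated method).
    """
    total = sum(char_counts)
    if total <= 0:
        raise ValueError("char_counts must contain at least one character")
    timings = []
    elapsed = 0
    for count in char_counts:
        unit_start = start + (end - start) * elapsed / total
        elapsed += count
        unit_end = start + (end - start) * elapsed / total
        timings.append((unit_start, unit_end))
    return timings


print(interpolate_timings(10.0, 20.0, [10, 30, 10]))
```

Only the outer two times come from detection in this sketch; the interior boundaries are estimates, which is why the number of detected points can stay small.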

[0042] Next, an example of the method of assigning timing information to subtitles carried out by the sequential automatic subtitle production processing system according to the invention is described with reference to FIGS. 3 to 7.

[0043] As already noted, synchronization detection of timing information for subtitles corresponding to announcement speech can be achieved with high accuracy by applying the inventors' previously developed technology for detecting synchronization between announcement speech and subtitle sentence text, which includes speech recognition of the announcement speech. However, this synchronization detection is quite complex and takes a great deal of processing time. If it were applied to every start and end point of every display unit subtitle, the number of synchronization detection points would be excessive, subtitle program production would take a very long time, and sequential processing would be impossible.

【００４４】また、字幕文テキストを字幕表示に適した
行数、文字数の各表示単位字幕文に分割する際、アナウ
ンスの長いポーズ（ナレーションの隙間）にまたがる字
幕分割は好ましくない。しかし、極めて短いポーズの場
合は、むしろ連続した文として扱う方が好ましいので、
字幕分割にはアナウンスのポーズ時間を考慮する必要が
ある。この点に関しては、例えば、ブロック・ケプスト
ラム・フラックス法などを適用して、音声レベルやその
継続時間などの特徴を巧みに処理することにより、例え
ば字幕文テキストへの文単位でのアナウンス音声の開
始、終了タイミングやこれに伴う文間ポーズ時間をポー
ズデータとして検出することが可能である。しかも、こ
れらの処理は、番組音声時間の数分の一以下の時間で高
速処理することができる。When the subtitle text is divided into display unit subtitle sentences with a number of lines and characters suited to subtitle display, a division that straddles a long pause in the announcement (a gap in the narration) is undesirable. An extremely short pause, on the other hand, is better treated as part of a continuous sentence, so subtitle division must take the announcement pause durations into account. For this purpose, by applying, for example, the block cepstrum flux method and suitably processing features such as the audio level and its duration, the start and end timings of the announcement speech for each sentence of the subtitle text, and the resulting inter-sentence pause durations, can be detected as pause data. Moreover, this processing runs fast, in a fraction of the program audio duration or less.

【００４５】ただし、この方法は音声にアナウンス音声
以外の音声が混じっている場合など、正しいタイミング
検出を阻害する要因もあるので、その検出結果を検証
し、確度の高いタイミングのみを使用しなければならな
い。ただし、適切な手法を適用すれば、各字幕文テキス
トにおける開始、終了のタイミングのかなりの部分は、
前記の音声レベルなどを用いて検出した前記のタイミン
グを適用することができる。そして、表示単位字幕文の
開始、終了タイミングにも適用するが、不足なものは、
後述する適切なタイミング内挿手法により付与する。な
お、音声レベルなどを用いるタイミング検出で必要な結
果が得られない部分は、従来の字幕文テキストとアナウ
ンス音声との音声処理技術を活用した照合法を適用す
る。However, some factors interfere with correct timing detection in this method, for example when sounds other than the announcement speech are mixed into the audio, so the detection results must be verified and only high-confidence timings used. Even so, with an appropriate technique, a considerable share of the start and end timings of each subtitle sentence can be taken from the timings detected using the audio level and similar features. These timings are also applied to the start and end timings of the display unit subtitle sentences, and any that are missing are supplied by the appropriate timing interpolation method described later. Where timing detection based on the audio level and similar features does not yield the required result, the conventional method of matching the subtitle text against the announcement speech with speech processing technology is applied.

【００４６】内挿手法の例は、図３のフローチャートに
示すように、表示単位字幕文として字幕文テキストを要
約せずにそのまま用いる場合、先ず、音声データからポ
ーズ開始タイミング、継続時間を求める。その適否を検
証し選択する（ステップＳＴ１）。In an example of the interpolation method, shown in the flowchart of FIG. 3, the subtitle text is used as the display unit subtitle sentences as is, without summarization. First, the pause start timings and durations are obtained from the audio data, and their validity is verified so that suitable ones are selected (step ST1).

【００４７】次に、比較的長いポーズ（例えば２秒以
上）で字幕用テキストをブロックに分割し、ブロックテ
キスト文としてその開始、終了タイミングを付与する
（ステップＳＴ２、３）。Next, the subtitle text is divided into blocks with a relatively long pause (for example, 2 seconds or more), and the start and end timings thereof are given as block text sentences (steps ST2, 3).
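
The block division of steps ST2 and ST3 can be sketched as follows. This is a minimal illustration in Python, not the implementation of the patent: the `Sentence` record, the `split_into_blocks` function, and the field names are assumptions made here, and the 2000 ms default simply mirrors the "2 seconds or more" example in the text.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    start_ms: int   # detected start of the announcement speech for this sentence
    stop_ms: int    # detected end of the announcement speech
    pause_ms: int   # pause that follows the sentence


def split_into_blocks(sentences, long_pause_ms=2000):
    """Group consecutive sentences; a pause >= long_pause_ms closes a block."""
    blocks, current = [], []
    for sentence in sentences:
        current.append(sentence)
        if sentence.pause_ms >= long_pause_ms:
            blocks.append(current)
            current = []
    if current:  # trailing sentences with no long pause after them
        blocks.append(current)
    # Each block carries its own start and end timing (hypothetical layout).
    return [(b[0].start_ms, b[-1].stop_ms, b) for b in blocks]
```

Each returned tuple holds the block start time, the block end time, and the sentences in the block, so later steps can work on one block at a time.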

【００４８】次いで、ブロックテキスト文の継続時間を
その総読み数（計算推定）で割り、当該範囲の平均読み
速度を求める（ステップＳＴ４）。Next, the duration of each block text sentence is divided by its total reading count (an estimate obtained by calculation) to obtain the average reading speed for that range (step ST4).

【００４９】次いで、各ブロックテキスト文を、中の長
さのポーズ箇所を改行点とする表示単位字幕文に分割す
る。この場合、分割ルールを適用する（ステップＳＴ
５）。Next, each block text sentence is divided into display unit subtitle sentences, using pauses of medium length as line-break points. The division rules are applied at this step (step ST5).

【００５０】次いで、各分割字幕文の文頭、文末に対応
するタイミングを、ブロックテキスト文の開始、終了タ
イミングやポーズのタイミング、平均読み速度を基に計
算し、付与する。この場合、文字数、文字種法、または
発音数法を適用する（ステップＳＴ６）そして、各表示単位字幕の表示時間をチェックし、必要
ならば終了タイミングを修正する（ステップＳＴ７）。Next, the timings corresponding to the beginning and end of each divided subtitle sentence are calculated and assigned from the start and end timings of the block text sentence, the pause timings, and the average reading speed. The character-count method, the character-type method, or the pronunciation-count method is applied here (step ST6). Finally, the display time of each display unit subtitle is checked, and the end timing is corrected if necessary (step ST7).
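
The interpolation of steps ST4 to ST6 can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name `interpolate_timings` is hypothetical, pauses inside the block are ignored, and the display-time check of step ST7 is left out. The only idea taken from the text is that time inside a block is distributed in proportion to the reading count of each display unit subtitle.

```python
def interpolate_timings(block_start_ms, block_end_ms, reading_counts):
    """Distribute the block duration over display units by reading count.

    reading_counts holds one estimated reading count per display unit
    subtitle, in display order (assumed input, e.g. from a yomi estimate).
    """
    total = sum(reading_counts)
    if total <= 0:
        raise ValueError("reading counts must sum to a positive value")
    # Average time per reading unit in this block (step ST4).
    ms_per_unit = (block_end_ms - block_start_ms) / total
    timings = []
    cursor = float(block_start_ms)
    for count in reading_counts:
        end = cursor + count * ms_per_unit  # step ST6: proportional share
        timings.append((round(cursor), round(end)))
        cursor = end
    return timings
```

For a block from 0 ms to 3000 ms with reading counts 10 and 20, the first display unit receives the first third of the block and the second unit the remainder.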

【００５１】次に、上述した図３に示す要約処理をしな
い場合のタイミング情報の付与の処理手順について図４
乃至図６に示す具体例を用いて説明する。Next, the procedure of FIG. 3 for adding timing information without summarization is described using the specific example shown in FIGS. 4 to 6.

【００５２】図４は、音声のポーズ検出によるポーズ情
報を活用した、表示単位字幕へのタイミング付与例にお
ける字幕用原文テキスト、図５は、図４に示した各字幕
用原文テキストのかな数、漢字数、読み、時間、ポー
ズ、テキストのスタート時間、テキストのストップ時
間、次のテキスト文のスタート時間、および読速度をそ
れぞれ示している。また、図６は図４に示した字幕用原
文テキストから作成された表示単位字幕文（／の左側が
一行目、／の右側が二行目）とそのタイミング情報を示
している。FIG. 4 shows the original subtitle text used in an example of assigning timings to display unit subtitles with pause information obtained by audio pause detection. FIG. 5 shows, for each original subtitle text of FIG. 4, the kana count, kanji count, reading count, duration, pause, text start time, text stop time, start time of the next text sentence, and reading speed. FIG. 6 shows the display unit subtitle sentences created from the original subtitle text of FIG. 4 (the first line is to the left of "/" and the second line to the right) together with their timing information.

【００５３】図４のNo.１「今日の舞台は東アフリカケ
ニアの大草原です。」とある字幕原文テキストでは、図
５から理解できるように、かな数は“１２”、漢字数は
“８”、読み数（ｙｏｍｉ）は、“２５．０２”、読み
の時間は“３８５０ｍＳ”、ポーズの時間は、“１０１
０ｍＳ”、スタート時間は、４９１５０ｍＳ（４９．１
５０Ｓ）、ストップ時間は、“５３０００ｍＳ（＝５
３．０００Ｓ）、次のスタート時間は“５４０１０ｍＳ
（５４．０１０Ｓ）”、話速は、１５．３９ｍＳ（＝３
８５／２５．０２）となる。ここで、“ｙｏｍｉ”は、
漢字部分（および数字部分）がかなの約１．８６倍の読
み時間で表わすことができることから、１２＋７×１．
８６＝２５．０２と計算したものである。For the original subtitle text No. 1 in FIG. 4, "Today's stage is the great grasslands of Kenya, East Africa.", FIG. 5 gives a kana count of "12", a kanji count of "8", a reading count (yomi) of "25.02", a reading time of "3850 mS", a pause time of "1010 mS", a start time of 49150 mS (49.150 S), a stop time of 53000 mS (= 53.000 S), a next start time of 54010 mS (54.010 S), and a speech rate of 15.39 mS (= 385 / 25.02). Here, "yomi" is calculated as 12 + 7 × 1.86 = 25.02, because the reading time of a kanji (or numeral) portion can be expressed as about 1.86 times that of a kana.
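
The arithmetic of this example can be reproduced with a small Python sketch. The function names are hypothetical; the 1.86 weight and the operands 12 and 7 are the ones used in the calculation quoted above. Dividing the full 3850 ms reading time by 25.02 gives roughly 153.9 ms per reading unit, whereas the 15.39 figure in the text is derived from 385 / 25.02, so the unit of that figure is left as the source states it.

```python
KANJI_WEIGHT = 1.86  # reading time of a kanji or numeral relative to one kana


def yomi_count(kana, kanji_or_digits):
    """Estimated reading count: kana + 1.86 x (kanji and numerals)."""
    return kana + KANJI_WEIGHT * kanji_or_digits


def ms_per_yomi(start_ms, stop_ms, yomi):
    """Average time per reading unit over one sentence."""
    return (stop_ms - start_ms) / yomi


def next_start_ms(stop_ms, pause_ms):
    """Start of the next sentence: end of this one plus the following pause."""
    return stop_ms + pause_ms


yomi = yomi_count(12, 7)                     # operands from the text
duration_ms = 53000 - 49150                  # stop time minus start time
rate = ms_per_yomi(49150, 53000, yomi)       # time per reading unit
following = next_start_ms(53000, 1010)       # 54010 ms, as listed for FIG. 5
```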

【００５４】このようにして求められたポーズ情報中
の、比較的長いポーズ（例えば２秒以上）で字幕用テキ
ストをブロックに分割し、ブロックテキスト文としてそ
の開始、終了タイミングを付与して作成（図３のステッ
プＳＴ１〜ＳＴ５の処理で作成）された表示単位字幕文
が図６に示されている。図６中、太い実線で囲んだ数字
が計算で求められた時間であり、長い処理時間を必要と
する可能性のある同期検出点としての指定を大幅に低減
できることを示している。また、右端に「ブロック」と
して示す区切りは、このブロック間に２秒程度以上のポ
ーズがあり、それを根拠として字幕処理単位を設定した
ものである。つまり字幕用の改行、改頁、タイミング処
理がそれぞれの処理単位内で完結させることができる区
切りであり、またそのタイミング付与処理などが、その
ブロックの時間以内に完了（本発明による高速化で）で
きるようになれば、自動字幕データ作成はブロック時間
経過後には終了し、この部分については直ちに予備試写
可能となり、逐次字幕制作・試写手法が適用可能とな
る。この手法の所要時間は、ほぼ番組時間と同じであ
り、全字幕データ作成後に試写する場合は２倍以上の時
間となるので、大幅に時間短縮を可能とする大きな効果
がある。Within the pause information obtained in this way, relatively long pauses (for example, 2 seconds or more) are used to divide the subtitle text into blocks, and each block text sentence is given its start and end timings; FIG. 6 shows the display unit subtitle sentences created in this way (by steps ST1 to ST5 of FIG. 3). In FIG. 6, the numbers enclosed in thick solid lines are times obtained by calculation, which shows that the number of points that must be designated as synchronization detection points, each potentially requiring long processing, can be reduced substantially. The "block" divisions shown at the right edge mark places where a pause of about 2 seconds or more lies between blocks, and the subtitle processing units are set on that basis. In other words, they are boundaries within which line-break, page-break, and timing processing for the subtitles can be completed independently. If the timing assignment and related processing can be finished within the duration of each block (through the speedup of the present invention), automatic subtitle data creation for a block ends once the block time has elapsed, that portion becomes available for preliminary preview immediately, and the sequential subtitle production and preview method becomes applicable. The time required by this method is roughly equal to the program duration, whereas previewing after all subtitle data has been created takes at least twice as long, so the method shortens the work substantially.

【００５５】《ブロック・ケプストラム・フラックス法
などによる音声のポーズ区間の検出》本発明では、音声
のポーズ区間を検出する方法の例として、音声のレベル
情報を利用した例えばブロック・ケプストラム・フラッ
クス法などを用いて行った。ブロック・ケプストラム・
フラックス法は、音響データ内の複数のＬＰＣケプスト
ラムベクトルを基準フレームから相互に比較すること
で、音響データ内容の切り替わり点をより安定に検出す
る手法である。<< Detection of audio pause sections by the block cepstrum flux method and similar methods >> In the present invention, as an example of detecting audio pause sections, a method that uses audio level information, such as the block cepstrum flux method, was used. The block cepstrum flux method compares multiple LPC cepstrum vectors in the acoustic data with one another, starting from a reference frame, and thereby detects change points in the acoustic content more stably.

【００５６】図７は、実際のテレビ番組（ハンドウイル
カ）の音声をブロック・ケプストラム・フラックス法を
用いて分析した結果を示している。なお、実際には、左
右両チャンネルの音声があるが、図７では、右チャンネ
ルの結果のみが示されている。また、図７において、棒
グラフは実際に調べた音声（スピーチ）区間を示してい
る。解析波形を適当なレベル（例えば、図７では０．０
５５）でスライスして、上の範囲を音声（スピーチ）区
間として比較すると、所定の継続時間以上では棒グラフ
で示す音声（スピーチ）区間とかなり一致しているのが
分かる。一方、“↑”で示す部分がポーズ区間を示して
いるが、これもかなり一致している。FIG. 7 shows the result of analyzing the audio of an actual television program (bottlenose dolphin) with the block cepstrum flux method. The program has left and right audio channels, but FIG. 7 shows only the result for the right channel. The bars in FIG. 7 indicate the speech sections determined by actual inspection. When the analysis waveform is sliced at a suitable level (0.055 in FIG. 7, for example) and the range above that level is taken as speech, the result agrees well with the speech sections shown by the bars for sections longer than a given duration. The portions marked with "↑" indicate pause sections, and these also agree well.
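
The slicing described for FIG. 7 can be sketched in Python as follows. This is an illustration only: it assumes the analysis values are already available as one number per frame, and the function name `speech_segments`, the frame length, and the minimum duration are hypothetical. The 0.055 level is the example value from the text.

```python
def speech_segments(values, frame_ms, level=0.055, min_duration_ms=200):
    """Return (start_ms, end_ms) runs where the analysis value exceeds level.

    Runs shorter than min_duration_ms are dropped, reflecting the remark
    that agreement is good only above a given duration.
    """
    segments = []
    start = None
    # A trailing sentinel below any level closes a run that reaches the end.
    for i, value in enumerate(list(values) + [float("-inf")]):
        if value > level and start is None:
            start = i
        elif value <= level and start is not None:
            if (i - start) * frame_ms >= min_duration_ms:
                segments.append((start * frame_ms, i * frame_ms))
            start = None
    return segments
```

The gaps between consecutive returned segments are the candidate pause sections.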

【００５７】《ポーズ検出法の改良と検出したポーズの
検証法》ポーズの検出法の改良と検出したポーズの検証
法として、以下ような手法を適用する。<< Improved pause detection and verification of detected pauses >> The following techniques are applied to improve pause detection and to verify the detected pauses.

【００５８】例えば、他の背景音に対する前記のアナウ
ンス音声の主な特徴を活用する、ポーズ検出の方法とし
ては、先ず、入力音声から、帯域制限音声を形成し、次
に、指定区間の音声レベルを規準化する（指定区間内の
高レベル音声で規準化）。次いで、音声のパワー値を求
め、その積分処理を行う（窓関数は、帯域制限と関
連）。For example, a pause detection method that exploits the main features distinguishing the announcement speech from other background sounds proceeds as follows. First, band-limited audio is formed from the input audio. Next, the audio level of the designated section is normalized (against the high-level audio within the section). Then the audio power is computed and integrated (the window function is related to the band limitation).

【００５９】また、ポーズの検証方法としては、先ず、
音声パワーのスレッシュホールドを設定する（ｅｘ．最
高レベルに対して、１／４，１／９，１／１６）。次い
で、設定した各スレッシュホールドでのポーズをそれぞ
れ求める（順にＰ１，Ｐ２，Ｐ３）。次に、ポーズの確
度をそれぞれ求める。次に、継続時間によるポーズのチ
ェック（一定時間以上の場合有効）をし、また、字幕文
テキストの句点、読点とのタイミング相関をチェックす
る。このチェックでは、一定時間以内の場合有効として
取り扱う。Pauses are verified as follows. First, thresholds are set for the audio power (for example, 1/4, 1/9, and 1/16 of the maximum level). Next, the pauses at each threshold are obtained (P1, P2, and P3, in that order), and a confidence is determined for each pause. The pauses are then checked by duration (valid if they last at least a certain time) and by their timing correlation with the full stops and commas of the subtitle text (valid if they fall within a certain time).
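
A possible reading of this multi-threshold verification is sketched below in Python. The text does not say how the confidence is computed, so the scoring rule here (count how many of the three thresholds still show a pause at the same place) is purely an assumption, as are the function names, the frame length, and the minimum duration. The punctuation correlation check is omitted.

```python
def pause_runs(power, threshold, frame_ms, min_ms):
    """Runs of frames with power below threshold that last at least min_ms."""
    runs, start = [], None
    # A trailing sentinel above any threshold closes a run at the end.
    for i, p in enumerate(list(power) + [float("inf")]):
        if p < threshold and start is None:
            start = i
        elif p >= threshold and start is not None:
            if (i - start) * frame_ms >= min_ms:
                runs.append((start * frame_ms, i * frame_ms))
            start = None
    return runs


def verify_pauses(power, frame_ms, min_ms=300, fractions=(1 / 4, 1 / 9, 1 / 16)):
    """Score each P1 pause by how many thresholds confirm it (1 to 3)."""
    peak = max(power)
    p1, p2, p3 = (pause_runs(power, peak * f, frame_ms, min_ms) for f in fractions)
    scored = []
    for start, end in p1:
        # Assumed rule: a stricter threshold confirms the pause if it overlaps.
        confirmations = sum(
            any(a < end and start < b for a, b in runs) for runs in (p2, p3)
        )
        scored.append((start, end, 1 + confirmations))
    return scored
```

With this rule, a pause that is quiet even at 1/16 of the peak power scores 3, while one that only falls below the 1/4 threshold scores 1 and would be a candidate for rejection.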

【００６０】このようにして、検出されたポーズに対し
てポーズの検出法の改良と検出したポーズの検証法を行
うことにより、より正確なタイミング付与が可能とな
る。なお、ポーズ検出法としてより改良された方法もあ
り、例えば、スピーチ近似データを作成して、それを活
用し、スピーチ区間を容易に把握できるようにすること
で、スピーチの開始・終了タイミングを把握し、ポーズ
を検出するものである。次に、図８、図９を参照してこ
のポーズ検出を説明する。Applying the improved pause detection and the verification of detected pauses in this way allows timings to be assigned more accurately. A further improved pause detection method also exists: for example, speech approximation data is created and used to make the speech sections easy to identify, so that the start and end timings of speech are determined and pauses are detected. This pause detection is described next with reference to FIGS. 8 and 9.

【００６１】図８は、スピーチ近似データとして音声デ
ータ波形５１を表示した例である。FIG. 8 is an example in which a voice data waveform 51 is displayed as the speech approximate data.

【００６２】横軸は、番組の時間経過を示したタイムラ
インであり、音声を再生するとこの経過時間に応じた位
置にカーソルが表示され、かつ時間経過とともに移動す
るようにしてある。したがって、カーソルの各位置にお
ける再生音声と音声波形の対応付けができる。The horizontal axis is a timeline showing the passage of time of the program, and when the sound is reproduced, the cursor is displayed at a position corresponding to this elapsed time and moves with the passage of time. Therefore, the reproduced voice and the voice waveform at each position of the cursor can be associated with each other.

【００６３】音声における背景音が充分小さい場合とか
波形に関する経験状況によっては、この音声波形データ
からスピーチタイミングをある程度把握することができ
るが、通常の番組音声では、種々の背景音がありそのレ
ベルも様々であることから、一般的には、この音声波形
データからスピーチの開始・終了タイミングを正確に把
握することは難しい。When the background sound in the audio is sufficiently low, or depending on the operator's experience with waveforms, the speech timing can be read from this audio waveform data to some extent. In ordinary program audio, however, there are various background sounds at various levels, so it is generally difficult to determine the start and end timings of speech accurately from the audio waveform data.

【００６４】ここで、スピーチ成分を強調したスピーチ
近似データを利用するとタンミング把握の確度を高める
ことが可能となる。Using speech approximation data in which the speech component is emphasized makes it possible to determine the timing with higher confidence.

【００６５】図９は、音声データを特殊処理したスピー
チ近似データを用いた例である。図９において、波形６
１は音声のcflx解析値（ブロック・ケプストラム・フラ
ックス法による）、波形６２は音声power値の特定周波
数範囲（例えば４〜７Hz）成分抽出値、波形６３は波形
６２を適当なレベルでスライスし、２値化したデータで
ある。FIG. 9 shows an example that uses speech approximation data obtained by specially processing the audio data. In FIG. 9, waveform 61 is the cflx analysis value of the audio (from the block cepstrum flux method), waveform 62 is the component of the audio power value extracted in a specific frequency range (for example, 4 to 7 Hz), and waveform 63 is waveform 62 sliced at a suitable level and binarized.

【００６６】波形６３において、高レベル範囲はスピー
チ、低レベル範囲は非スピーチ（ポーズ）の区間を表し
ており、この例ではほとんど実測したタイミングと合致
しているが、波形６２の方が精度が高い。したがって、
波形６３から音声中のスピーチの開始・終了タイミング
をある程度正確に把握することができる。In waveform 63, the high-level ranges represent speech sections and the low-level ranges represent non-speech (pause) sections. In this example they largely match the measured timings, although waveform 62 is more precise. The start and end timings of speech in the audio can therefore be determined from waveform 63 with reasonable accuracy.

【００６７】このように、音声データを特殊処理したス
ピーチ近似データを、スピーチ区間指定の指針として活
用することで、より確度の高いポーズとして利用でき
る。Using speech approximation data obtained by specially processing the audio data as a guide for designating speech sections thus yields pauses that can be used with higher confidence.

【００６８】《試写・修正支援部１５１の構成と作用》
図１に示すように、試写・修正支援部１５１は、記憶装
置１５３と、モニター装置１５５と、遅延装置１５７
と、キーボード１５９とを含んで構成されている。<< Configuration and operation of the preview / correction support unit 151 >> As shown in FIG. 1, the preview / correction support unit 151 includes a storage device 153, a monitor device 155, a delay device 157, and a keyboard 159.

【００６９】記憶装置１５３は、自動字幕制作部１１１
で作成された字幕データを記憶するとともに、モニター
装置１５５上で発見された修正データを記憶する。The storage device 153 stores the subtitle data created by the automatic subtitle production unit 111 and also stores the correction data identified on the monitor device 155.

【００７０】モニター装置１５５は、自動字幕制作部１
１１から出力される少なくとも字幕データと遅延装置１
５７から出力される映像、音声とを受けてモニター画面
に映像と字幕文を表示し音声を出力する。即ち、モニタ
ー装置１５５では、処理単位時間に相当する遅延装置１
５７を介した映像・音声とともに自動制作した字幕を表
示して、逐次の字幕制作と並行して、制作した字幕に対
し、実時間で実施可能な範囲の予備試写が行えるように
なっている。また、モニター装置１５５は単なるモニタ
ーではなく、キーボード１５９からの入力、記憶装置１
５３の入出力、簡単な信号処理機能を備えているもので
ある。The monitor device 155 receives at least the subtitle data output from the automatic subtitle production unit 111 and the video and audio output from the delay device 157, displays the video and subtitle text on the monitor screen, and outputs the audio. That is, the monitor device 155 displays the automatically produced subtitles together with the video and audio that have passed through the delay device 157, whose delay corresponds to the processing unit time, so that a preliminary preview of the produced subtitles, to the extent feasible in real time, can be carried out in parallel with the sequential subtitle production. The monitor device 155 is not a mere monitor: it accepts input from the keyboard 159, reads from and writes to the storage device 153, and has simple signal processing functions.

【００７１】遅延装置１５７は、番組素材ＶＴＲ１２３
からの映像、音声、必要ならばタイムコードを少なくと
も上記字幕制作処理単位の時間分遅延させることがで
き、かつ遅延時間を可変操作できるようになっている。
遅延装置１５７の出力は、モニター装置１５５に与えら
れている。The delay device 157 can delay the video, audio, and, if necessary, time code from the program material VTR 123 by at least the duration of the subtitle production processing unit, and its delay time is adjustable. The output of the delay device 157 is supplied to the monitor device 155.
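
The role of the delay device can be pictured with a simple software analogy. The Python sketch below is not a description of the actual device 157: the `DelayLine` class and its fixed delay in frames are assumptions, and the adjustable delay mentioned in the text is not modeled. It only shows how frames emerge a fixed number of steps after they enter, which gives the subtitle production time to finish a processing unit before the matching video and audio reach the monitor.

```python
from collections import deque


class DelayLine:
    """Fixed-length FIFO delay for frames (video, audio, or time code)."""

    def __init__(self, delay_frames):
        if delay_frames < 0:
            raise ValueError("delay_frames must be non-negative")
        self.delay_frames = delay_frames
        self._buffer = deque()

    def push(self, frame):
        """Feed one frame; return the frame delayed by delay_frames, or None."""
        self._buffer.append(frame)
        if len(self._buffer) > self.delay_frames:
            return self._buffer.popleft()
        return None  # still filling: nothing to show yet
```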

【００７２】ここで、「処理単位」としては、例えば、
比較的長い非スピーチ区間（例えば、３秒以上）の存在
周期を考慮した、字幕処理の区切りとする字幕制作処理
単位時間を設定できる。これにより、番組素材ＶＴＲ１
２３から連続再生される音声に応答してその処理単位時
間毎に逐次字幕データを制作するものである。自動字幕
制作部１１１で生成された字幕データは、モニター装置
１５５に与えられている。As the "processing unit", a subtitle production processing unit time can be set that serves as a boundary for subtitle processing and takes into account the period at which relatively long non-speech sections (for example, 3 seconds or more) occur. Subtitle data is thereby produced sequentially, one processing unit time at a time, in response to the audio played back continuously from the program material VTR 123. The subtitle data generated by the automatic subtitle production unit 111 is supplied to the monitor device 155.

【００７３】この予備試写時の作業を支援し、できるだ
け内容の豊富なチェックを実時間で行えるようにするた
めに、モニター装置１５５は、モニター画面に表示され
た字幕文について、キーボード１５９からキー入力があ
ったとき、字幕修正のために操作されたキーの種別とそ
の時の字幕ページ番号もしくはタイムコードとからなる
修正データを記憶装置１５３に記録する処理を行う。To support the preliminary preview work and allow as thorough a check as possible in real time, when a key is pressed on the keyboard 159 for subtitle text displayed on the monitor screen, the monitor device 155 records in the storage device 153 correction data consisting of the type of key operated for subtitle correction and the subtitle page number or time code at that moment.

【００７４】また、モニター装置１５５は、記憶装置１
５３に記録された内容をキーの種別毎にまたはタイミン
グ情報毎に集計し、集計結果をモニター画面に一覧表示
可能に記憶装置１５３に蓄積する。そして、字幕の修正
作業時にキーボード１５９からの指令を受けて一覧デー
タを記憶装置１５３から読み出しモニター画面に表示す
る処理の実行する。The monitor device 155 also tallies the content recorded in the storage device 153 by key type or by timing information and stores the result in the storage device 153 in a form that can be listed on the monitor screen. During subtitle correction work, on a command from the keyboard 159, it reads the list data from the storage device 153 and displays it on the monitor screen.

【００７５】キーボード１５９は、各種の修正データを
入力するために、図１０に示すような、試写・修正処理
に必要な機能を割り当てた各種のキーを備えている。The keyboard 159 is provided with various keys to which various functions necessary for preview / correction processing are assigned, as shown in FIG. 10, in order to input various kinds of correction data.

【００７６】＜試写・修正部１５１の作用＞次に、図１
０〜図１４を用いて試写・修正を支援するための、字幕
修正情報収集機能等を説明する。なお、図１０は、具体
的なキーの機能付与例を示す図である。図１１は、キー
操作の記録ファイルの構成例である。図１２は、図１１
に示す修正押下キー合計情報１６１の具体例を示す図で
ある。図１２は、図１１に示す修正押下キー詳細情報１
６３の具体例を示す図である。図１３は、修正作業時に
一覧表示する記録ファイルの内容例を示す図である。<Operation of the preview / correction unit 151> Next, the subtitle correction information collection function and related functions that support preview and correction are described with reference to FIGS. 10 to 14. FIG. 10 shows a specific example of key function assignment. FIG. 11 shows an example of the structure of the key operation record file. FIG. 12 shows a specific example of the correction pressed-key total information 161 shown in FIG. 11. FIG. 13 shows a specific example of the correction pressed-key detail information 163 shown in FIG. 11. FIG. 14 shows an example of the contents of the record file listed during correction work.

【００７７】記録内容を説明する。図１１に示すよう
に、修正押下キーの内容は、修正押下キー合計情報１６
１と修正押下キー詳細情報１６３とに分けて記録され
る。The recorded content is as follows. As shown in FIG. 11, the content of the pressed correction keys is recorded in two parts: the correction pressed-key total information 161 and the correction pressed-key detail information 163.

【００７８】図１０において、「↑key」は、字幕位置
を上に変更したい場合のキーである。「↓key」は、字
幕位置を下に変更したい場合のキーである。「PageDown
key」は、表示タイミングが前にずれていることを示す
キーである。「PageUpkey」は、表示タイミングが後ろ
にずれていることを示すキーである。「Endkey」は、表
示タイミングが正常であることを示すキーである。「F1
key」は、字幕文の内容が異常であることを示すキーで
ある。In FIG. 10, the "↑" key is used to move the subtitle position up, and the "↓" key to move it down. The "PageDown" key indicates that the display timing is shifted forward, and the "PageUp" key that it is shifted backward. The "End" key indicates that the display timing is normal. The "F1" key indicates that the content of the subtitle text is abnormal.

【００７９】したがって、予備試写担当者は、番組映像
・音声を参照し、モニター画面に表示された自動制作字
幕を見ながら以下のようにキーボードを操作することに
より、不具合のある字幕の指定と、その大まかな不具合
内容（あるいは修正内容）を指摘し記録することができ
る。Therefore, the person in charge of preliminary preview refers to the video / audio of the program and operates the keyboard as follows while watching the automatically produced subtitles displayed on the monitor screen to specify the defective subtitles. It is possible to point out and record the rough contents of the defect (or the contents of the correction).

【００８０】具体的には、（１）字幕の位置を上方に修
正したい場合には「↑」のキーを押す。オープン字幕と
干渉するのを避ける場合等である。（２）「↑」キーを
押し過ぎた場合には「↓」のキーを押す。（３）表示タ
イミングが、前にずれた箇所では「PageDown」のキーを
押す。（４）同じく、後ろにずれた箇所では「PageUp」
のキーを押す。（５）ＯＫになった箇所では「End」の
キーを押す。（６）字幕文の内容が良くない箇所では
「Ｆ１」のキーを押す。（７）なお、他にワープロ機能
を使用してのメモを付けることもできる。Specifically: (1) To move the subtitle position up, for example to avoid overlapping open captions, press the "↑" key. (2) If the "↑" key has been pressed too many times, press the "↓" key. (3) Where the display timing is shifted forward, press the "PageDown" key. (4) Where it is shifted backward, press the "PageUp" key. (5) Where the timing has become acceptable, press the "End" key. (6) Where the content of the subtitle text is poor, press the "F1" key. (7) Notes can also be attached using the word processing function.

【００８１】したがって、修正押下キー詳細情報１６３
のファイルは、不具合の指摘である修正Keyの押し下げ
ごとに、図１３に示す「修正key押下タイムコード」
「修正押下キー名」「字幕文」「開始タイムコード」
「終了タイムコード」が記録される。Accordingly, each time a correction key is pressed to flag a problem, the file of the correction pressed-key detail information 163 records the "correction key press time code", "pressed correction key name", "subtitle text", "start time code", and "end time code" shown in FIG. 13.

【００８２】なお、「修正key押下タイムコード」は、
先頭から時分秒フレームが各２バイトで示される。「修
正押下キー名」には、図１０における項目のkey名が示
される。「字幕文」には、修正Key押し下げ時にモニタ
ーに表示されている字幕文が示される。「開始タイムコ
ード」「終了タイムコード」は、当該字幕データ制作時
に付与されたものであり、それぞれ、先頭から時分秒フ
レームが各２バイトで示される。The "correction key press time code" gives hours, minutes, seconds, and frames, in that order, in 2 bytes each. The "pressed correction key name" is the key name of the corresponding item in FIG. 10. The "subtitle text" is the subtitle text displayed on the monitor when the correction key was pressed. The "start time code" and "end time code" are those assigned when the subtitle data was produced, and each likewise gives hours, minutes, seconds, and frames in 2 bytes each.

【００８３】以上の操作を予備試写担当者が番組の最後
まで行うと、記憶装置１５３には、図１１の修正押下キ
ー詳細情報１６３の情報の外、これらの修正押下キー合
計情報１６１として図１２に示す情報が自動的に記録さ
れる。When the person in charge of the preliminary preview has carried out these operations through to the end of the program, the storage device 153 holds the correction pressed-key detail information 163 of FIG. 11 and, in addition, the information shown in FIG. 12, which is recorded automatically as the correction pressed-key total information 161.
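
How such totals might be derived from the recorded key presses is sketched below in Python. The function `summarize_key_presses`, the key name strings, and the output layout are assumptions for illustration; the document specifies only that totals are kept per key, with the remaining keys grouped together.

```python
from collections import Counter

# Key names assumed here to match the keys assigned in FIG. 10.
KNOWN_KEYS = ("Up", "Down", "PageDown", "PageUp", "End", "F1")


def summarize_key_presses(key_names):
    """Count presses per known key, grouping everything else under 'Other'."""
    counts = Counter(k if k in KNOWN_KEYS else "Other" for k in key_names)
    totals = {key: counts.get(key, 0) for key in KNOWN_KEYS + ("Other",)}
    totals["Total"] = sum(counts.values())  # assumed to be a plain sum
    return totals
```

A preview session with three "F1" presses and one "PageDown" press would yield a total of four flagged items, which gives the correction operator a quick view of how much work remains.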

【００８４】予備試写が終了すると、字幕修正時に一覧
表示される一覧データが記憶装置１５３の記録ファイル
に蓄積される。その一覧データは、図１４に示すよう
に、「修正key合計情報」と「修正key詳細情報」とから
なっている。When the preliminary preview is completed, the list data displayed as a list when the caption is corrected is accumulated in the recording file of the storage device 153. As shown in FIG. 14, the list data includes “correction key total information” and “correction key detailed information”.

【００８５】修正押下キー合計情報１６１のファイル
は、図１２に示すように、「項目」と「内容説明と合計
値」の欄で構成されている。「項目」の欄には、「↑ke
y合計」、「↓key合計」、「PageDownkey合計」、「Pag
eUpkey合計」、「Endkey合計」、「F1key合計」、「そ
の他key合計」がそれぞれ記録される。「内容説明と合
計値」の欄には、項目欄の対応するキー名についての内
容説明と合計値が記録される。As shown in FIG. 12, the file of the correction pressed-key total information 161 consists of an "item" column and a "description and total" column. The "item" column records "↑ key total", "↓ key total", "PageDown key total", "PageUp key total", "End key total", "F1 key total", and "other key total". The "description and total" column records the description and the total for the key named in the corresponding item.

【００８６】「修正key合計情報」は、例えば“F1key：
字幕文内容異常：３”“pageDownkey：表示タイミング
前にズレ：１”“↑key：字幕位置を上に変更：０”
“↓key：字幕位置を下に変更：０”“pageUpkey：表示
タイミング前にズレ：０”“Endkry：表示タイミング正
常：０”“その他：０”“修正総合係数：４”となって
いる。The "correction key total information" reads, for example, "F1key: subtitle text content abnormal: 3", "pageDownkey: display timing shifted forward: 1", "↑key: move subtitle position up: 0", "↓key: move subtitle position down: 0", "pageUpkey: display timing shifted forward: 0", "Endkey: display timing normal: 0", "other: 0", "overall correction coefficient: 4".

【００８７】また、「修正key詳細情報」は、“0000000
5,pageDown,00000008,F1,岐阜県の飛騨地方に、0000000
1,00000009,直結する安房トンネルが今、00000020,0000
0029”などとなっている。00000005と00000008は、２桁
づつで、時、分、秒、フレームを表している。0000000
1,00000009と00000020,00000029はタイムコードであ
る。The "correction key detail information" reads, for example, "00000005, pageDown, 00000008, F1, In the Hida region of Gifu Prefecture, 00000001, 00000009, the directly connecting Abo Tunnel is now, 00000020, 00000029". The values 00000005 and 00000008 each consist of four two-digit fields giving hours, minutes, seconds, and frames. The pairs 00000001, 00000009 and 00000020, 00000029 are time codes.
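
The eight-digit time code layout can be decoded with a short Python sketch. The function names are hypothetical, and the frame rate of 30 frames per second used for the conversion is an assumption that the document does not state; drop-frame counting is ignored.

```python
def parse_timecode(timecode):
    """Split an 8-digit time code into (hours, minutes, seconds, frames)."""
    if len(timecode) != 8 or not timecode.isdigit():
        raise ValueError("expected 8 digits, two each for h, m, s, frames")
    return tuple(int(timecode[i:i + 2]) for i in range(0, 8, 2))


def timecode_to_frames(timecode, fps=30):
    """Total frame count from the start, assuming fps frames per second."""
    hours, minutes, seconds, frames = parse_timecode(timecode)
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames
```

Converting to a frame count makes it straightforward to test whether a key press time falls between the start and end time codes of a subtitle.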

【００８８】次いで、図１４に示すような記録ファイル
の内容が、字幕修正時にキーボード２１からの指令を受
けてモニター画面に表示される。字幕修正作業者は、モ
ニター画面の一覧表示における、「修正key合計情報」
から修正内容の全体の様子を掴むことができ、「修正ke
y詳細情報」から個々の字幕文について要修正個所を的
確に把握することができる。したがって、試写・修正で
の字幕修正作業を効果的に支援することができる。Next, the contents of the record file as shown in FIG. 14 are displayed on the monitor screen in response to a command from the keyboard 21 during subtitle correction. From the "correction key total information" in the list on the monitor screen, the subtitle correction operator can grasp the overall picture of the corrections, and from the "correction key detail information" can pinpoint the places that need correction in individual subtitle sentences. The subtitle correction work in the preview / correction stage is thus supported effectively.

【００８９】[0089]

【発明の効果】以上説明したように、本発明では、各字
幕作成プロセスを番組時間内で十分完了するように高速
化改良を行って、設定した字幕制作処理単位の時間毎に
逐次字幕データを制作できるようにした。その結果、逐
次字幕制作が可能となり、自動制作作業開始から番組実
時間で自動字幕制作が完了し、また番組実時間＋最大処
理単位時間後には、自動字幕制作と予備試写による修正
支援データの取得まで終了することができ、作業者の負
担が少なく、かつ作業時間を大幅に低減できる。As described above, in the present invention each subtitle creation process has been sped up so that it completes comfortably within the program duration, allowing subtitle data to be produced sequentially for each configured subtitle production processing unit time. As a result, sequential subtitle production becomes possible: automatic subtitle production completes within the real-time duration of the program from the start of the automatic production work, and by the program duration plus at most one processing unit time, both automatic subtitle production and the acquisition of correction support data through the preliminary preview are finished. The burden on the operator is small, and the working time is reduced substantially.

【００９０】また、音声認識のみによらず、主に音声レ
ベルなどの特徴を用いて、例えば文単位でのアナウンス
音声の開始、終了のタイミングを検出することにより、
アナウンサが話していないポーズ区間を検出し、そのタ
イミングを表示単位字幕文の開始、終了のタイミングの
少なくとも一部として適用することによって、音声認識
手法への依存度を低減し、タイミング情報の自動付与を
高速化することが可能となる。In addition, rather than relying on speech recognition alone, the pause sections in which the announcer is not speaking are detected mainly from features such as the audio level, for example by detecting the start and end timings of the announcement speech in sentence units, and these timings are applied as at least part of the start and end timings of the display unit subtitle sentences. This reduces the dependence on speech recognition and makes it possible to speed up the automatic assignment of timing information.

[Brief description of drawings]

【図１】本発明に係る逐次自動字幕制作処理システムの
実施形態の構成を示すブロック図である。FIG. 1 is a block diagram showing a configuration of an embodiment of a sequential automatic caption production processing system according to the present invention.

【図２】本発明に係る逐次自動字幕制作処理システムの
原理を示す説明図である。FIG. 2 is an explanatory diagram showing the principle of a sequential automatic caption production processing system according to the present invention.

【図３】要約処理を行わない場合の表示字幕文へのタイ
ミング付与の処理手順を示すフローチャートである。FIG. 3 is a flowchart showing a processing procedure for adding timing to a displayed caption text when the summary processing is not performed.

【図４】要約処理を行わない場合におけるタイミング付
与の処理の際の字幕用原文テキストを示す説明図であ
る。FIG. 4 is an explanatory diagram showing an original text for a subtitle in a process of timing addition in a case where a summary process is not performed.

【図５】図４に示した字幕用原文テキストに対する処理
結果を示す説明図である。FIG. 5 is an explanatory diagram showing the processing results for the original subtitle text shown in FIG. 4.

【図６】図４に示した字幕用原文テキストから作成され
た表示単位字幕文を示す説明図である。FIG. 6 is an explanatory diagram showing a display unit subtitle sentence created from the subtitle original text shown in FIG. 4;

【図７】ブロック・ケプストラム・フラックス法を用い
て音声（スピーチ）区間（見方を変えればポーズ区間）
を検出した結果を示す説明図である。FIG. 7 is an explanatory diagram showing the result of detecting speech sections (or, seen the other way, pause sections) with the block cepstrum flux method.

【図８】スピーチ近似データとしての音声データ波形を
示す説明図である。FIG. 8 is an explanatory diagram showing a voice data waveform as speech approximate data.

【図９】音声データを特殊処理したスピーチ近似データ
を示す説明図である。FIG. 9 is an explanatory diagram showing speech approximation data obtained by specially processing voice data.

【図１０】具体的なキーの機能付与例を示す図である。FIG. 10 is a diagram showing a specific example of key function addition.

【図１１】キー操作の記録ファイルの構成例である。FIG. 11 is a configuration example of a key operation recording file.

【図１２】図１１に示す修正押下キー合計情報の具体例
を示す図である。FIG. 12 is a diagram showing a specific example of total information of modified pressed keys shown in FIG. 11.

【図１３】図１１に示す修正押下キー詳細情報の具体例
を示す図である。FIG. 13 is a diagram showing a specific example of correction press key detailed information shown in FIG. 11.

【図１４】試写・修正時に一覧表示される記録ファイル
の内容を示す図である。FIG. 14 is a diagram showing the contents of recording files displayed as a list at the time of preview / correction.

[Explanation of symbols]

101 逐次自動字幕制作処理システム (Sequential automatic subtitle production processing system)
111 自動字幕制作部 (Automatic subtitle production unit)
113 電子化原稿記録媒体 (Digitized manuscript recording medium)
115 同期検出装置 (Synchronization detection device)
117 統合化装置 (Integration device)
119 形態素解析部 (Morphological analysis unit)
121 分割ルール記憶部 (Division rule storage unit)
123 ディジタル・ビデオ・テープ・レコーダ（Ｄ−ＶＴＲ） (Digital video tape recorder (D-VTR))
133 単位字幕文抽出部 (Unit subtitle sentence extraction unit)
135 表示単位字幕化部 (Display unit subtitle conversion unit)
137 タイミング情報付与部 (Timing information addition unit)
151 試写・修正支援部 (Preview / correction support unit)
153 記憶装置 (Storage device)
155 モニター装置 (Monitor device)
157 遅延装置 (Delay device)
159 キーボード (Keyboard)

フロントページの続き
(71)出願人 000004352 日本放送協会 東京都渋谷区神南２丁目２番１号
(72)発明者 沢村英治 東京都港区芝２−31−19 通信・放送機構内
(72)発明者 門馬隆雄 東京都港区芝２−31−19 通信・放送機構内
(72)発明者 浦谷則好 東京都港区芝２−31−19 通信・放送機構内
(72)発明者 白井克彦 東京都港区芝２−31−19 通信・放送機構内
Ｆターム(参考） 5C023 AA18 BA16 CA01 CA05 5D015 CC11 CC14 DD03 FF01 FF04

Continuation of front page
(71) Applicant: 000004352 Japan Broadcasting Corporation, 2-2-1 Jinnan, Shibuya-ku, Tokyo
(72) Inventor: Eiji Sawamura, c/o Telecommunications Advancement Organization, 2-31-19 Shiba, Minato-ku, Tokyo
(72) Inventor: Takao Monma, c/o Telecommunications Advancement Organization, 2-31-19 Shiba, Minato-ku, Tokyo
(72) Inventor: Noriyoshi Uratani, c/o Telecommunications Advancement Organization, 2-31-19 Shiba, Minato-ku, Tokyo
(72) Inventor: Katsuhiko Shirai, c/o Telecommunications Advancement Organization, 2-31-19 Shiba, Minato-ku, Tokyo
F-terms (reference): 5C023 AA18 BA16 CA01 CA05; 5D015 CC11 CC14 DD03 FF01 FF04

Claims

[Claims]

1. A sequential automatic subtitle production processing system comprising: a detection unit that detects, at high speed, the start and end timings of at least announcement speech in sentence units, delimited at least by pauses, within at least a designated section of the input speech; and means for adding to subtitles information in which the detected timings are applied as at least part of the page-break, start, and end timings of display unit subtitle sentences, wherein subtitle data is produced automatically for each processing unit in the designated section.

2. A sequential automatic subtitle production processing system comprising: an automatic subtitle production unit that includes a detection unit that detects, at high speed, the start and end timings of announcement speech in sentence units, delimited at least by pauses, within at least a designated section of the input speech, and means for adding to subtitles information in which the detected timings are applied as at least part of the page-break, start, and end timings of display unit subtitle sentences, and that creates subtitle data for each predetermined processing unit; and a preview / correction support unit that includes a monitor device that presents at least the video, the audio, and the subtitle data and outputs the program sound, and a subtitle correction information collection device that, when a preliminary-preview key input is made from a key input device for the subtitle data displayed on the monitor device, records at least information on the operation timing and the key type in a storage device, wherein subtitle program production is processed sequentially by executing the preliminary preview by the preview / correction support unit in parallel with the progress of the automatic subtitle production processing executed by the automatic subtitle production unit.

3. The sequential automatic subtitle production processing system according to claim 1, wherein the detection unit detects pause sections of the speech by the block cepstrum flux method and thereby detects the start and end timings of the speech.
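
As an illustration only, the pause-based timing detection recited in claim 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes per-frame cepstrum vectors have already been computed, takes the flux of a frame to be the mean Euclidean distance to the next few frames, and assumes speech shows larger flux than pauses. The function names, the threshold, the block length, and the frame period are all hypothetical.

```python
from math import sqrt


def block_cepstrum_flux(cepstra, block=3):
    """Mean cepstral distance from each frame to the next `block` frames.

    `cepstra` is a list of equal-length cepstrum vectors, one per frame
    (assumed to be computed elsewhere).
    """
    flux = []
    for i in range(len(cepstra)):
        later = cepstra[i + 1:i + 1 + block]
        if not later:
            flux.append(0.0)  # no following frame: treat as "no change"
            continue
        dists = [
            sqrt(sum((a - b) ** 2 for a, b in zip(cepstra[i], c)))
            for c in later
        ]
        flux.append(sum(dists) / len(dists))
    return flux


def speech_sections(flux, threshold, frame_sec=0.01, min_pause_frames=2):
    """Return (start_sec, end_sec) spans where flux exceeds `threshold`.

    A low-flux run shorter than `min_pause_frames` is not treated as a
    pause, so it does not split a speech section.
    """
    sections = []
    start = None  # index of the first frame of the current speech section
    quiet = 0     # length of the current low-flux run inside a section
    for i, value in enumerate(flux):
        if value > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                end = i - quiet + 1  # index of the first pause frame
                sections.append((start * frame_sec, end * frame_sec))
                start = None
                quiet = 0
    if start is not None:
        # Input ended inside a section; drop any trailing low-flux frames.
        sections.append((start * frame_sec, (len(flux) - quiet) * frame_sec))
    return sections
```

Each returned span gives one candidate start and end timing; in the system described, such timings would then be applied to the display unit subtitle sentences. The gaps between spans correspond to the pause sections.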