JP2015127894A

JP2015127894A - Support apparatus, information processing method, and program

Info

Publication number: JP2015127894A
Application number: JP2013273221A
Authority: JP
Inventors: Kaori Shinkawa; Arata Saito; Daisuke Sato; Cazzoli Elnissa
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2013-12-27
Filing date: 2013-12-27
Publication date: 2015-07-09
Anticipated expiration: 2033-12-27
Also published as: JP6323828B2

Abstract

PROBLEM TO BE SOLVED: When a plurality of persons repeatedly correct subtitles, it cannot be properly determined when to end the correction, and when only corrections are repeated, the subtitles fall into a local optimum and their quality cannot be sufficiently improved. SOLUTION: A support apparatus supports creation of expression information by a plurality of users and comprises: an editing unit that has any one of the plurality of users edit first expression information expressing an expression target and acquires the result as second expression information; an input unit that receives, from any one of the plurality of users, new third expression information expressing the expression target; and an integration unit that integrates the second expression information and the third expression information to generate integrated expression information. An information processing method using the support apparatus and a program for causing the support apparatus to operate are also provided.

Description

The present invention relates to a support apparatus, an information processing method, and a program.

Methods for generating subtitles from speech are known (for example, Patent Documents 1 and 2). A method is also known in which a plurality of workers cooperate to generate subtitles, with one worker creating the subtitles and another worker correcting them (for example, Patent Document 3). A method in which a plurality of workers correct subtitles is also known (for example, Patent Document 4). In addition, a method in which a worker corrects subtitles created by automatic speech recognition is known (for example, Non-Patent Document 1), as is a method of combining text generated by a plurality of workers (for example, Non-Patent Document 2).
[Patent Document 1] JP 2005-228178 A
[Patent Document 2] JP 2008-32789 A
[Patent Document 3] JP 2010-157961 A
[Patent Document 4] JP 2004-226910 A
[Non-Patent Document 1] Nagatsuma, Fukuda, Yanaginuma, Hirose, "Efficient subtitle creation method using crowdsourcing", IEICE Technical Report, Vol. 112, No. 336 (2012)
[Non-Patent Document 2] I. Naim, D. Gildea, W. Lasecki, J. P. Bigham, "Text Alignment for Real-Time Crowd Captioning", The Association for Computational Linguistics, HLT-NAACL, pages 201-210 (2013)

However, when only subtitle correction is repeated continuously, the subtitles become locally optimized and their quality cannot be sufficiently improved. Moreover, when a plurality of workers repeatedly correct subtitles by the conventional methods, it has not been possible to properly determine when the correction should end.

A first aspect of the present invention provides a support apparatus that supports creation of expression information by a plurality of users, the support apparatus comprising: an editing unit that has any one of the plurality of users edit first expression information expressing an expression target and acquires the result as second expression information; an input unit that receives, from any one of the plurality of users, new third expression information expressing the expression target; and a first integration unit that integrates the second expression information and the third expression information to generate integrated expression information. The first aspect also provides an information processing method using the support apparatus and a program for causing the support apparatus to operate.

Note that the above summary of the invention does not enumerate all of the necessary features of the present invention. Sub-combinations of these feature groups may also constitute inventions.

FIG. 1 shows the configuration of the support apparatus 10 of the present embodiment.
FIG. 2 shows the processing flow of the support apparatus 10 of the present embodiment.
FIG. 3 shows an example of the input screen for the third expression information in the present embodiment.
FIG. 4 shows an example of the editing screen for the first expression information in the present embodiment.
FIG. 5 shows an example of the method of integrating expression information in the present embodiment.
FIG. 6 is a box-and-whisker plot showing an example of the effect of the support apparatus 10 in the present embodiment.
FIG. 7 is a box-and-whisker plot showing an example of the effect of the support apparatus 10 in the present embodiment.
FIG. 8 is a box-and-whisker plot showing an example of the effect of the support apparatus 10 in the present embodiment.
FIG. 9 shows the processing flow of the support apparatus 10 in a modification of the present embodiment.
FIG. 10 shows an example of the hardware configuration of a computer 1900.

The present invention is described below through embodiments of the invention, but the following embodiments do not limit the invention according to the claims. In addition, not all combinations of the features described in the embodiments are necessarily essential to the solution provided by the invention.

FIG. 1 shows the configuration of the support apparatus 10 of the present embodiment. The support apparatus 10 supports work in which a plurality of users create expression information from an expression target. For example, the support apparatus 10 supports the work of creating, from content that contains speech such as audio or video, expression information in the form of text such as subtitles representing what is said. As an example, the text representing the speech may be subtitle text that transcribes what a person said as is, or text that translates what the person said. The support apparatus 10 includes a dividing unit 102, an automatic recognition unit 104, an editing unit 106, a determination unit 108, an input unit 110, a first integration unit 112, a control unit 114, and a second integration unit 120.

The dividing unit 102 divides the content to be expressed and generates a plurality of expression targets to be worked on. For example, the dividing unit 102 receives content containing speech, such as video, and divides it into units such as sentences, paragraphs, or contexts to generate the plurality of expression targets. The dividing unit 102 may divide the content by detecting pauses in the speech and/or changes of speaker. The dividing unit 102 supplies the generated expression targets to the automatic recognition unit 104.
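The division by pauses and speaker changes could be sketched as follows. This is a minimal illustration only: the `Utterance` record, the `split_into_targets` function, and the 0.8-second `max_gap` default are all hypothetical, and it assumes some upstream step has already produced timed, speaker-labelled utterances.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # speaker label (assumed to come from an upstream step)


def split_into_targets(utterances, max_gap=0.8):
    """Group utterances into expression targets.

    A new target starts whenever the pause since the previous utterance
    exceeds max_gap seconds or the speaker changes.
    """
    targets = []
    for u in utterances:
        if targets:
            prev = targets[-1][-1]
            if u.start - prev.end <= max_gap and u.speaker == prev.speaker:
                targets[-1].append(u)
                continue
        targets.append([u])
    return targets
```

Each returned group would then be treated as one expression target and handed to automatic recognition.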

The automatic recognition unit 104 automatically generates, from an expression target received from the dividing unit 102, first expression information expressing that expression target. For example, the automatic recognition unit 104 uses conventional automatic speech recognition (ASR) to generate text representing the content of the speech in the expression target. The automatic recognition unit 104 supplies the generated text to the editing unit 106 as the first expression information.

The editing unit 106 has any one of the plurality of users edit the first expression information and acquires the edited result as second expression information. The user who edits the first expression information is referred to here as the first user. The editing unit 106 supplies the first expression information before editing and the second expression information after editing to the determination unit 108, and supplies the second expression information to the first integration unit 112 and the second integration unit 120.

The determination unit 108 determines whether, in the editing by the editing unit 106, the first expression information differs from the second expression information obtained by editing it. When it determines that the first expression information and the second expression information do not differ, the determination unit 108 has the editing unit 106 present the first expression information to the first user again for editing.

In this re-editing, the determination unit 108 determines whether the first expression information and the second expression information have matched in a predetermined reference number of consecutive edits. When it determines that they have, the determination unit 108 stops the editing by the editing unit 106 and notifies the second integration unit 120 accordingly.
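The stopping rule described above amounts to counting consecutive edits that leave the text unchanged. A minimal sketch, assuming plain string comparison stands in for "do not differ"; the class name `StopCriterion` and its members are hypothetical, and the default of 2 follows the example value given later for the reference number:

```python
class StopCriterion:
    """Tracks how many consecutive edits left the text unchanged."""

    def __init__(self, reference_count=2):
        self.reference_count = reference_count
        self.unchanged_streak = 0

    def observe(self, first, second):
        """Record one edit; return True when editing should stop."""
        if first == second:
            self.unchanged_streak += 1
        else:
            # Any correction resets the streak.
            self.unchanged_streak = 0
        return self.unchanged_streak >= self.reference_count
```

A correction anywhere in the sequence resets the count, so the loop stops only after the reference number of unchanged edits occur back to back.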

When the determination unit 108 determines that the first expression information and the second expression information differ, the input unit 110 receives new third expression information expressing the expression target from any one of the plurality of users. The user who inputs the third expression information is referred to here as the second user. The input unit 110 supplies the received third expression information to the first integration unit 112 and the second integration unit 120.

On receiving the third expression information, the first integration unit 112 integrates the second expression information and the third expression information to generate integrated expression information. The first integration unit 112 supplies the integrated expression information to the control unit 114.

The control unit 114 supplies the integrated expression information to the editing unit 106 as new first expression information, has the editing unit 106 present the new first expression information to the first user for editing, and has it acquire another piece of second expression information from the first user. The control unit 114 thereby causes the editing unit 106 to repeat the editing of the expression information.

When the repetition of the editing process ends as a result of the determination by the determination unit 108, the second integration unit 120 integrates the one or more pieces of first expression information, the one or more pieces of second expression information, and the one or more pieces of third expression information that have been input, and generates integrated expression information. For example, when the determination unit 108 stops the editing by the editing unit 106, the second integration unit 120 outputs the integrated expression information as the completed expression information.

In this way, the support apparatus 10 repeats the acquisition of edited second expression information by the editing unit 106, the input of new third expression information by the input unit 110, and the integration of the second and third expression information by the first integration unit 112. The support apparatus 10 ends the repetition when it determines that no correction was made in a predetermined reference number of consecutive edits, and integrates and outputs the first, second, and third expression information generated up to that point.

The support apparatus 10 can thus treat the point at which the quality of the expression information has improved and corrections no longer occur as the reference timing for ending the correction of subtitles and the like. In addition, when the expression information has been corrected, that is, when it is judged that the expression information still has room for correction, the support apparatus 10 has a user input new expression information, which prevents local optimization.

FIG. 2 shows the processing flow of the support apparatus 10 of the present embodiment. In the present embodiment, the support apparatus 10 supports the creation of expression information by a plurality of users by executing the processes of S100 to S170. The support apparatus 10 may execute the processing flow for each of the plurality of pieces of expression information generated by the dividing unit 102 dividing the content.

First, in S110, the automatic recognition unit 104 uses ASR to automatically generate, from the expression target, text that serves as the first expression information (ASR). The automatic recognition unit 104 supplies the generated first expression information to the editing unit 106.

Next, the support apparatus 10 executes the iterative process of S120 to S160, indicated by the dotted line in FIG. 2, one or more times.

In S120, the editing unit 106 has the first user edit the text serving as the first expression information and acquires the result as the second expression information (FIX). For example, the editing unit 106 provides the first user with the speech of the expression target, presents the text of the first expression information in an editable state, has the first user correct the first expression information directly, and acquires the text that serves as the second expression information. As an example, the editing unit 106 may supply the expression target and the text data of the first expression information to the first user's information terminal via a network and acquire the text of the second expression information from that terminal.

In the first iteration, the editing unit 106 presents the first expression information generated by the automatic recognition unit 104 to the first user and acquires the second expression information. In the second and subsequent iterations, the editing unit 106 presents the integrated expression information generated by the first integration unit 112 to the first user as the first expression information and acquires another piece of second expression information, which may differ from the second expression information input previously. The editing unit 106 supplies the first expression information before editing and the second expression information after editing to the determination unit 108, and supplies the second expression information to the first integration unit 112 and the second integration unit 120.

Next, in S130, the determination unit 108 determines whether, in the immediately preceding edit in S120, the first expression information differs from the second expression information obtained by editing it. In other words, the determination unit 108 determines whether the first expression information was corrected by the edit in the immediately preceding S120. If the determination unit 108 determines that the first and second expression information differ, the process proceeds to S140; if it determines that they do not differ, the process proceeds to S160.

In S140, the input unit 110 receives new third expression information expressing the expression target from the second user (TYPE). For example, the input unit 110 provides the second user with the speech of the expression target and has the second user input text that serves as the third expression information corresponding to the expression target. As an example, the input unit 110 may supply the data of the expression target to the second user's information terminal via a network and acquire the text of the third expression information from that terminal.

Across multiple iterations, the input unit 110 receives from the second user another piece of third expression information, which may differ from the third expression information input previously. The input unit 110 supplies the received third expression information to the first integration unit 112 and the second integration unit 120.

In S150, the first integration unit 112 integrates the second expression information and the third expression information to generate integrated expression information (MERGE). For example, the first integration unit 112 first divides the text of the second expression information and the text of the third expression information into elements consisting of individual words.

Among these elements, the first integration unit 112 associates identical words that appear in common at corresponding positions in the text of the second expression information and the text of the third expression information, as well as similar words (for example, misspellings of existing words and/or homophones with different spellings). For example, the first integration unit 112 may detect words whose Levenshtein distance is within a predetermined threshold as similar words and associate them with each other. Using the associated words as reference points, the first integration unit 112 arranges all of the words, including those that are not associated.
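The similarity test based on Levenshtein distance can be sketched as below. The function names `levenshtein` and `is_similar` and the default threshold of 1 are illustrative assumptions; the source says only that the threshold is predetermined.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def is_similar(a: str, b: str, threshold: int = 1) -> bool:
    """Treat two words as similar when their distance is within threshold."""
    return levenshtein(a.lower(), b.lower()) <= threshold
```

With a threshold of 1, "consider" and "consder" are paired, as are "disc" and "disk". A plain distance threshold also pairs short, distinct words that differ by one letter, so a practical system would probably combine it with some other check; that is an observation about this sketch rather than something the source specifies.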

When two mutually corresponding elements exist in the text of the second expression information and the text of the third expression information, the first integration unit 112 may select the one estimated to be more correct; when an element has no corresponding element, the first integration unit 112 may include that element. It may thereby generate newly integrated text as the integrated expression information.
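One way to picture this pairwise integration is the sketch below, which uses `difflib.SequenceMatcher` from the Python standard library as a stand-in aligner. The source does not say how the "more correct" element is estimated; preferring a word found in a known `vocabulary` set, and falling back to the FIX word, is purely an assumption made for illustration, as is the function name `merge`.

```python
from difflib import SequenceMatcher


def merge(fix_words, type_words, vocabulary):
    """Merge the FIX and TYPE word lists into one integrated list."""
    merged = []
    matcher = SequenceMatcher(a=fix_words, b=type_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        a, b = fix_words[i1:i2], type_words[j1:j2]
        if tag == "equal":
            # Words shared by both texts are kept once.
            merged.extend(a)
        elif tag == "replace" and len(a) == len(b):
            # Corresponding pairs: keep the word that looks more correct
            # (assumed rule: in-vocabulary beats out-of-vocabulary).
            for wa, wb in zip(a, b):
                use_b = wb in vocabulary and wa not in vocabulary
                merged.append(wb if use_b else wa)
        else:
            # No one-to-one correspondence: include words from both sides.
            merged.extend(a)
            merged.extend(b)
    return merged


vocab = {"the", "quick", "brown", "fox", "really", "jumps"}
fix = "the quick brown fox jumps".split()
typed = "the quik brown fox really jumps".split()
print(" ".join(merge(fix, typed, vocab)))  # the quick brown fox really jumps
```

Treating every equal-length replacement as a corresponding pair is a simplification; the similarity check described in the text would decide which words actually correspond.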

The first integration unit 112 generates integrated expression information in each iteration. For example, in each of the iterations, the first integration unit 112 integrates another piece of second expression information, which may differ from the one acquired previously, with another piece of third expression information, which may also differ from the one acquired previously, to generate new integrated expression information.

The first integration unit 112 supplies the generated integrated expression information to the control unit 114. The control unit 114 supplies the integrated expression information to the editing unit 106 as new first expression information and returns the process to S120. As a result, in the second and subsequent executions of S120, the control unit 114 has the editing unit 106 present the new first expression information to the first user and acquire another piece of second expression information.

In S160, the determination unit 108 determines whether the first expression information and the second expression information have matched in a predetermined reference number of consecutive edits. For example, the determination unit 108 determines whether the sequence S120 → S130 → S160 has been repeated consecutively a predetermined reference number of times (n times; for example, twice).

If the determination unit 108 determines that the first expression information and the second expression information have matched in the predetermined reference number of consecutive edits, the process proceeds to S170; otherwise, the process returns to S120.

In S170, the second integration unit 120 integrates the one or more pieces of first expression information, the one or more pieces of second expression information, and the one or more pieces of third expression information that have been input to the editing unit 106 and the input unit 110, and generates integrated expression information. For example, the second integration unit 120 may integrate all of the first, second, and third expression information input so far.

First, the second integration unit 120 may divide each of the pieces of expression information to be integrated into elements such as phrases, words, and/or characters, and associate identical elements that appear in common at corresponding positions in the pieces of expression information, as well as similar elements. Here, the second integration unit 120 may detect elements whose Levenshtein distance is within a predetermined threshold as similar elements and associate them with each other. The second integration unit 120 may further use Multiple Sequence Alignment (MSA) to align the associated elements, and may use the A* algorithm to reduce the computational cost of the alignment.

Next, the second integration unit 120 may determine, by majority vote or the like, which elements to adopt in the final expression information. Here, the second integration unit 120 may give the second expression information a greater weight than the first and third expression information, so that the support apparatus 10 reflects the results of corrections more heavily. The support apparatus 10 outputs the expression information integrated by the second integration unit 120 as the completed expression information.
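A weighted majority vote over already aligned texts might look like the following sketch. It assumes the alignment step has produced rows of equal length, with `None` marking a gap; the function name `weighted_vote` and the concrete weights are hypothetical, since the source says only that the second expression information may be weighted more heavily.

```python
from collections import defaultdict


def weighted_vote(aligned_rows, weights):
    """Pick one element per aligned column by weighted majority vote.

    aligned_rows: equal-length word lists, with None marking a gap.
    weights: one weight per row (e.g. larger for corrected texts).
    """
    result = []
    for column in zip(*aligned_rows):
        scores = defaultdict(float)
        for word, weight in zip(column, weights):
            scores[word] += weight
        best = max(scores, key=scores.get)
        if best is not None:  # a winning gap means the column is dropped
            result.append(best)
    return result


rows = [
    ["it", "must", "now", None],   # first expression information (ASR)
    ["it", "must", "not", "go"],   # second expression information (FIX)
    ["it", "mast", "not", "go"],   # third expression information (TYPE)
]
print(weighted_vote(rows, weights=[1.0, 2.0, 1.0]))
```

When scores tie, `max` keeps the candidate encountered first in the column, so row order acts as a tie-breaker in this sketch.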

In this way, in the first iteration, the support apparatus 10 has the editing unit 106 let the first user edit the first expression information generated by the automatic recognition unit 104; if the edit contains a correction, it has the input unit 110 receive new third expression information, and if there is no correction, it has the editing unit 106 perform the editing process again. The support apparatus 10 repeats the editing by the editing unit 106 and the input by the input unit 110 until edits without any correction occur the reference number of times in succession.
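The overall flow of S110 to S160 can be summarized in a short sketch. The four callables `asr`, `fix`, `type_new`, and `merge` are hypothetical stand-ins for the automatic recognition unit, the editing unit, the input unit, and the first integration unit; the function name and the returned `history` list are likewise assumptions, and the final integration of S170 is left to the caller.

```python
def run_support_loop(target, asr, fix, type_new, merge, reference_count=2):
    """Run the FIX / TYPE / MERGE loop for one expression target.

    Returns the last edited text and every text produced along the way
    (the material a final integration step could combine).
    """
    first = asr(target)                       # S110: automatic recognition
    history = [first]
    streak = 0                                # consecutive uncorrected edits
    while True:
        second = fix(target, first)           # S120: FIX
        history.append(second)
        if second != first:                   # S130: a correction was made
            streak = 0
            third = type_new(target)          # S140: TYPE
            history.append(third)
            first = merge(second, third)      # S150: MERGE
            history.append(first)
        else:                                 # S160: count unchanged edits
            streak += 1
            if streak >= reference_count:
                return second, history
```

As written, the loop ends only when the editors stop making corrections, so a real deployment would also want an upper bound on the number of rounds.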

Across the iterations, the first user and the second user may each be the same user every time and/or a different user each time. When the second expression information and/or the third expression information is input by different first users and/or different second users, the support apparatus 10 can be expected to obtain diverse expression information. When the second expression information and/or the third expression information is input by the same first user and/or the same second user, the support apparatus 10 can be expected to obtain expression information that improves with each iteration. The user acting as the first user may be the same as, or different from, the user acting as the second user.

The support apparatus 10 may also omit the process of S170. In this case, the support apparatus 10 may output the second expression information edited in the last execution of S120 as the completed expression information.

The support apparatus 10 may also advance the process to S170 when the iterative sequence S120 → S130 → S140 → S150 → S120 has been performed, in total or consecutively, at least a predetermined reference number of times (for example, twice). This prevents the editing from being repeated more than necessary and the generation of the expression information from being prolonged.
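This upper bound on correction rounds could be tracked with a small counter such as the one below. The class name `RoundLimiter`, its parameters, and the boolean `corrected` flag are hypothetical; the `consecutive` switch mirrors the "in total or consecutively" wording of the text.

```python
class RoundLimiter:
    """Counts FIX -> TYPE -> MERGE rounds and signals when to stop."""

    def __init__(self, limit=2, consecutive=False):
        self.limit = limit
        self.consecutive = consecutive
        self.count = 0

    def record(self, corrected: bool) -> bool:
        """Record one pass through S120; return True once the limit is hit.

        corrected is True when the pass went on through S140 and S150.
        """
        if corrected:
            self.count += 1
        elif self.consecutive:
            # In consecutive mode an uncorrected pass breaks the run.
            self.count = 0
        return self.count >= self.limit
```

In total mode the counter never resets, so the limit bounds the overall number of merge rounds regardless of how they are interleaved with unchanged edits.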

FIG. 3 shows an example of the input screen for the third expression information in S140 of the present embodiment. As illustrated, the second user plays back the speech of the expression target, together with video or the like, on an information terminal or similar device, and inputs text that serves as the third expression information corresponding to the expression target, which the input unit 110 then acquires.

FIG. 4 shows an example of the editing screen for the first expression information in S120 of the present embodiment. As illustrated, the first user plays back the speech of the expression target, together with video or the like, on an information terminal or similar device, and edits the first expression information displayed in advance to create the second expression information, which the editing unit 106 then acquires.

FIG. 5 shows an example of the method of integrating expression information in S150 of the present embodiment. The FIX row shows the text corresponding to the second expression information that the first integration unit 112 received from the editing unit 106. The TYPE row shows the text corresponding to the third expression information that the first integration unit 112 received from the input unit 110. The MERGE row shows the text corresponding to the integrated expression information that the first integration unit 112 generated by integrating the second and third expression information.

As illustrated, the first integration unit 112 splits the text of the second expression information and of the third expression information into a plurality of elements, aligns these elements, and then integrates them. For example, the first integration unit 112 associates the words "It", "must", and "supplying", which appear in both the second and the third expression information, and places them at the same positions (first, second, and sixth).

In addition to or instead of associating elements such as words by Levenshtein distance, the first integration unit 112 may associate words that are heterographic homophones (words that sound alike but are spelled differently) and/or misspellings of one another. For example, the first integration unit 112 may associate the words "consider" and "consder", which are related by a misspelling rather than being heterographic homophones, and the heterographic homophones "disc" and "disk", all contained in the second and third expression information, and place each pair at the same position (fifth and seventh). Using the associated words as reference points, the first integration unit 112 places the unassociated words "now" and "not" at different positions (third and fourth).

The first integration unit 112 then generates the integrated expression information from the placed words. For example, the first integration unit 112 adopts a word that has no corresponding word in the other text (for example, "not" and "now") into the integrated expression information as is. When a plurality of corresponding words exist at a position, the first integration unit 112 selects one of them and adopts it into the integrated expression information.

For example, when the corresponding words are heterographic homophones, the first integration unit 112 selects the word from the third expression information (TYPE). This is because the second expression information (FIX) may contain words that originate from automatic speech recognition (ASR), and such words are likely to be contextually incorrect where heterographic homophones are concerned. As an example, from "disc" and "disk", which are placed at the seventh position as heterographic homophones, the first integration unit 112 adopts "disk" from the third expression information (TYPE).

When the corresponding words are not heterographic homophones, the first integration unit 112 selects the word from the second expression information (FIX). This is because, in such a case, the word from the third expression information (TYPE) is likely to be a simple typing error. As an example, from "consider" and "consder", which are placed at the fifth position and are not heterographic homophones, the first integration unit 112 adopts "consider" from the second expression information (FIX).

As a result, the first integration unit 112 generates "It must not now consider supplying disk" as the integrated expression information from the second expression information (FIX) and the third expression information (TYPE). Although both "not" and "now" appear in this integrated expression information, the support apparatus 10 can resolve this duplication through the subsequent iterations and the integration processing by the second integration unit 120. In this way, by adopting the word from the third expression information (TYPE), which is not based on automatic speech recognition (ASR), for heterographic homophones and the word from the second expression information (FIX) otherwise, the support apparatus 10 can generate the integrated expression information with higher accuracy.
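The alignment and selection rules described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the first integration unit 112: the FIX and TYPE input sentences, the `HOMOPHONES` table, the 0.8 similarity threshold used to detect misspellings, and the choice to emit an unpaired TYPE word before an unpaired FIX word are all hypothetical. Character similarity is measured with `difflib.SequenceMatcher` rather than a Levenshtein distance.

```python
from difflib import SequenceMatcher

# Hypothetical table of heterographic homophones (an assumption for illustration).
HOMOPHONES = {frozenset({"disc", "disk"})}


def relation(fix_word, type_word):
    """Classify how a FIX word and a TYPE word relate, or return None."""
    a, b = fix_word.lower(), type_word.lower()
    if a == b:
        return "same"
    if frozenset({a, b}) in HOMOPHONES:
        return "homophone"
    # Assumed threshold: very similar spellings are treated as misspellings.
    if SequenceMatcher(None, a, b).ratio() >= 0.8:
        return "misspelling"
    return None


def align(fix_words, type_words):
    """Pair up words so that as few words as possible stay unpaired."""
    n, m = len(fix_words), len(type_words)
    # cost[i][j] = fewest unpaired words when aligning fix[i:] with type[j:]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            options = []
            if i < n:
                options.append(1 + cost[i + 1][j])
            if j < m:
                options.append(1 + cost[i][j + 1])
            if i < n and j < m and relation(fix_words[i], type_words[j]):
                options.append(cost[i + 1][j + 1])
            cost[i][j] = min(options)
    pairs, i, j = [], 0, 0
    while i < n or j < m:
        if (i < n and j < m and relation(fix_words[i], type_words[j])
                and cost[i][j] == cost[i + 1][j + 1]):
            pairs.append((fix_words[i], type_words[j]))
            i, j = i + 1, j + 1
        elif j < m and cost[i][j] == 1 + cost[i][j + 1]:
            pairs.append((None, type_words[j]))  # word found only in TYPE
            j += 1
        else:
            pairs.append((fix_words[i], None))  # word found only in FIX
            i += 1
    return pairs


def merge(fix_text, type_text):
    """Build the MERGE text from the FIX and TYPE texts."""
    merged = []
    for fix_word, type_word in align(fix_text.split(), type_text.split()):
        if fix_word is None:
            merged.append(type_word)   # unpaired word: adopt as is
        elif type_word is None:
            merged.append(fix_word)    # unpaired word: adopt as is
        elif relation(fix_word, type_word) == "homophone":
            merged.append(type_word)   # homophones: prefer TYPE
        else:
            merged.append(fix_word)    # otherwise: prefer FIX
    return " ".join(merged)


print(merge("It must now consider supplying disc",
            "It must not consder supplying disk"))
```

With these assumed inputs, the sketch keeps both unpaired words, takes "consider" from FIX, and takes "disk" from TYPE.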

FIG. 6 is a box-and-whisker plot showing an example of the effect of the support apparatus 10 in the present embodiment. FIG. 6 shows the word error rate (WER) when subtitles are generated by four different methods.

The leftmost column in the figure (ASR) shows the word error rate when subtitles are generated from audio using automatic speech recognition (ASR) alone. The second column from the left (CapCap) shows the word error rate when subtitles are generated from audio using the support apparatus 10 of the present embodiment. The third column from the left (Type) shows the word error rate when subtitles are generated by having a user type them from scratch (that is, the third expression information generated by the input unit 110 alone). The fourth column from the left (FIX) shows the word error rate when subtitles generated by ASR are corrected once by a user (that is, the second expression information obtained when the first expression information generated by the automatic recognition unit 104 is edited once through the editing unit 106).

The word error rate is calculated as WER = (I + D + S) / N. Here, I is the number of characters that must be inserted to reach the correct subtitle, D is the number of characters that must be deleted, S is the number of characters that must be substituted, and N is the number of characters in the correct subtitle.
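As a minimal sketch of this calculation, the Python function below computes (I + D + S) / N with a standard edit-distance table. The function name `error_rate` is hypothetical. It accepts any sequence, so it can be applied to a string of characters, as in the definition above, or to a list of words.

```python
def error_rate(hypothesis, reference):
    """Return (I + D + S) / N for turning hypothesis into reference."""
    n, m = len(hypothesis), len(reference)
    if m == 0:
        raise ValueError("reference must not be empty")
    # dist[i][j] = fewest edits to turn hypothesis[:i] into reference[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete every remaining hypothesis element
    for j in range(m + 1):
        dist[0][j] = j  # insert every remaining reference element
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = dist[i - 1][j - 1] + (hypothesis[i - 1] != reference[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[n][m] / m


# Character-level: one missing character out of eight.
print(error_rate("consder", "consider"))
# Word-level: one wrong word out of four.
print(error_rate("It must not consder".split(), "It must not consider".split()))
```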

The boxes of different sizes in the figure show, for each method, the WER distribution of the samples that fall between the first quartile and the third quartile. The center line inside each box indicates the median WER of the samples contained in the box. The whiskers and the + markers indicate, respectively, the WER of samples that fall within 1.5 times the range from the first quartile to the third quartile, and the WER of samples that fall outside that 1.5-times range but within the full data range.
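The quantities drawn in such a plot can be computed as in the sketch below. It assumes the common convention in which the whiskers reach the most extreme samples lying within 1.5 times the interquartile range beyond the box, and any sample further out is drawn as a separate marker. The text above does not specify this convention, so treat it as an assumption. The function name `box_plot_summary` and the sample values are hypothetical, and the input needs at least two samples.

```python
import statistics


def box_plot_summary(wers):
    """Summarise WER samples as box, median, whiskers and outliers."""
    data = sorted(wers)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)  # 1.5 times the interquartile range
    low_fence, high_fence = q1 - spread, q3 + spread
    inside = [w for w in data if low_fence <= w <= high_fence]
    outliers = [w for w in data if w < low_fence or w > high_fence]
    return {
        "box": (q1, q3),
        "median": median,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Hypothetical WER samples in percent.
print(box_plot_summary([0, 1, 2, 3, 4, 5, 6, 7, 20]))
```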

As illustrated, for the result obtained with the support apparatus 10 of the present embodiment (CapCap), the box coincides with a WER of about 0%, which is the lowest WER among the four methods. It is therefore clear that the support apparatus 10 of the present embodiment can support the generation of better subtitles than the other methods.

FIG. 7 is a box-and-whisker plot showing an example of the effect of the support apparatus 10 in the present embodiment. FIG. 7 compares, for the case where the support apparatus 10 of the present embodiment is used, subtitles generated by the ASR stage of S110 alone (left column) with subtitles generated through the ASR stage of S110 → FIX of S120 → TYPE of S130 → MERGE of S150 (right column). As illustrated, the subtitles generated through FIX, TYPE, and MERGE have a lower WER than the subtitles generated by ASR alone.

FIG. 8 is a box-and-whisker plot showing another example of the effect of the support apparatus 10 in the present embodiment. FIG. 8 compares, for the case where the support apparatus 10 of the present embodiment is used, subtitles generated through the ASR stage of S110 → FIX of S120 → TYPE of S130 → MERGE of S150 (left column) with subtitles generated through the ASR stage of S110 → FIX of S120 → TYPE of S130 → MERGE of S150 → FIX of S120 → TYPE of S130 → MERGE of S150 (right column).

As shown in FIG. 8, subtitles generated through two iterations of FIX, TYPE, and MERGE have a lower WER than subtitles generated through only one iteration. It is therefore clear that the support apparatus 10 of the present embodiment can support the generation of sufficiently accurate subtitles by executing the FIX, TYPE, and MERGE loop at least twice. For example, once the loop S120 → S130 → S140 → S150 has been performed a predetermined number of times (for example, twice), the support apparatus 10 may end the loop and advance the process to S170.

FIG. 9 shows the processing flow of the support apparatus 10 in a modification of the present embodiment. In this modification, the editing unit 106 has the first user edit the input first expression information successively and acquires the second expression information after a reference number of edits. When the determination unit 108 determines that the first expression information and the second expression information differ in each of a predetermined reference number of edits, the input unit 110 receives the third expression information.

In the following description of this modification, explanation of parts that are the same as in the embodiment described with reference to FIG. 2 may be omitted. For example, the support apparatus 10 of this modification may execute the processes of S210, S220, S240, and S270 in the same manner as the processes of S110, S120, S140, and S170 in FIG. 2, respectively.

In S230 of this modification, the determination unit 108 determines whether, in the immediately preceding edit in S220, the first expression information differs from the second expression information obtained by editing that first expression information. If the determination unit 108 determines that the first expression information and the second expression information differ, it advances the process to S232. If it determines that they do not differ, it advances the process to S260.

In S232, the determination unit 108 determines whether the first expression information and the second expression information differed in a predetermined reference number (m) of consecutive edits in S120. For example, the determination unit 108 determines whether the sequence S220 → S230 → S232 has been repeated the reference number of times in succession. If the result of the determination is affirmative, the determination unit 108 advances the process to S240. Otherwise, it returns the process to S220.

In S260, the determination unit 108 determines whether, among the multiple edits in S220, the first expression information and the second expression information matched in a predetermined reference number (n) of consecutive edits. For example, the determination unit 108 determines whether the sequence S220 → S230 → S260 has been repeated the reference number of times in succession. Note that n and m may be the same or different natural numbers. For example, n = 2 and m = 2.

If the determination unit 108 determines that the first expression information and the second expression information matched in the predetermined reference number of consecutive edits, it advances the process to S270. Otherwise, it returns the process to S220.

In this way, the support apparatus 10 of this modification repeats editing by the editing unit 106 and input by the input unit 110 until edits with no correction occur the reference number of times in succession and, in addition, repeats editing by the editing unit 106 until edits with corrections occur the reference number of times in succession. As a result, the support apparatus 10 of this modification integrates the third expression information with second expression information whose quality has been further improved through multiple consecutive edits, and can thereby further improve the quality of the expression information that is finally generated.
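The control flow of this modification (S220 through S270) can be sketched as the following Python loop. It is a simplified illustration under several assumptions that the description does not state: the callables `edit`, `type_new`, and `merge` are hypothetical stand-ins for the editing unit 106, the input unit 110, and the first integration unit 112. An edited text becomes the input of the next edit. The integrated text is fed back to S220. The `max_rounds` guard is added only to keep the sketch from looping forever.

```python
def run_modified_flow(first, edit, type_new, merge, m=2, n=2, max_rounds=100):
    """Repeat editing until n consecutive edits leave the text unchanged."""
    changed_streak = 0    # consecutive edits that changed the text
    unchanged_streak = 0  # consecutive edits that changed nothing
    for _ in range(max_rounds):
        second = edit(first)                  # S220: a user edits the text
        if second != first:                   # S230: did the edit change it?
            changed_streak += 1
            unchanged_streak = 0
            if changed_streak >= m:           # S232: m changes in a row
                third = type_new()            # S240: a user types a new text
                first = merge(second, third)  # integrate the two texts
                changed_streak = 0
            else:
                first = second                # assumed: edit the edited text next
        else:
            unchanged_streak += 1
            changed_streak = 0
            if unchanged_streak >= n:         # S260: n unchanged edits in a row
                return second                 # proceed to S270
    return first  # safety guard reached (not part of the described flow)


# Toy stand-ins: each edit fixes one step, then stops changing the text.
fixes = {"a": "b", "b": "c"}
result = run_modified_flow(
    "a",
    edit=lambda text: fixes.get(text, text),
    type_new=lambda: "T",
    merge=lambda second, third: second + third,
)
print(result)
```

In this toy run, two consecutive changing edits trigger one typed input and one merge, after which two unchanged edits end the loop.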

The support apparatus 10 of the embodiment and modification described so far uses information containing audio as the expression target and supports the generation of text, such as subtitles representing the content of the audio, as the first, second, and third expression information. However, the support apparatus 10 is not limited to this. For example, instead of or in addition to audio, the support apparatus 10 may use, as the expression target, still images containing photographs, illustrations, text, and/or symbols, and/or moving images, and the expression information may be subtitles, translations, and/or musical notes corresponding to the expression target.

FIG. 10 shows an example of the hardware configuration of a computer 1900 that functions as the support apparatus 10. The computer 1900 according to the present embodiment includes a CPU peripheral section having a CPU 2000, a RAM 2020, a graphic controller 2075, and a display device 2080, which are interconnected by a host controller 2082; an input/output section having a communication interface 2030, a hard disk drive 2040, and a CD-ROM drive 2060, which are connected to the host controller 2082 by an input/output controller 2084; and a legacy input/output section having a ROM 2010, a flexible disk drive 2050, and an input/output chip 2070, which are connected to the input/output controller 2084.

The host controller 2082 connects the RAM 2020 with the CPU 2000 and the graphic controller 2075, which access the RAM 2020 at a high transfer rate. The CPU 2000 operates based on programs stored in the ROM 2010 and the RAM 2020 and controls each unit. The graphic controller 2075 acquires image data that the CPU 2000 or the like generates in a frame buffer provided in the RAM 2020 and displays it on the display device 2080. Alternatively, the graphic controller 2075 may itself contain a frame buffer that stores the image data generated by the CPU 2000 or the like.

The input/output controller 2084 connects the host controller 2082 with the communication interface 2030, the hard disk drive 2040, and the CD-ROM drive 2060, which are relatively high-speed input/output devices. The communication interface 2030 communicates with other devices over a network by wire or wirelessly. The communication interface also functions as the hardware that performs communication. The hard disk drive 2040 stores programs and data used by the CPU 2000 in the computer 1900. The CD-ROM drive 2060 reads a program or data from a CD-ROM 2095 and provides it to the hard disk drive 2040 via the RAM 2020.

The ROM 2010 and the relatively low-speed input/output devices, namely the flexible disk drive 2050 and the input/output chip 2070, are also connected to the input/output controller 2084. The ROM 2010 stores a boot program that the computer 1900 executes at startup and/or programs that depend on the hardware of the computer 1900. The flexible disk drive 2050 reads a program or data from a flexible disk 2090 and provides it to the hard disk drive 2040 via the RAM 2020. The input/output chip 2070 connects the flexible disk drive 2050 to the input/output controller 2084 and also connects various input/output devices to the input/output controller 2084 via, for example, a parallel port, a serial port, a keyboard port, or a mouse port.

A program provided to the hard disk drive 2040 via the RAM 2020 is stored on a recording medium such as the flexible disk 2090, the CD-ROM 2095, or an IC card and is provided by a user. The program is read from the recording medium, installed on the hard disk drive 2040 in the computer 1900 via the RAM 2020, and executed by the CPU 2000.

The program that is installed on the computer 1900 and causes the computer 1900 to function as the support apparatus 10 includes a division module, an automatic recognition module, an editing module, a determination module, an input module, a first integration module, a control module, and a second integration module. These programs or modules may act on the CPU 2000 and the like to cause the computer 1900 to function as the dividing unit 102, the automatic recognition unit 104, the editing unit 106, the determination unit 108, the input unit 110, the first integration unit 112, the control unit 114, and the second integration unit 120, respectively.

When read into the computer 1900, the information processing described in these programs functions as the dividing unit 102, the automatic recognition unit 104, the editing unit 106, the determination unit 108, the input unit 110, the first integration unit 112, the control unit 114, and the second integration unit 120, which are concrete means in which the software and the various hardware resources described above cooperate. By using these concrete means to compute or process information according to the intended use of the computer 1900 in the present embodiment, a support apparatus 10 specific to that intended use is constructed.

As an example, when the computer 1900 communicates with an external device or the like, the CPU 2000 executes a communication program loaded into the RAM 2020 and, based on the processing described in the communication program, instructs the communication interface 2030 to perform communication processing. Under the control of the CPU 2000, the communication interface 2030 reads transmission data stored in a transmission buffer area or the like provided on a storage device such as the RAM 2020, the hard disk drive 2040, the flexible disk 2090, or the CD-ROM 2095 and transmits it to the network, or writes reception data received from the network into a reception buffer area or the like provided on the storage device. In this way, the communication interface 2030 may transfer transmission and reception data to and from the storage device by DMA (direct memory access). Alternatively, the CPU 2000 may transfer the data by reading it from the source storage device or communication interface 2030 and writing it to the destination communication interface 2030 or storage device.

The CPU 2000 also reads all or the necessary parts of files, databases, or the like stored on external storage devices such as the hard disk drive 2040, the CD-ROM drive 2060 (CD-ROM 2095), and the flexible disk drive 2050 (flexible disk 2090) into the RAM 2020 by DMA transfer or the like, and performs various kinds of processing on the data in the RAM 2020. The CPU 2000 then writes the processed data back to the external storage device by DMA transfer or the like. In such processing, the RAM 2020 can be regarded as temporarily holding the contents of the external storage device, so in the present embodiment the RAM 2020, the external storage devices, and the like are collectively referred to as a memory, a storage unit, a storage device, or the like.

The various programs, data, tables, databases, and other information in the present embodiment are stored on such a storage device and are subject to information processing. Note that the CPU 2000 can also hold part of the RAM 2020 in a cache memory and read from and write to the cache memory. Even in this form, the cache memory performs part of the function of the RAM 2020, so in the present embodiment the cache memory is also regarded as included in the RAM 2020, the memory, and/or the storage device, except where it is indicated separately.

The CPU 2000 also performs, on data read from the RAM 2020, the various kinds of processing specified by the instruction sequence of the program, including the various computations, information processing, condition determinations, and information search and replacement described in the present embodiment, and writes the results back to the RAM 2020. For example, when performing a condition determination, the CPU 2000 determines whether a variable shown in the present embodiment satisfies a condition such as being greater than, less than, greater than or equal to, less than or equal to, or equal to another variable or a constant, and, if the condition is satisfied (or not satisfied), branches to a different instruction sequence or calls a subroutine.

The CPU 2000 can also search for information stored in files, databases, or the like in the storage device. For example, when the storage device stores a plurality of entries in each of which an attribute value of a second attribute is associated with an attribute value of a first attribute, the CPU 2000 can search the stored entries for an entry whose first-attribute value matches a specified condition and read the second-attribute value stored in that entry, thereby obtaining the second-attribute value associated with the first attribute that satisfies the given condition.

The programs or modules described above may be stored on an external recording medium. In addition to the flexible disk 2090 and the CD-ROM 2095, usable recording media include optical recording media such as DVDs or CDs, magneto-optical recording media such as MOs, tape media, and semiconductor memories such as IC cards. A storage device such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet may also be used as the recording medium, and the program may be provided to the computer 1900 via the network.

The present invention has been described above using an embodiment, but the technical scope of the present invention is not limited to the scope described in the above embodiment. It is apparent to those skilled in the art that various changes or improvements can be made to the above embodiment. It is apparent from the claims that forms to which such changes or improvements have been made can also be included in the technical scope of the present invention.

It should be noted that the processes such as operations, procedures, steps, and stages in the apparatuses, systems, programs, and methods shown in the claims, the description, and the drawings may be executed in any order unless the order is explicitly indicated by terms such as "before" or "prior to", or unless the output of an earlier process is used in a later process. Even where the operation flows in the claims, the description, and the drawings are described using terms such as "first" and "next" for convenience, this does not mean that they must be carried out in that order.

DESCRIPTION OF SYMBOLS: 10 support apparatus, 102 dividing unit, 104 automatic recognition unit, 106 editing unit, 108 determination unit, 110 input unit, 112 first integration unit, 114 control unit, 120 second integration unit, 1900 computer, 2000 CPU, 2010 ROM, 2020 RAM, 2030 communication interface, 2040 hard disk drive, 2050 flexible disk drive, 2060 CD-ROM drive, 2070 input/output chip, 2075 graphic controller, 2080 display device, 2082 host controller, 2084 input/output controller, 2090 flexible disk, 2095 CD-ROM

Claims

A support device that supports creation of expression information by a plurality of users, the support device comprising:
an editing unit that has any one of the plurality of users edit first expression information expressing an expression target and acquires the result as second expression information;
an input unit that receives new third expression information expressing the expression target from any one of the plurality of users; and
a first integration unit that integrates the second expression information and the third expression information to generate integrated expression information.

The support device according to claim 1, further comprising a controller that supplies the integrated expression information to the editing unit as new first expression information and causes the processing by the editing unit to be repeated,
wherein the editing unit receives the new first expression information, presents it to any one of the plurality of users for editing, and acquires further second expression information from that user.

The support device according to claim 2, wherein
the input unit receives further third expression information from any one of the plurality of users, and
the first integration unit integrates the further second expression information and the further third expression information to generate the integrated expression information.

The support device according to any one of claims 1 to 3, further comprising a determination unit that determines whether the first expression information differs from the second expression information obtained by editing the first expression information,
wherein, in response to a determination that the first expression information and the second expression information differ, the input unit acquires the third expression information from any one of the plurality of users.

The support device according to claim 4, wherein
the editing unit has the input first expression information edited sequentially by users among the plurality of users, and acquires the second expression information after a predetermined reference number of edits, and
the input unit receives the third expression information when the determination unit determines that the first expression information and the second expression information differ after the reference number of edits by the editing unit.

The support device according to claim 4 or 5, wherein, in response to the determination unit determining that the first expression information does not differ from the second expression information, the editing unit again presents the first expression information to any one of the plurality of users for editing.

The support device according to any one of claims 4 to 6, wherein the determination unit stops editing by the editing unit when it determines that the first expression information and the second expression information match in a predetermined second reference number of consecutive edits.
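
The repetition, determination, and stopping behaviour recited in the dependent claims above can be sketched as a simple loop. This is a minimal sketch under stated assumptions: the "differs" test is taken to be exact string equality, one edit is performed per round (that is, the first reference number is treated as 1), and `run_support_loop`, `stop_after`, and the `max_rounds` safety bound are hypothetical names introduced here, not terms from the claims.

```python
from typing import Callable


def run_support_loop(
    first: str,
    edit: Callable[[str], str],
    new_input: Callable[[], str],
    integrate: Callable[[str, str], str],
    stop_after: int = 2,
    max_rounds: int = 20,
) -> str:
    """Repeat edit -> determine -> (input + integrate) until edits stop changing.

    stop_after plays the role of the "second reference number": editing stops
    once that many consecutive edits leave the text unchanged.
    max_rounds is an assumed safety bound that the claims do not mention.
    """
    unchanged = 0
    for _ in range(max_rounds):
        second = edit(first)          # editing unit: a user edits the current text
        if second == first:           # determination unit: no difference found
            unchanged += 1
            if unchanged >= stop_after:
                break                 # stop editing after enough unchanged edits
            continue                  # present the same text to a user again
        unchanged = 0
        third = new_input()           # input unit: a fresh expression from a user
        first = integrate(second, third)  # becomes the new first expression
    return first
```

Requesting fresh input only when an edit actually changed the text keeps the comparatively expensive from-scratch input step for cases where the text is still in flux.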

The support device according to claim 2 or 3, further comprising an automatic recognition unit that automatically generates the first expression information from the expression target,
wherein, in the first iteration, the editing unit has a user edit the first expression information generated by the automatic recognition unit.

The support device according to claim 1, further comprising a second integration unit that integrates one or more pieces of the input first expression information, one or more pieces of the second expression information, and one or more pieces of the third expression information,
wherein the expression information integrated by the second integration unit is output as the created expression information.

The support device according to any one of claims 1 to 9, further comprising a division unit that divides content to be expressed to generate a plurality of the expression targets.

The support device according to any one of claims 1 to 10, wherein
the expression target includes speech, and
the first expression information, the second expression information, and the third expression information are each text representing the content of the speech.

An information processing method executed by a computer to support creation of expression information by a plurality of users, the method comprising:
an editing step of having any one of the plurality of users edit first expression information expressing an expression target, and acquiring the edited information as second expression information;
an input step of receiving new third expression information expressing the expression target from any one of the plurality of users; and
a first integration step of integrating the second expression information and the third expression information to generate integrated expression information.

A program for supporting creation of expression information by a plurality of users, the program, when executed on a computer, causing the computer to function as:
an editing unit that has any one of the plurality of users edit first expression information expressing an expression target, and acquires the edited information as second expression information;
an input unit that receives new third expression information expressing the expression target from any one of the plurality of users; and
a first integration unit that integrates the second expression information and the third expression information to generate integrated expression information.