JP6413828B2

JP6413828B2 - Information processing method, information processing apparatus, and program

Info

Publication number: JP6413828B2
Application number: JP2015031888A
Authority: JP
Inventors: 克朗須田
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2015-02-20
Filing date: 2015-02-20
Publication date: 2018-10-31
Anticipated expiration: 2035-02-20
Also published as: JP2016153958A

Description

The present invention relates to the technical field of methods for generating meta information for karaoke videos.

Conventionally, video sites that distribute video data online are known. A video site stores, for example, meta information in association with video data and searches the meta information based on conditions specified by a user, thereby distributing the video data the user wants. The meta information is generated, for example, based on information entered by the creator of the video data, or by analyzing the video data. For example, Patent Literature 1 discloses a technique for generating meta information by extracting character information and information about image quality from frame images included in video data.

Patent Literature 1: JP 2010-020630 A

In general, karaoke video data is generated from a plurality of pieces of material information, such as video information, sound information of the music, and lyrics information. In this case, the video site registers and distributes the generated karaoke video data. To generate meta information for karaoke video data, the karaoke video data must be analyzed. However, the detailed information contained in the original material information has been lost from the karaoke video data. It is therefore difficult to extract the original detailed information accurately from the karaoke video data, which leads to the problem that inaccurate meta information is generated.

The present invention has been made in view of the above points, and its object is to provide an information processing method and the like that make it possible to easily generate accurate information as meta information for searching for a karaoke video.

The invention according to claim 1 is an information processing method executed by a computer of an information processing apparatus comprising temporary storage means, acquisition means, expansion means, karaoke video generation means, and meta information generation means, the method comprising: an acquisition step in which the acquisition means acquires a plurality of pieces of material information serving as materials for a karaoke video, the plurality of pieces of material information including at least video information, sound information indicating the performance sound of karaoke music, and lyrics information indicating the lyrics of the karaoke music, the lyrics information including an elapsed time from the start of the performance of the karaoke music until display of the lyrics starts; an expansion step in which the expansion means expands the plurality of pieces of material information acquired in the acquisition step into the temporary storage means once; a karaoke video generation step in which the karaoke video generation means generates the karaoke video based on the plurality of pieces of material information in the state in which they have been expanded once into the temporary storage means by the expansion step; and a meta information generation step in which the meta information generation means generates, based on at least one of the plurality of pieces of material information in the state in which they have been expanded once into the temporary storage means by the expansion step, meta information used for searching for the karaoke video, the meta information including, as a singing start time, the elapsed time included in the lyrics information.

The invention according to claim 2 is an information processing method executed by a computer of an information processing apparatus comprising temporary storage means, acquisition means, expansion means, karaoke video generation means, word extraction means, comparison means, difficulty level determination means, and meta information generation means, the method comprising: an acquisition step in which the acquisition means acquires a plurality of pieces of material information serving as materials for a karaoke video, the plurality of pieces of material information including at least video information, sound information indicating the performance sound of karaoke music, and lyrics information indicating the lyrics of the karaoke music, the lyrics information including, for each word included in the lyrics, a display time from the start to the end of display of the word; an expansion step in which the expansion means expands the plurality of pieces of material information acquired in the acquisition step into the temporary storage means once; a karaoke video generation step in which the karaoke video generation means generates the karaoke video based on the plurality of pieces of material information in the state in which they have been expanded once into the temporary storage means by the expansion step; a word extraction step in which the word extraction means extracts, from among a plurality of high difficulty words stored in first storage means, the high difficulty words included in the lyrics information expanded in the temporary storage means, the first storage means storing, in association with each other for each high difficulty word, a high difficulty word predetermined as a word that is difficult to sing and a reference time for singing that high difficulty word; a comparison step in which the comparison means compares, for each high difficulty word extracted in the word extraction step, the reference time for singing the high difficulty word with the display time of the high difficulty word; a difficulty level determination step in which the difficulty level determination means determines the difficulty level of the lyrics based on the result of the comparison in the comparison step; and a meta information generation step in which the meta information generation means generates, based on at least one of the plurality of pieces of material information in the state in which they have been expanded once into the temporary storage means by the expansion step, meta information used for searching for the karaoke video, the meta information including the difficulty level determined in the difficulty level determination step.

The invention according to claim 3 comprises: temporary storage means; acquisition means for acquiring a plurality of pieces of material information serving as materials for a karaoke video, the plurality of pieces of material information including at least video information, sound information indicating the performance sound of karaoke music, and lyrics information indicating the lyrics of the karaoke music, the lyrics information including an elapsed time from the start of the performance of the karaoke music until display of the lyrics starts; expansion means for expanding the plurality of pieces of material information acquired by the acquisition means into the temporary storage means once; karaoke video generation means for generating the karaoke video based on the plurality of pieces of material information in the state in which they have been expanded once into the temporary storage means by the expansion means; and meta information generation means for generating, based on at least one of the plurality of pieces of material information in the state in which they have been expanded once into the temporary storage means by the expansion means, meta information used for searching for the karaoke video, the meta information including, as a singing start time, the elapsed time included in the lyrics information.
The invention according to claim 4 comprises: temporary storage means; acquisition means for acquiring a plurality of pieces of material information serving as materials for a karaoke video, the plurality of pieces of material information including at least video information, sound information indicating the performance sound of karaoke music, and lyrics information indicating the lyrics of the karaoke music, the lyrics information including, for each word included in the lyrics, a display time from the start to the end of display of the word; expansion means for expanding the plurality of pieces of material information acquired by the acquisition means into the temporary storage means once; karaoke video generation means for generating the karaoke video based on the plurality of pieces of material information in the state in which they have been expanded once into the temporary storage means by the expansion means; word extraction means for extracting, from among a plurality of high difficulty words stored in first storage means, the high difficulty words included in the lyrics information expanded in the temporary storage means, the first storage means storing, in association with each other for each high difficulty word, a high difficulty word predetermined as a word that is difficult to sing and a reference time for singing that high difficulty word; comparison means for comparing, for each high difficulty word extracted by the word extraction means, the reference time for singing the high difficulty word with the display time of the high difficulty word; difficulty level determination means for determining the difficulty level of the lyrics based on the result of the comparison by the comparison means; and meta information generation means for generating, based on at least one of the plurality of pieces of material information in the state in which they have been expanded once into the temporary storage means by the expansion means, meta information used for searching for the karaoke video, the meta information including the difficulty level determined by the difficulty level determination means.

The invention according to claim 5 causes a computer of an information processing apparatus comprising temporary storage means to execute: an acquisition step of acquiring a plurality of pieces of material information serving as materials for a karaoke video, the plurality of pieces of material information including at least video information, sound information indicating the performance sound of karaoke music, and lyrics information indicating the lyrics of the karaoke music, the lyrics information including an elapsed time from the start of the performance of the karaoke music until display of the lyrics starts; an expansion step of expanding the plurality of pieces of material information acquired in the acquisition step into the temporary storage means once; a karaoke video generation step of generating the karaoke video based on the plurality of pieces of material information in the state in which they have been expanded once into the temporary storage means by the expansion step; and a meta information generation step of generating, based on at least one of the plurality of pieces of material information in the state in which they have been expanded once into the temporary storage means by the expansion step, meta information used for searching for the karaoke video, the meta information including, as a singing start time, the elapsed time included in the lyrics information.
The invention according to claim 6 causes a computer of an information processing apparatus comprising temporary storage means to execute: an acquisition step of acquiring a plurality of pieces of material information serving as materials for a karaoke video, the plurality of pieces of material information including at least video information, sound information indicating the performance sound of karaoke music, and lyrics information indicating the lyrics of the karaoke music, the lyrics information including, for each word included in the lyrics, a display time from the start to the end of display of the word; an expansion step of expanding the plurality of pieces of material information acquired in the acquisition step into the temporary storage means once; a karaoke video generation step of generating the karaoke video based on the plurality of pieces of material information in the state in which they have been expanded once into the temporary storage means by the expansion step; a word extraction step of extracting, from among a plurality of high difficulty words stored in first storage means, the high difficulty words included in the lyrics information expanded in the temporary storage means, the first storage means storing, in association with each other for each high difficulty word, a high difficulty word predetermined as a word that is difficult to sing and a reference time for singing that high difficulty word; a comparison step of comparing, for each high difficulty word extracted in the word extraction step, the reference time for singing the high difficulty word with the display time of the high difficulty word; a difficulty level determination step of determining the difficulty level of the lyrics based on the result of the comparison in the comparison step; and a meta information generation step of generating, based on at least one of the plurality of pieces of material information in the state in which they have been expanded once into the temporary storage means by the expansion step, meta information used for searching for the karaoke video, the meta information including the difficulty level determined in the difficulty level determination step.

According to the inventions of claims 1 to 6, a karaoke video is generated, and meta information is also generated, based on the material information expanded in the temporary storage means. Therefore, more accurate information can easily be generated as meta information for searching for the karaoke video than when the meta information is generated from the karaoke video itself.

Furthermore, according to the invention of claim 1, 3, or 5, meta information including an accurate singing start time can easily be generated.

Furthermore, according to the invention of claim 2, 4, or 6, meta information including the singing difficulty level can easily be generated based on accurate lyrics.

(A) is a diagram showing a schematic configuration example of the communication system S of one embodiment. (B) is a diagram showing a schematic configuration example of the karaoke video generation server 1 of one embodiment.

(A) is a diagram showing a configuration example of telop data. (B) is a diagram showing a configuration example of a difficult lyrics list. (C) is a diagram showing a configuration example of a keyword list. (D) is a diagram showing a configuration example of a type list.

(A) is a diagram showing a configuration example of meta information including a singing start time. (B) is a diagram showing a configuration example of meta information including a singing difficulty level. (C) is a diagram showing an example of the degree of appearance of keywords. (D) is a diagram showing a configuration example of meta information including a keyword. (E) is a diagram showing a configuration example of meta information including a music type.

(A) is a flowchart showing an example of server processing in the karaoke video generation server 1. (B) is a flowchart showing an example of meta information generation processing in the karaoke video generation server 1.

(A) is a flowchart showing an example of singing start time meta information generation processing in the karaoke video generation server 1. (B) is a flowchart showing an example of singing difficulty level meta information generation processing in the karaoke video generation server 1.

(A) is a flowchart showing an example of keyword meta information generation processing in the karaoke video generation server 1. (B) is a flowchart showing an example of music type meta information generation processing in the karaoke video generation server 1.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

[1. Configuration of communication system]
FIG. 1(A) is a diagram showing a schematic configuration example of the communication system S of the present embodiment. As shown in FIG. 1(A), the communication system S includes a karaoke video generation server 1, a data transmission terminal 2, a network storage 3, a karaoke video distribution server 4, and a plurality of user terminals 5. The karaoke video generation server 1 is an example of the information processing apparatus of the present invention. The karaoke video generation server 1, the data transmission terminal 2, the karaoke video distribution server 4, and the plurality of user terminals 5 are each connected to a network 10. The network 10 is configured by, for example, the Internet. The karaoke video generation server 1 and the karaoke video distribution server 4 are connected to the network storage 3 via a network such as a LAN (Local Area Network).

The karaoke video generation server 1 receives a plurality of material data from the data transmission terminal 2. The karaoke video generation server 1 then generates karaoke video data based on the plurality of material data. The karaoke video data includes data of video into which a telop of the lyrics of the karaoke music has been composited, and data of the performance sound of the karaoke music. The format of the karaoke video data may be, for example, MP4 (ISO/IEC 14496-14:2003).

The material data is data serving as material for generating karaoke video data. The plurality of material data includes at least video data, audio data, and telop data. The video data is data representing the video included in the karaoke video data. The format of the video data may be, for example, H.264. The audio data is data representing the performance sound of the karaoke music. The format of the audio data may be, for example, MIDI (Musical Instrument Digital Interface) or AAC (Advanced Audio Coding). The telop data is data indicating the lyrics of the karaoke music and the timing at which the lyrics are displayed. FIG. 2(A) is a diagram showing a configuration example of the telop data. As shown in FIG. 2(A), the telop data includes the text of the lyrics. The telop data also includes a display start time and a display end time for each character of the lyrics. The display start time is the time that elapses from the start of the performance of the karaoke music until the corresponding character is displayed. The display end time is the time that elapses from the start of the performance of the karaoke music until display of the corresponding character ends. In the example of FIG. 2(A), the display start time and display end time of “生” are set to 500 milliseconds and 1000 milliseconds, respectively, and those of “麦” are set to 1000 milliseconds and 1200 milliseconds, respectively.
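The telop data described here can be pictured as a list of per-character records. The following is a minimal sketch, not the format actually used by the server: the class name `TelopChar`, its field names, and the helper functions are hypothetical, and only the two example characters and their times come from FIG. 2(A). It also shows how the earliest display start time could serve as the singing start time mentioned in claim 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TelopChar:
    """One lyric character and its display window.

    Times are milliseconds elapsed since the start of the performance.
    (Hypothetical structure; the patent does not fix a concrete format.)
    """
    char: str
    start_ms: int
    end_ms: int


# Values taken from the FIG. 2(A) example in the text.
telop = [
    TelopChar("生", 500, 1000),
    TelopChar("麦", 1000, 1200),
]


def lyrics_text(chars: List[TelopChar]) -> str:
    """Concatenate the characters back into the lyrics string."""
    return "".join(c.char for c in chars)


def singing_start_ms(chars: List[TelopChar]) -> int:
    """Earliest display start time, usable as the singing start time."""
    return min(c.start_ms for c in chars)
```

With the example values, the lyrics string is “生麦” and the singing start time is 500 milliseconds.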

The karaoke video generation server 1 also generates meta information used for searching for the karaoke video data, based on at least one of the plurality of material data. The karaoke video generation server 1 stores the generated karaoke video data and meta information in the network storage 3.

The data transmission terminal 2 is used, for example, by an operator of a karaoke video data distribution service. The operator, for example, inputs material data into the data transmission terminal 2, or operates the data transmission terminal 2 to create material data. The data transmission terminal 2 transmits the input or created material data to the karaoke video generation server 1.

The network storage 3 stores a plurality of karaoke video data. The network storage 3 stores each karaoke video data in association with identification information that identifies the karaoke music. The identification information may be, for example, a music number. The network storage 3 also stores one or more pieces of meta information for each karaoke video data. The network storage 3 is configured by, for example, a hard disk drive.

The karaoke video distribution server 4 searches the network storage 3 for meta information that matches the search conditions transmitted from the user terminal 5. The karaoke video distribution server 4 then identifies, in the network storage 3, the karaoke video data associated with the music number included in the retrieved meta information. In this way, the karaoke video distribution server 4 searches for karaoke video data. The karaoke video distribution server 4 then streams the retrieved karaoke video data to the user terminal 5.
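The two-stage lookup described here (match meta information first, then resolve the music number to video data) can be sketched as follows. This is an illustration only: the in-memory lists and dictionaries stand in for the network storage 3, and the record fields, file names, and the particular search condition (an upper bound on the singing start time) are all assumptions not found in the text.

```python
from typing import Dict, List

# Hypothetical stand-ins for what the network storage 3 holds.
meta_records: List[Dict[str, object]] = [
    {"music_number": "1001", "singing_start_ms": 500},
    {"music_number": "1002", "singing_start_ms": 15000},
]
videos_by_music_number: Dict[str, str] = {
    "1001": "video-1001.mp4",
    "1002": "video-1002.mp4",
}


def search_videos(max_start_ms: int) -> List[str]:
    """Return the videos whose singing starts no later than max_start_ms."""
    # Stage 1: find meta information matching the search condition.
    hits = [m for m in meta_records if m["singing_start_ms"] <= max_start_ms]
    # Stage 2: resolve each hit's music number to its karaoke video data.
    return [videos_by_music_number[str(m["music_number"])] for m in hits]
```

Keeping the meta information separate from the video data means a search never has to open or analyze the video files themselves.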

The user terminal 5 is used by a user of the karaoke distribution service. The user operates the user terminal 5 to specify search conditions for searching for karaoke video data. The user terminal 5 transmits the specified search conditions to the karaoke video distribution server 4. The user terminal 5 also plays back the karaoke video data distributed from the karaoke video distribution server 4. Examples of the user terminal 5 include a personal computer, a television, an STB (set-top box), a mobile phone, a smartphone, a tablet computer, and a karaoke device.

[2. Configuration of karaoke video generation server 1]
Next, the configuration of the karaoke video generation server 1 will be described with reference to FIG. 1(B) and FIGS. 2(B) to 2(D). FIG. 1(B) is a diagram showing a schematic configuration example of the karaoke video generation server 1 of the present embodiment. As shown in FIG. 1(B), the karaoke video generation server 1 includes a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a storage unit 14, a communication unit 15, and the like. These components are connected to a bus 16. The communication unit 15 is connected to the network 10. The storage unit 14 is configured by, for example, a hard disk drive. The storage unit 14 stores an OS, a server program, and the like. The server program is a program that causes the CPU, as a computer, to execute generation of karaoke video data, generation of meta information, and the like. The server program and the like may be downloaded, for example, from another server via the network 10. Alternatively, the server program and the like may be recorded on a recording medium such as an optical disc, a magnetic tape, or a memory card and read into the storage unit 14 via a drive device. The storage unit 14 also stores information for generating meta information. Specifically, the storage unit 14 stores a difficult lyrics list, a keyword list, and a type list. These lists may be created, for example, by the operator of the karaoke video data distribution service.

FIG. 2(B) is a diagram showing a configuration example of the difficult lyrics list. The difficult lyrics list is a list of difficult lyrics in karaoke music. A difficult lyric is a word predetermined as being difficult to sing. Specifically, for each difficult lyric, the difficult lyrics list registers the difficult lyric in association with a reference singing time. The reference singing time is a reference value for the singing duration at which the corresponding difficult lyric feels hard to sing. In the example of FIG. 2(B), the difficult lyric “生麦” (namamugi) is registered in association with a reference singing time of 1000 milliseconds. In this case, if the singing time of “生麦” in a piece of karaoke music is less than 1000 milliseconds, singing “生麦” may be determined to be difficult.
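The comparison described here can be worked through with the values from FIG. 2(A) and FIG. 2(B). The sketch below is a minimal illustration under stated assumptions: the singing time of a word is taken to be the span from the display start of its first character to the display end of its last character, each telop entry is assumed to hold exactly one character, and the function name `find_hard_words` is hypothetical.

```python
from typing import Dict, List, Tuple

# (character, display start ms, display end ms), values from FIG. 2(A).
Telop = List[Tuple[str, int, int]]

telop: Telop = [("生", 500, 1000), ("麦", 1000, 1200)]

# Difficult lyric -> reference singing time in ms, values from FIG. 2(B).
difficult_lyrics: Dict[str, int] = {"生麦": 1000}


def find_hard_words(telop: Telop,
                    difficult_lyrics: Dict[str, int]) -> List[Tuple[str, int, bool]]:
    """Return (word, sung_ms, is_difficult) for each listed word in the lyrics."""
    text = "".join(ch for ch, _, _ in telop)
    results = []
    for word, reference_ms in difficult_lyrics.items():
        pos = text.find(word)
        while pos != -1:
            first = telop[pos]                  # first character of the word
            last = telop[pos + len(word) - 1]   # last character of the word
            sung_ms = last[2] - first[1]        # end of last minus start of first
            # Difficult when the word must be sung faster than the reference.
            results.append((word, sung_ms, sung_ms < reference_ms))
            pos = text.find(word, pos + 1)
    return results
```

Under these assumptions, “生麦” is displayed from 500 ms to 1200 ms, a span of 700 ms, which is shorter than the 1000 ms reference, so the word counts as difficult in this example.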

FIG. 2(C) is a diagram showing a configuration example of the keyword list. The keyword list is a list indicating keywords and the words related to each keyword. Specifically, for each keyword, the keyword list registers the keyword in association with a plurality of words related to that keyword. In the example of FIG. 2(C), the words “桜” (cherry blossom), “梅” (plum), “卒業” (graduation), “巣立” (leaving the nest), and “入学” (school enrollment) are registered in association with the keyword “春” (spring).
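One way such a keyword list might be used is to count how often a keyword's related words appear in the lyrics. The sketch below is an assumption made for illustration: this passage does not say how the list is applied, the counting rule and the function name `count_related_words` are hypothetical, and the sample lyrics are invented. Only the keyword and its five related words come from FIG. 2(C).

```python
from typing import Dict, List

# Keyword -> related words, values from the FIG. 2(C) example.
keyword_list: Dict[str, List[str]] = {
    "春": ["桜", "梅", "卒業", "巣立", "入学"],
}


def count_related_words(lyrics: str,
                        keyword_list: Dict[str, List[str]]) -> Dict[str, int]:
    """Count, per keyword, the occurrences of its related words in the lyrics."""
    return {
        keyword: sum(lyrics.count(word) for word in words)
        for keyword, words in keyword_list.items()
    }


# Invented sample lyrics, used only to exercise the function.
sample_lyrics = "桜咲く卒業の日"
spring_score = count_related_words(sample_lyrics, keyword_list)["春"]
```

In the invented sample, “桜” and “卒業” each appear once, so the keyword “春” receives a count of 2.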

FIG. 2D is a diagram illustrating a configuration example of the type list. The type list is a list of types of music. Specifically, in the type list, for each type of music, a music type is registered in association with one or more pieces of timbre information. The music type is identification information that identifies the type of music. The timbre information indicates the timbre of a sound source used to perform music of the type indicated by the music type. The scheme for assigning timbre information may be the same as, for example, the scheme for assigning timbre numbers defined in General MIDI. In the type list shown in FIG. 2D, the timbre information “三味線” (shamisen) is registered in association with the music type “民謡” (folk song).
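The three lists can be pictured as simple lookup tables. The following Python sketch is for illustration only: the dictionary layout and the helper `keyword_for_word` are assumptions, and only the entries shown in FIGS. 2B to 2D are taken from the description.

```python
# Hypothetical in-memory forms of the lists held in the storage unit 14.

# Difficult lyrics list: difficult lyric -> reference singing time (ms).
DIFFICULT_LYRICS = {"生麦": 1000}

# Keyword list: keyword -> words related to that keyword.
KEYWORD_LIST = {"春": ["桜", "梅", "卒業", "巣立", "入学"]}

# Type list: music type -> timbre information for that type.
TYPE_LIST = {"民謡": ["三味線"]}


def keyword_for_word(word):
    """Return the keyword under which a word is registered, or None."""
    for keyword, words in KEYWORD_LIST.items():
        if word in words:
            return keyword
    return None
```

With these tables, `keyword_for_word("桜")` yields the keyword “春”, and a word that is not registered yields `None`.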

[3. Meta information generation]
For example, to generate meta information from karaoke video data that has already been generated, the karaoke video data must be analyzed. However, the detailed information contained in the original material data has been lost from the karaoke video data. It is therefore conceivable for the operator to create meta information based on at least one of the plurality of material data used to generate the karaoke video data. However, if the meta information is to be created later for completed karaoke video data, the following problems arise. Until the meta information is created, the material data used to generate the karaoke video data must be kept in storage means such as the data storage 3 or the storage unit 14, or on a recording medium such as a memory card, a magnetic tape, or an optical disc. Computer resources are therefore wasted on storing the material data. In addition, the material data must be read from the storage means or recording medium both when the karaoke video data is generated and when the meta information is generated, so the same work is done twice.

Therefore, the CPU 11 of the karaoke video generation server 1 expands the plurality of material data acquired from the data transmission terminal 2 in the RAM 13. The CPU 11 may expand the plurality of material data in the RAM 13 only once, for example. The CPU 11 generates karaoke video data based on the plurality of material data in the state in which they have been expanded once in the RAM 13, and generates meta information based on at least one of the plurality of material data in that same state. That is, the CPU 11 generates the meta information using exactly the material data used to generate the karaoke video data, or generates the karaoke video data using exactly the material data used to generate the meta information. The RAM 13 is a working storage unit that stores data temporarily. Data stored in the RAM 13 is erased at some point. By generating both the karaoke video data and the meta information while the material data is stored in the RAM 13, the CPU 11 can avoid wasting resources on saving the material data and can reduce the effort of reading it out. Furthermore, accurate meta information can be generated based on the material data used to generate the karaoke video data. The CPU 11 may, for example, generate the karaoke video data and the meta information simultaneously or in succession, or at separate timings. It suffices that the karaoke video data and the meta information are both generated before the material data is erased from the RAM 13. The CPU 11 may generate either the karaoke video data or the meta information first. The RAM 13 is an example of the temporary storage means of the present invention. The temporary storage means is not limited to the RAM 13. For example, a nonvolatile memory may also serve as the temporary storage means of the present invention, provided that the material data is stored in it temporarily and not for the purpose of preservation.
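The idea of expanding the material data once and deriving both outputs from that single copy can be sketched as follows. This is a minimal Python illustration, not the embodiment's implementation: the function `generate_from_materials`, the dictionary keys, and the stand-in generator callbacks are all hypothetical.

```python
def generate_from_materials(received, make_video, make_meta):
    """Produce video data and meta information from one in-memory copy."""
    # Expand the received material data into working memory exactly once.
    working = dict(received)
    # Karaoke video data is generated from the material data.
    video = make_video(working)
    # Meta information is generated from the same in-memory copy.
    meta = make_meta(working)
    # The working copy is temporary and is discarded, not preserved.
    working.clear()
    return video, meta


# Stand-in generators, for illustration only.
video, meta = generate_from_materials(
    {"telop": "生麦", "audio": "midi-bytes", "video": "frames"},
    make_video=lambda m: sorted(m),
    make_meta=lambda m: {"lyric_length": len(m["telop"])},
)
```

The order of the two calls could be swapped without changing the result, which corresponds to the statement that either output may be generated first.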

Next, specific examples of the generation of meta information will be described with reference to FIG. 3. For example, the CPU 11 may generate meta information including a singing start time. The singing start time is the time from the start of the performance of the karaoke song until singing of the first lyric of the karaoke song starts. Specifically, the CPU 11 acquires, from the telop data expanded in the RAM 13, the display start time of the first character of the lyrics as the singing start time. When the karaoke video data is played back, the user sings in time with the display of the lyric telop. The display start time of the first lyric portion can therefore be regarded as the singing start time. FIG. 3A is a diagram illustrating a configuration example of meta information including a singing start time. The meta information shown in FIG. 3A includes a music number and a singing start time. When searching for karaoke video data, the user can specify, for example, a condition on the singing start time as a search condition. Conditions such as 10 seconds, 5 seconds or more, or 20 seconds or less may be specifiable. The karaoke video distribution server 4 searches the network storage 3 for meta information whose singing start time satisfies the specified condition.
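The acquisition of the singing start time and the check of a search condition can be sketched as below. The telop data layout (one tuple per character with display start and end times in milliseconds), the 800-millisecond boundary between the two characters, the music number 1, and the helper names are assumptions made for this illustration.

```python
# Telop data as (character, display_start_ms, display_end_ms) tuples.
# Hypothetical layout; the 800 ms boundary is invented for the example.
TELOP = [("生", 500, 800), ("麦", 800, 1200)]


def singing_start_time_ms(telop):
    """Return the display start time of the first lyric character, or None."""
    if not telop:
        return None
    return telop[0][1]  # assumes entries are ordered by display time


def satisfies(value_ms, at_least_ms=None, at_most_ms=None):
    """Check a search condition such as '5 seconds or more'."""
    if at_least_ms is not None and value_ms < at_least_ms:
        return False
    if at_most_ms is not None and value_ms > at_most_ms:
        return False
    return True


# Meta information holding a (hypothetical) music number and the start time.
meta = {"music_number": 1, "singing_start_time_ms": singing_start_time_ms(TELOP)}
```

A condition of "20 seconds or less" would then be checked as `satisfies(meta["singing_start_time_ms"], at_most_ms=20000)`.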

As another example, the CPU 11 may generate meta information including a singing difficulty level. The singing difficulty level is information indicating how difficult the karaoke song is to sing. The higher the singing difficulty level, the more difficult the song is to sing. Specifically, the CPU 11 extracts, from among the difficult lyrics registered in the difficult lyrics list, those that are included in the telop data. The CPU 11 acquires the singing time of each extracted difficult lyric from the telop data. For example, the CPU 11 acquires, as the singing time, the time from the display start time of the first character of the difficult lyric to the display end time of its last character. The CPU 11 compares the singing time acquired from the telop data with the reference singing time of the difficult lyric, and determines the singing difficulty level based on the result of this comparison. For example, the CPU 11 may determine a higher singing difficulty level as the number of difficult lyrics whose acquired singing time is shorter than the reference singing time increases. For example, in the telop data shown in FIG. 2A, the characters “生” and “麦” appear consecutively. This telop data therefore includes the difficult lyric “生麦” shown in FIG. 2B. The display start time of “生” is 500 milliseconds, and the display end time of “麦” is 1200 milliseconds. The singing time of “生麦” is therefore 700 milliseconds. Since the reference singing time of the difficult lyric “生麦” shown in FIG. 2B is 1000 milliseconds, the CPU 11 may, for example, increase the singing difficulty level by 1. FIG. 3B is a diagram illustrating a configuration example of meta information including a singing difficulty level. The meta information shown in FIG. 3B includes a music number and a singing difficulty level. When searching for karaoke video data, the user can specify, for example, a condition on the singing difficulty level as a search condition. Conditions such as difficulty 0, difficulty 5 or higher, or difficulty 8 or lower may be specifiable. The karaoke video distribution server 4 searches the network storage 3 for meta information whose singing difficulty level satisfies the specified condition.
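The worked example above (700 milliseconds against a reference of 1000 milliseconds) can be reproduced with a short sketch. The per-character tuple layout, the 800-millisecond boundary, and the choice to examine only the first occurrence of each difficult lyric are assumptions of this illustration, not details stated in the description.

```python
# Telop data: one (character, display_start_ms, display_end_ms) tuple per
# character. Hypothetical layout; the 800 ms boundary is invented.
TELOP = [("生", 500, 800), ("麦", 800, 1200)]

# Difficult lyric -> reference singing time in milliseconds (FIG. 2B entry).
DIFFICULT_LYRICS = {"生麦": 1000}


def singing_difficulty(telop, difficult_lyrics):
    """Count difficult lyrics sung faster than their reference time."""
    # Each telop entry holds one character, so string and list indices match.
    text = "".join(ch for ch, _, _ in telop)
    difficulty = 0
    for lyric, reference_ms in difficult_lyrics.items():
        pos = text.find(lyric)  # first occurrence only (assumption)
        if pos < 0:
            continue
        start_ms = telop[pos][1]                  # first character's start
        end_ms = telop[pos + len(lyric) - 1][2]   # last character's end
        if end_ms - start_ms < reference_ms:
            difficulty += 1
    return difficulty
```

For the sample data, the singing time is 1200 - 500 = 700 milliseconds, which is shorter than 1000 milliseconds, so the function returns a difficulty level of 1.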

As another example, the CPU 11 may generate meta information including keywords. Specifically, from among the words registered in the keyword list, the CPU 11 finds those included in the telop data and extracts from the keyword list the keywords associated with them. For each keyword registered in the keyword list, the CPU 11 counts the number of times the keyword is extracted as its degree of appearance. FIG. 3C is a diagram illustrating an example of the degrees of appearance of keywords. In the example of FIG. 3C, the degree of appearance of the keyword “春” (spring) is 4, and that of the keyword “演歌” (enka) is 1. The CPU 11 determines the keywords whose degree of appearance exceeds a predetermined frequency to be the keywords of the karaoke song. FIG. 3D is a diagram illustrating a configuration example of meta information including keywords. The meta information shown in FIG. 3D includes a music number and one or more keywords. For example, when the predetermined frequency is 3, “春” among the keywords shown in FIG. 3C is determined to be a keyword of the karaoke song. The predetermined frequency may be a value other than 3. When searching for karaoke video data, the user can specify, for example, a keyword as a search condition. The karaoke video distribution server 4 searches the network storage 3 for meta information including the specified keyword.
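The counting and thresholding can be sketched as follows. The words listed under “演歌” and the sample lyrics are hypothetical, chosen only so that the counts match the values 4 and 1 shown in FIG. 3C; the function name and signature are likewise assumptions.

```python
KEYWORD_LIST = {
    "春": ["桜", "梅", "卒業", "巣立", "入学"],  # entry from FIG. 2C
    "演歌": ["港", "酒"],                        # hypothetical entry
}


def song_keywords(lyrics, keyword_list, threshold=3):
    """Return (selected keywords, degree of appearance per keyword)."""
    counts = {keyword: 0 for keyword in keyword_list}
    for keyword, words in keyword_list.items():
        for word in words:
            # Each registered word found in the lyrics adds 1 to its keyword.
            if word in lyrics:
                counts[keyword] += 1
    # Keep only keywords whose count exceeds the predetermined frequency.
    selected = [kw for kw, n in counts.items() if n > threshold]
    return selected, counts


# Hypothetical lyrics containing four "春" words and one "演歌" word.
selected, counts = song_keywords("桜と梅の卒業、入学の港", KEYWORD_LIST)
```

With a threshold of 3, only “春” (degree of appearance 4) is selected, while “演歌” (degree of appearance 1) is not.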

Suppose that the meta information shown in FIGS. 3A, 3B, and 3D were generated from the karaoke video data. In that case, the video data included in the karaoke video data would have to be analyzed to extract the lyrics. This would increase the processing load on the CPU 11, and the lyrics might not be extracted accurately. In contrast, in the present embodiment, the CPU 11 generates the meta information from telop data that explicitly indicates the lyrics, so accurate meta information can be generated.

As another example, the CPU 11 may generate meta information including a music type. For example, when the audio data is MIDI data, the audio data includes one or more tracks. A track may include timbre information. This timbre information indicates the timbre of a sound source used to perform the karaoke song. From among the timbre information registered in the type list, the CPU 11 finds the timbre information included in the audio data and extracts from the type list the music type associated with it. The CPU 11 then generates meta information including the extracted music type. FIG. 3E is a diagram illustrating a configuration example of meta information including a music type. The meta information shown in FIG. 3E includes a music number and one or more music types. When searching for karaoke video data, the user can specify, for example, a music type as a search condition. The karaoke video distribution server 4 searches the network storage 3 for meta information including the specified music type. Note that a plurality of music types may be extracted for a single karaoke song. In this case, the CPU 11 may, for example, determine the music type extracted most frequently and generate meta information including only that music type. A music type that is extracted more frequently is considered more likely to indicate the type of the karaoke song. Alternatively, the CPU 11 may, for example, determine a predetermined number of music types in descending order of extraction frequency and generate meta information including the determined music types. Alternatively, the CPU 11 may generate, for each extracted music type, information indicating the frequency with which it was extracted, and may generate meta information including the music type and the information indicating that frequency.
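The extraction of music types and the selection of the most frequent one can be sketched as below. The “rock” entry and the list of timbres found in the audio data are hypothetical, and counting one extraction per matching timbre entry is an assumption about how the frequency is obtained.

```python
from collections import Counter

TYPE_LIST = {
    "民謡": ["三味線"],                      # entry from FIG. 2D
    "rock": ["distortion guitar", "drums"],  # hypothetical entry
}


def extract_music_types(audio_timbres, type_list):
    """Count how often each music type is extracted for one song."""
    present = set(audio_timbres)
    counts = Counter()
    for music_type, timbres in type_list.items():
        for timbre in timbres:
            # A registered timbre found in the audio data extracts its type.
            if timbre in present:
                counts[music_type] += 1
    return counts


# Timbre information assumed to have been read from the MIDI tracks.
counts = extract_music_types(["三味線", "drums", "distortion guitar"], TYPE_LIST)
most_frequent = [music_type for music_type, _ in counts.most_common(1)]
```

Passing a larger number to `most_common` gives the variant in which a predetermined number of music types is kept, and `counts` itself corresponds to the variant that stores the extraction frequency alongside each music type.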

Suppose that the meta information shown in FIG. 3E were generated from the karaoke video data. In that case, the audio data included in the karaoke video data would have to be analyzed to identify the sound sources and timbres. However, this audio data is data obtained by sampling sound, such as AAC data. This would increase the processing load on the CPU 11, and the timbres might not be identified accurately. In contrast, in the present embodiment, the CPU 11 generates the meta information from MIDI-format audio data that includes timbre information, so accurate meta information can be generated.

The CPU 11 may generate all four types of meta information described so far, or only some of them. The CPU 11 may also generate, for example, other types of meta information. The CPU 11 may generate meta information based on the video data included in the plurality of material data. The CPU 11 may generate meta information based on, for example, two or more pieces of material data. The CPU 11 may also generate meta information including a plurality of types of information. For example, the CPU 11 may generate a single piece of meta information including the singing start time, the difficulty level, keywords, the music type, and the like.

[4. Operation of communication system S]
Next, the operation of the communication system S of the present embodiment will be described with reference to FIGS. 4 to 6. FIG. 4A is a flowchart illustrating an example of server processing in the karaoke video generation server 1. For example, when the server program is started on the karaoke video generation server 1, the CPU 11 executes the server processing. As shown in FIG. 4A, the CPU 11 determines whether the server program is to end (step S1). If the CPU 11 determines that the server program is not to end (step S1: NO), the process proceeds to step S2. In step S2, the CPU 11 determines whether a plurality of material data has been received from the data transmission terminal 2. If the CPU 11 determines that a plurality of material data has not been received (step S2: NO), the process returns to step S1. On the other hand, if the CPU 11 determines that a plurality of material data has been received (step S2: YES), the process proceeds to step S3.

In step S3, the CPU 11 expands the received plurality of material data in the RAM 13. The CPU 11 also acquires the music number of the karaoke video data to be generated. For example, the CPU 11 may acquire the music number from the data transmission terminal 2, or the CPU 11 may generate the music number itself. Next, the CPU 11 executes the meta information generation processing described later (step S4).

Next, the CPU 11 generates karaoke video data based on the plurality of material data expanded in the RAM 13 (step S5). For example, when the audio data included in the material data is in MIDI format, the CPU 11 converts it into audio data in a format such as AAC. The CPU 11 also composites the lyrics indicated by the telop data included in the material data onto the video data included in the material data. For example, the CPU 11 performs the compositing so that, for each character of the lyrics, display of the character starts when the display start time has elapsed from the start of video playback and ends when the display end time has elapsed from the start of video playback. The CPU 11 multiplexes the audio data and the video data with the composited lyrics to generate the karaoke video data. Next, the CPU 11 stores the karaoke video data in the network storage 3 in association with the music number. The process then returns to step S1. If the CPU 11 determines in step S1 that the server program is to end (step S1: YES), the CPU 11 ends the server processing.
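The timing rule used when compositing the lyrics can be illustrated with a small sketch that reports which characters are displayed at a given playback time. The tuple layout, the 800-millisecond boundary, and the treatment of the display end time as exclusive are assumptions made for this illustration; the actual rendering onto video frames is not shown.

```python
# Telop data: (character, display_start_ms, display_end_ms) per character.
# Hypothetical layout; the 800 ms boundary is invented for the example.
TELOP = [("生", 500, 800), ("麦", 800, 1200)]


def visible_characters(telop, t_ms):
    """Return the characters displayed t_ms after video playback starts."""
    # A character is shown from its start time up to, but not including,
    # its end time (the half-open interval is an assumption).
    return "".join(ch for ch, start, end in telop if start <= t_ms < end)
```

A compositing step could call such a function once per video frame, using the frame's timestamp, to decide which characters to draw on that frame.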

FIG. 4B is a flowchart illustrating an example of the meta information generation processing in the karaoke video generation server 1. As shown in FIG. 4B, the CPU 11 executes the singing start time meta information generation processing (step S11), the singing difficulty meta information generation processing (step S12), the keyword meta information generation processing (step S13), and the music type meta information generation processing (step S14), each described later, and then ends the meta information generation processing.

FIG. 5A is a flowchart illustrating an example of the singing start time meta information generation processing in the karaoke video generation server 1. As shown in FIG. 5A, the CPU 11 acquires, from the telop data expanded in the RAM 13, the display start time of the first character of the lyrics as the singing start time (step S21). Next, the CPU 11 generates meta information including the music number and the acquired singing start time. The CPU 11 then stores the meta information in the network storage 3 (step S22) and ends the singing start time meta information generation processing.

FIG. 5B is a flowchart illustrating an example of the singing difficulty meta information generation processing in the karaoke video generation server 1. As shown in FIG. 5B, the CPU 11 reads the difficult lyrics list from the storage unit 14 (step S31). Next, the CPU 11 sets the singing difficulty level to 0 and sets the number i to 1 (step S32). Next, the CPU 11 determines whether the telop data expanded in the RAM 13 includes the difficult lyric (i) (step S33). The difficult lyric (i) is the i-th difficult lyric among those included in the difficult lyrics list. If the CPU 11 determines that the telop data does not include the difficult lyric (i) (step S33: NO), the process proceeds to step S37. On the other hand, if the CPU 11 determines that the telop data includes the difficult lyric (i) (step S33: YES), the process proceeds to step S34.

In step S34, the CPU 11 acquires, from the telop data, the display start time of the first character of the difficult lyric (i) and the display end time of its last character. The CPU 11 then obtains the singing time by subtracting the display start time from the display end time. Next, the CPU 11 determines whether the reference singing time registered in the difficult lyrics list in association with the difficult lyric (i) is longer than the obtained singing time (step S35). If the CPU 11 determines that the reference singing time is not longer than the obtained singing time (step S35: NO), the process proceeds to step S37. On the other hand, if the CPU 11 determines that the reference singing time is longer than the obtained singing time (step S35: YES), the process proceeds to step S36. In step S36, the CPU 11 increases the singing difficulty level by 1.

Next, the CPU 11 determines whether the number i is less than the number of difficult lyrics registered in the difficult lyrics list (step S37). If the CPU 11 determines that the number i is less than the number of difficult lyrics (step S37: YES), the process proceeds to step S38. In step S38, the CPU 11 increases the number i by 1, and the process returns to step S33. On the other hand, if the CPU 11 determines that the number i is not less than the number of difficult lyrics (step S37: NO), the process proceeds to step S39. In step S39, the CPU 11 generates meta information including the music number and the singing difficulty level. The CPU 11 then stores the meta information in the network storage 3 and ends the singing difficulty meta information generation processing.

FIG. 6A is a flowchart illustrating an example of the keyword meta information generation processing in the karaoke video generation server 1. As shown in FIG. 6A, the CPU 11 reads the keyword list from the storage unit 14 (step S41). Next, the CPU 11 sets the degree of appearance of each keyword included in the keyword list to 0. The CPU 11 also sets the number i to 1 (step S42). Next, the CPU 11 determines whether the telop data expanded in the RAM 13 includes the word (i) (step S43). The word (i) is the i-th word among the words included in the keyword list. If the CPU 11 determines that the telop data does not include the word (i) (step S43: NO), the process proceeds to step S45. On the other hand, if the CPU 11 determines that the telop data includes the word (i) (step S43: YES), the process proceeds to step S44. In step S44, the CPU 11 increases by 1 the degree of appearance of the keyword registered in the keyword list in association with the word (i), and the process proceeds to step S45.

In step S45, the CPU 11 determines whether the number i is less than the number of words registered in the keyword list. If the CPU 11 determines that the number i is less than the number of words (step S45: YES), the process proceeds to step S46. In step S46, the CPU 11 increases the number i by 1, and the process returns to step S43. On the other hand, if the CPU 11 determines that the number i is not less than the number of words (step S45: NO), the process proceeds to step S47.

In step S47, the CPU 11 sets the number i to 1. Next, the CPU 11 determines whether the degree of appearance of the keyword (i) is greater than 3 (step S48). The keyword (i) is the i-th keyword among the keywords registered in the keyword list. If the CPU 11 determines that the degree of appearance of the keyword (i) is not greater than 3 (step S48: NO), the process proceeds to step S50. On the other hand, if the CPU 11 determines that the degree of appearance of the keyword (i) is greater than 3 (step S48: YES), the process proceeds to step S49. In step S49, the CPU 11 saves the keyword (i) in the RAM 13 as one of the keywords of the karaoke song. The process then proceeds to step S50.

In step S50, the CPU 11 determines whether the number i is less than the number of keywords registered in the keyword list. If the CPU 11 determines that the number i is less than the number of keywords (step S50: YES), the process proceeds to step S51. In step S51, the CPU 11 increases the number i by 1, and the process returns to step S48. On the other hand, if the CPU 11 determines that the number i is not less than the number of keywords (step S50: NO), the process proceeds to step S52. In step S52, the CPU 11 generates meta information including the music number and the keywords saved in the RAM 13. The CPU 11 then stores the meta information in the network storage 3 and ends the keyword meta information generation processing.

FIG. 6B is a flowchart illustrating an example of the music type meta information generation processing in the karaoke video generation server 1. As shown in FIG. 6B, the CPU 11 reads the type list from the storage unit 14 (step S61). Next, the CPU 11 sets the number i to 1 (step S62). Next, the CPU 11 determines whether the audio data expanded in the RAM 13 includes the timbre information (i) (step S63). The timbre information (i) is the i-th piece of timbre information among those included in the type list. If the CPU 11 determines that the audio data does not include the timbre information (i) (step S63: NO), the process proceeds to step S65. On the other hand, if the CPU 11 determines that the audio data includes the timbre information (i) (step S63: YES), the process proceeds to step S64. In step S64, the CPU 11 saves in the RAM 13 the music type registered in the type list in association with the timbre information (i), as a music type that may indicate the type of the karaoke song. The process then proceeds to step S65.

In step S65, the CPU 11 determines whether the number i is less than the number of pieces of timbre information registered in the type list. If the CPU 11 determines that the number i is less than the number of pieces of timbre information (step S65: YES), it proceeds to step S66, increments the number i by 1, and returns to step S63. If the CPU 11 determines that the number i is not less than the number of pieces of timbre information (step S65: NO), it proceeds to step S67. In step S67, the CPU 11 generates meta information including the song number and the song types stored in the RAM 13. The CPU 11 then stores the meta information in the network storage 3 and ends the song type meta information generation process.
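Steps S61 to S67 can be sketched in Python along the following lines. This is only an illustration under stated assumptions: the audio data is represented as a set of timbre names, the type list as ordered (timbre, song type) pairs, and the function name and sample entries are hypothetical. Storing the result in the network storage 3 is omitted.

```python
def generate_song_type_meta(song_number, timbres_in_audio, type_list):
    """Return meta information holding the song types suggested by the timbres."""
    candidates = []                            # song types kept in working memory
    for timbre, song_type in type_list:        # i = 1 .. number of entries (S62, S65, S66)
        if timbre in timbres_in_audio:         # step S63
            candidates.append(song_type)       # step S64
    # Step S67: meta information with the song number and the candidate song types.
    return {"song_number": song_number, "song_types": candidates}


# Hypothetical type list and audio content, for illustration only.
type_list = [("shamisen", "enka"), ("distortion guitar", "rock"), ("strings", "ballad")]
meta = generate_song_type_meta(5678, {"strings", "shamisen"}, type_list)
```

Because every matching timbre contributes its song type, the result is a list of possible types rather than a single definite one, which matches the wording of step S64.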

As described above, according to the present embodiment, the karaoke video generation server 1 expands, in the RAM 13, the plurality of pieces of material information acquired from the data transmission terminal 2. The karaoke video generation server 1 then generates karaoke video data based on the plurality of pieces of material information expanded in the RAM 13. The karaoke video generation server 1 also generates meta information based on at least one of the pieces of material information expanded in the RAM 13. Therefore, more accurate meta information can be generated, and generated more easily, than when the meta information is generated from the karaoke video itself.

DESCRIPTION OF SYMBOLS
1 Karaoke video generation server
2 Data transmission terminal
3 Network storage
4 Karaoke video distribution server
5 User terminal
10 Network
11 CPU
12 ROM
13 RAM
14 Storage unit
15 Communication unit
S Communication system

Claims

An information processing method executed by a computer of an information processing apparatus including temporary storage means, acquisition means, expansion means, karaoke video generation means, and meta information generation means, the method comprising:
an acquisition step in which the acquisition means acquires a plurality of pieces of material information used as materials of a karaoke video, the plurality of pieces of material information including at least video information, sound information indicating a performance sound of a karaoke song, and lyrics information that indicates lyrics of the karaoke song and includes an elapsed time from the start of the performance until display of the lyrics is started;
an expansion step in which the expansion means temporarily expands, in the temporary storage means, the plurality of pieces of material information acquired in the acquisition step;
a karaoke video generation step in which the karaoke video generation means generates the karaoke video based on the plurality of pieces of material information while the plurality of pieces of material information remain expanded in the temporary storage means by the expansion step; and
a meta information generation step in which the meta information generation means generates, based on at least one piece of material information among the plurality of pieces of material information while the plurality of pieces of material information remain expanded in the temporary storage means by the expansion step, meta information that is used for searching for the karaoke video and that includes, as a singing start time, the elapsed time included in the lyrics information.
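For illustration only, the flow recited in this claim can be sketched in Python: the materials are held in memory once, and both the karaoke video and the meta information are derived from that same in-memory copy. All names here (`LyricsInfo`, `Materials`, `render_karaoke_video`, `process`) are hypothetical, the rendering function is a stand-in that does no real video processing, and including the song number in the meta information is an assumption taken from the embodiment rather than from the claim.

```python
from dataclasses import dataclass


@dataclass
class LyricsInfo:
    text: str
    display_start_sec: float  # elapsed time from performance start until lyrics display begins


@dataclass
class Materials:
    video: bytes
    sound: bytes
    lyrics: LyricsInfo


def render_karaoke_video(materials):
    # Hypothetical stand-in for real rendering: it only bundles the raw materials.
    return materials.video + materials.sound + materials.lyrics.text.encode("utf-8")


def process(song_number, materials):
    # Both outputs are derived from the same materials held in memory.
    video = render_karaoke_video(materials)
    meta = {
        "song_number": song_number,
        # The elapsed time in the lyrics information is reused as the singing start time.
        "singing_start_time": materials.lyrics.display_start_sec,
    }
    return video, meta


materials = Materials(b"V", b"S", LyricsInfo("la la", 12.5))
video, meta = process(42, materials)
```

The point of the sketch is that the singing start time is copied directly from the lyrics information instead of being estimated from the finished video.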

An information processing method executed by a computer of an information processing apparatus including temporary storage means, acquisition means, expansion means, karaoke video generation means, word extraction means, comparison means, difficulty determination means, and meta information generation means, the method comprising:
an acquisition step in which the acquisition means acquires a plurality of pieces of material information used as materials of a karaoke video, the plurality of pieces of material information including at least video information, sound information indicating a performance sound of a karaoke song, and lyrics information that indicates lyrics of the karaoke song and includes, for each word included in the lyrics, a display time from the start to the end of display of the word;
an expansion step in which the expansion means temporarily expands, in the temporary storage means, the plurality of pieces of material information acquired in the acquisition step;
a karaoke video generation step in which the karaoke video generation means generates the karaoke video based on the plurality of pieces of material information while the plurality of pieces of material information remain expanded in the temporary storage means by the expansion step;
a word extraction step in which the word extraction means extracts, from among a plurality of high-difficulty words stored in first storage means, the high-difficulty words included in the lyrics information expanded in the temporary storage means, the first storage means storing, for each high-difficulty word, the high-difficulty word, which is predetermined as a word that is difficult to sing, in association with a reference time for singing the high-difficulty word;
a comparison step in which the comparison means compares, for each high-difficulty word extracted in the word extraction step, the reference time for singing the high-difficulty word with the display time of the high-difficulty word;
a difficulty determination step in which the difficulty determination means determines a difficulty level of the lyrics based on a result of the comparison in the comparison step; and
a meta information generation step in which the meta information generation means generates, based on at least one piece of material information among the plurality of pieces of material information while the plurality of pieces of material information remain expanded in the temporary storage means by the expansion step, meta information that is used for searching for the karaoke video and that includes the difficulty level determined in the difficulty determination step.
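The word extraction, comparison, and difficulty determination recited in this claim can be sketched in Python as below. The claim does not fix how the comparison result maps to a difficulty level, so the rule used here is an assumption made for illustration: a word counts as rushed when its display time is shorter than its reference time, and the level depends on how many extracted words are rushed. The function name, the level labels, and the sample data are likewise hypothetical.

```python
def determine_difficulty(word_timings, reference_times):
    """word_timings: list of (word, display_start_sec, display_end_sec).
    reference_times: dict mapping each high-difficulty word to its reference time in seconds.
    """
    # Word extraction: keep only the high-difficulty words that appear in the lyrics.
    extracted = [(word, end - start)
                 for word, start, end in word_timings
                 if word in reference_times]
    # Comparison: count words displayed for less time than their reference time.
    rushed = sum(1 for word, shown in extracted if shown < reference_times[word])
    # Difficulty determination (assumed rule, not taken from the claim).
    if rushed == 0:
        return "low"
    if rushed * 2 <= len(extracted):
        return "medium"
    return "high"


reference_times = {"serendipity": 1.2, "rhythm": 0.8}
timings = [("love", 0.0, 0.5), ("serendipity", 1.0, 1.8), ("rhythm", 2.0, 3.0)]
meta = {"difficulty": determine_difficulty(timings, reference_times)}
```

In this example "serendipity" is shown for about 0.8 seconds against a 1.2 second reference, while "rhythm" is shown for 1.0 second against 0.8, so one of the two extracted words is rushed.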

An information processing apparatus comprising:
temporary storage means;
acquisition means for acquiring a plurality of pieces of material information used as materials of a karaoke video, the plurality of pieces of material information including at least video information, sound information indicating a performance sound of a karaoke song, and lyrics information that indicates lyrics of the karaoke song and includes an elapsed time from the start of the performance of the karaoke song until display of the lyrics is started;
expansion means for temporarily expanding, in the temporary storage means, the plurality of pieces of material information acquired by the acquisition means;
karaoke video generation means for generating the karaoke video based on the plurality of pieces of material information while the plurality of pieces of material information remain expanded in the temporary storage means by the expansion means; and
meta information generation means for generating, based on at least one piece of material information among the plurality of pieces of material information while the plurality of pieces of material information remain expanded in the temporary storage means by the expansion means, meta information that is used for searching for the karaoke video and that includes, as a singing start time, the elapsed time included in the lyrics information.

An information processing apparatus comprising:
temporary storage means;
acquisition means for acquiring a plurality of pieces of material information used as materials of a karaoke video, the plurality of pieces of material information including at least video information, sound information indicating a performance sound of a karaoke song, and lyrics information that indicates lyrics of the karaoke song and includes, for each word included in the lyrics, a display time from the start to the end of display of the word;
expansion means for temporarily expanding, in the temporary storage means, the plurality of pieces of material information acquired by the acquisition means;
karaoke video generation means for generating the karaoke video based on the plurality of pieces of material information while the plurality of pieces of material information remain expanded in the temporary storage means by the expansion means;
word extraction means for extracting, from among a plurality of high-difficulty words stored in first storage means, the high-difficulty words included in the lyrics information expanded in the temporary storage means, the first storage means storing, for each high-difficulty word, the high-difficulty word, which is predetermined as a word that is difficult to sing, in association with a reference time for singing the high-difficulty word;
comparison means for comparing, for each high-difficulty word extracted by the word extraction means, the reference time for singing the high-difficulty word with the display time of the high-difficulty word;
difficulty determination means for determining a difficulty level of the lyrics based on a result of the comparison by the comparison means; and
meta information generation means for generating, based on at least one piece of material information among the plurality of pieces of material information while the plurality of pieces of material information remain expanded in the temporary storage means by the expansion means, meta information that is used for searching for the karaoke video and that includes the difficulty level determined by the difficulty determination means.

A program causing a computer of an information processing apparatus provided with temporary storage means to execute:
an acquisition step of acquiring a plurality of pieces of material information used as materials of a karaoke video, the plurality of pieces of material information including at least video information, sound information indicating a performance sound of a karaoke song, and lyrics information that indicates lyrics of the karaoke song and includes an elapsed time from the start of the performance of the karaoke song until display of the lyrics is started;
an expansion step of temporarily expanding, in the temporary storage means, the plurality of pieces of material information acquired in the acquisition step;
a karaoke video generation step of generating the karaoke video based on the plurality of pieces of material information while the plurality of pieces of material information remain expanded in the temporary storage means by the expansion step; and
a meta information generation step of generating, based on at least one piece of material information among the plurality of pieces of material information while the plurality of pieces of material information remain expanded in the temporary storage means by the expansion step, meta information that is used for searching for the karaoke video and that includes, as a singing start time, the elapsed time included in the lyrics information.

A program causing a computer of an information processing apparatus provided with temporary storage means to execute:
an acquisition step of acquiring a plurality of pieces of material information used as materials of a karaoke video, the plurality of pieces of material information including at least video information, sound information indicating a performance sound of a karaoke song, and lyrics information that indicates lyrics of the karaoke song and includes, for each word included in the lyrics, a display time from the start to the end of display of the word;
an expansion step of temporarily expanding, in the temporary storage means, the plurality of pieces of material information acquired in the acquisition step;
a karaoke video generation step of generating the karaoke video based on the plurality of pieces of material information while the plurality of pieces of material information remain expanded in the temporary storage means by the expansion step;
a word extraction step of extracting, from among a plurality of high-difficulty words stored in first storage means, the high-difficulty words included in the lyrics information expanded in the temporary storage means, the first storage means storing, for each high-difficulty word, the high-difficulty word, which is predetermined as a word that is difficult to sing, in association with a reference time for singing the high-difficulty word;
a comparison step of comparing, for each high-difficulty word extracted in the word extraction step, the reference time for singing the high-difficulty word with the display time of the high-difficulty word;
a difficulty determination step of determining a difficulty level of the lyrics based on a result of the comparison in the comparison step; and
a meta information generation step of generating, based on at least one piece of material information among the plurality of pieces of material information while the plurality of pieces of material information remain expanded in the temporary storage means by the expansion step, meta information that is used for searching for the karaoke video and that includes the difficulty level determined in the difficulty determination step.