JP4894580B2

JP4894580B2 - Seasonal analysis system, seasonality analysis method, and seasonality analysis program

Info

Publication number: JP4894580B2
Application number: JP2007073388A
Authority: JP
Inventors: 貴稔北野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2007-03-20
Filing date: 2007-03-20
Publication date: 2012-03-14
Anticipated expiration: 2027-03-20
Also published as: JP2008234338A

Description

The present invention relates to a seasonality analysis system, a seasonality analysis method, and a seasonality analysis program for analyzing the seasonality of articles posted on the Internet.

A huge number of articles exist on the Internet. For articles that are updated many times a day (for example, blogs), browsing every article to keep up with the latest information takes an enormous amount of time. It is therefore hard to tell what themes currently exist and how much attention each theme is attracting. Furthermore, the articles posted on the Internet span diverse genres (sports, economy, and so on), and even within a single genre many articles are written from different viewpoints. It is therefore difficult to select and browse only the necessary articles from among the many posted on the Internet.

Accordingly, a technique is desired that can show what themes exist on the Internet and how much each theme is in season. Such a technique must also grasp the themes and their seasonality accurately. Here, being "in season" refers to a short-term surge driven by the popularity of articles, as opposed to a sudden surge caused by spam or the like, or a long-term trend.

In this connection, Patent Documents 1 to 3 describe techniques that try to identify what themes exist from the frequency with which words appear. Patent Document 1 describes collecting articles from a plurality of news sites, extracting words from them, treating frequently appearing words as important words and words related to an important word as its related words, and calculating and displaying the degree of relevance of each related word to the important word. Patent Document 2 describes weighting the words that appear in an article more heavily the more recently the article was created, and ranking newly appearing words by category. Patent Document 3 describes counting the frequency of keywords in blogs for which an update notification has been received and using those counts to predict keywords that will trend in the near future.

In relation to seasonality, Patent Documents 4 to 6 describe techniques for evaluating the popularity or importance of articles. Patent Document 4 describes calculating the popularity of a web page from the number of links pointing to it from other web pages (articles). Patent Document 5 describes treating the number of accesses to a home page as its popularity. Patent Document 6 describes, in calculating a page importance ranking, aggregating the links of the web pages in the web page data at a certain level and using the aggregated links.

Patent Document 7 describes a document classification device that classifies a plurality of articles according to their content. It describes generating a document feature vector by counting the number of word occurrences in each piece of document data, and classifying documents by applying a statistical method to the document feature vectors.

Patent Document 1: JP 2002-108937 A
Patent Document 2: JP 2005-135311 A
Patent Document 3: JP 2006-227965 A
Patent Document 4: Japanese Patent No. 3802813
Patent Document 5: JP 2002-132976 A
Patent Document 6: JP 2006-127529 A
Patent Document 7: JP 2001-101227 A

An object of the present invention is to provide a technique that makes it possible to know accurately what themes exist on the Internet.

Another object of the present invention is to provide a technique that makes it possible to grasp accurately the seasonality of the themes of articles on the Internet.

[Means for Solving the Problems] is described below using the reference numerals used in [Best Mode for Carrying Out the Invention]. The numerals are added in parentheses to clarify the correspondence between the description in [Claims] and [Best Mode for Carrying Out the Invention]. They must not, however, be used to interpret the technical scope of the invention described in [Claims].

The seasonality analysis system of the present invention comprises: an article collection unit (1) that, from among a plurality of articles posted on the Internet, collects the article content of a plurality of starting articles and of the group of articles that can be reached by following links from each starting article, and records the content in an article recording unit (2) in association with article information that identifies each article; a cluster generation unit (3) that, for each starting article collected by the article collection unit (1), defines as a cluster the set containing the starting article and the group of articles reachable from it by links, generates cluster information associating the cluster with the article information, and records the cluster information in a cluster recording unit (4); a cluster reconstruction unit (5) that refers to the article recording unit (2) and the cluster recording unit (4), calculates the similarity between different clusters based on the content of the articles they contain, and merges similar clusters based on the result to generate merged cluster information; a seasonality measuring unit (7) that refers to the merged cluster information and the article recording unit (2) and measures the seasonality of each cluster; and an output unit (8) that causes an output device to output the result measured by the seasonality measuring unit (7).

According to the present invention, the cluster reconstruction unit (5) reconstructs the clusters so that clusters with similar article content are grouped together, which allows the many articles on the Internet to be classified by theme. This makes it possible to grasp accurately what themes exist on the Internet. In addition, because the seasonality measuring unit (7) measures seasonality for each cluster merged by theme, the seasonality of each theme can be grasped.

In the above seasonality analysis system, the cluster reconstruction unit (5) preferably includes: an article analysis unit (9) that analyzes the content of the articles recorded in the article recording unit (2), generates appearance frequency data associating words with their appearance frequencies, and records the data in an appearance frequency recording unit (10); a feature vector generation unit (11) that refers to the cluster recording unit (4) and the appearance frequency recording unit (10), generates for each cluster a cluster feature vector associating words with their appearance frequencies, and records it in a feature vector recording unit (12); and a similarity determination unit (13) that calculates the similarity between different clusters based on the cluster feature vectors and merges similar clusters based on the result to generate the merged cluster information.

In this case, the article analysis unit (9) preferably refers to a thesaurus dictionary and generates the appearance frequency data so that similar words in the data are merged.

In the above seasonality analysis system, the similarity determination unit (13) preferably calculates the similarity between different clusters by Equation 1 below.

$$\text{similarity} = \frac{\vec{v}_i \cdot \vec{v}_j}{|\vec{v}_i|\,|\vec{v}_j|} \qquad \text{(Equation 1)}$$

(where $\vec{v}_i$ denotes the feature vector of cluster i and $\vec{v}_j$ denotes the feature vector of cluster j)

In the above seasonality analysis system, when collecting and recording article content, the article collection unit (1) preferably also collects trackback information identifying the source article of each trackback made to a collected article, together with the time at which the trackback was made, and records them in association with the article information. Here, the seasonality measuring unit (7) includes a freshness calculation unit (15) that calculates a freshness for each cluster and records it in a freshness recording unit (16), and a seasonality calculation unit (21) that calculates the seasonality based on the freshness. The freshness calculation unit (15) calculates the freshness based on the number of trackbacks contained in the cluster and the times at which the trackbacks were made. More preferably, the freshness calculation unit (15) calculates the freshness F(t) by Equation 2 below.

[Equation 2]

(where α, β, and γ are constants and t is the time elapsed since the trackback was made)

In the above seasonality analysis system, when collecting and recording article content, the article collection unit (1) preferably also collects trackback information identifying the source article of each trackback made to a collected article, and records it in association with the article information. Here, the seasonality measuring unit (7) includes a ripple degree calculation unit (19) that calculates a ripple degree for each cluster and records it in a ripple degree recording unit (20), and a seasonality calculation unit (21) that calculates the seasonality based on the ripple degree. The ripple degree calculation unit (19) calculates, from the trackback information, the depth of each trackback from the starting article, and calculates the ripple degree based on the number of trackbacks contained in the cluster and their depths from the starting article. More preferably, the ripple degree calculation unit (19) calculates the ripple degree I(d) by Equation 3 below.

[Equation 3]

(where α denotes the weight of the trackback source article and d denotes the depth of the trackback from the starting article)
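The depth d used by the ripple degree could be obtained with a breadth-first walk over the trackback links. The following Python sketch is only an illustration under stated assumptions: the `trackbacks` mapping (each article to the articles that sent it a trackback) and the function name `trackback_depths` are hypothetical, the starting article is assumed to have depth 0, and Equation 3 itself is not implemented because it is not reproduced in this text.

```python
from collections import deque


def trackback_depths(start, trackbacks):
    """Return {article: depth}, where depth is the number of trackback
    hops from the starting article (assumed: the start has depth 0)."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        article = queue.popleft()
        for source in trackbacks.get(article, []):
            if source not in depths:  # the first visit gives the shortest depth
                depths[source] = depths[article] + 1
                queue.append(source)
    return depths


# Hypothetical trackback graph: B and C sent trackbacks to A, and E sent one to B.
example = {"A": ["B", "C"], "B": ["E"]}
depths = trackback_depths("A", example)
```

With this convention, a trackback made directly to the starting article has depth 1, and each further hop adds 1.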

In the above seasonality analysis system, when collecting and recording article content, the article collection unit (1) preferably also collects evaluation information indicating readers' evaluation of each collected article, and records it in association with the article information. Here, the seasonality measuring unit (7) includes a popularity calculation unit (17) that calculates a popularity for each cluster and records it in a popularity recording unit (18), and a seasonality calculation unit (21) that calculates the seasonality based on the popularity. The popularity calculation unit (17) calculates the popularity based on the number of articles contained in the cluster and the evaluation information of those articles. More preferably, the evaluation information is the number of social bookmarks or the number of accesses for each collected article.

In the above seasonality analysis system, when collecting and recording articles, the article collection unit (1) preferably also collects trackback information identifying the source article of each trackback made to a collected article, the time at which the trackback was made, and evaluation information indicating readers' evaluation of the collected article, and records them in association with the article information. Here, the seasonality measuring unit (7) includes: a popularity calculation unit (17) that, for each cluster, calculates a popularity based on the number of articles contained in the cluster and the evaluation information of those articles, and records it in a popularity recording unit (18); a ripple degree calculation unit (19) that, for each cluster, calculates the depth of each trackback from the starting article based on the trackback information, calculates a ripple degree based on the number of trackbacks contained in the cluster and their depths from the starting article, and records it in a ripple degree recording unit (20); a freshness calculation unit (15) that, for each cluster, calculates a freshness based on the number of trackbacks contained in the cluster and the times at which the trackbacks were made, and records it in a freshness recording unit (16); and a seasonality calculation unit (21) that calculates the seasonality based on the popularity, the ripple degree, and the freshness. More preferably, the freshness calculation unit (15) calculates the freshness F(t) by Equation 4 below,

[Equation 4]

(where α, β, and γ are constants and t is the time elapsed since the trackback was made)
the ripple degree calculation unit (19) calculates the ripple degree I(d) by Equation 5 below,

[Equation 5]

(where α denotes the weight of the trackback source article and d denotes the depth of the trackback from the starting article)
the popularity calculation unit (17) takes the number of social bookmarks or the number of accesses for each collected article as the evaluation information and calculates the total number of social bookmarks or the total number of accesses for each cluster as the popularity P(n), and
the seasonality calculation unit (21) calculates the seasonality by Equation 6 below.

[Equation 6]

(where I_i(d) denotes the ripple degree of trackback i, F_i(t) denotes the freshness of trackback i, and P_j(n) denotes the popularity of article j)
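The popularity P(n) described above, the total number of social bookmarks (or accesses) per cluster, can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `cluster_popularity`, the dictionary layout, and the bookmark counts are all made up for the example, and Equations 4 to 6 are not implemented.

```python
def cluster_popularity(clusters, bookmarks):
    """Sum the social bookmark counts of the articles in each cluster.

    clusters:  {cluster_id: [article_id, ...]}
    bookmarks: {article_id: bookmark_count}; missing articles count as 0.
    """
    return {
        cluster_id: sum(bookmarks.get(article, 0) for article in articles)
        for cluster_id, articles in clusters.items()
    }


# Clusters follow the A-H example in the text; the counts are invented.
clusters = {"C1": ["A", "B", "C"], "C2": ["D", "E", "F"], "C3": ["G", "H"]}
bookmarks = {"A": 12, "B": 3, "C": 0, "D": 7, "E": 1, "G": 4}
popularity = cluster_popularity(clusters, bookmarks)
```

The same function would serve for access counts by passing a mapping of article to number of accesses instead.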

According to the present invention, a technique is provided that makes it possible to know accurately what themes exist on the Internet.

According to the present invention, a technique is further provided that makes it possible to grasp accurately the seasonality of the themes of articles on the Internet.

(First embodiment)
The first embodiment is described below with reference to the drawings. The seasonality analysis system of this embodiment automatically extracts highly in-season articles from the vast number of articles on the Internet and displays information identifying them. Articles subject to analysis include those in user-participation web systems with link support, such as blogs and wikis.

FIG. 1 is a schematic block diagram showing the configuration of the seasonality analysis system according to this embodiment. The system consists of a seasonality analysis program that is stored in a ROM (Read Only Memory) or the like and executed by a CPU, and storage devices (RAM, hard disk, etc.) that record what the program processes and its results. Specifically, as shown in FIG. 1, the system includes an article collection unit 1, an article recording unit 2, a cluster generation unit 3, a cluster recording unit 4, a cluster reconstruction unit 5, a merged cluster recording unit 6, a seasonality measuring unit 7, and a display unit 8.

The article collection unit 1 periodically acquires articles from the Internet, generates article data, and stores the data in the article recording unit 2. It stores only updated and new information in the article recording unit 2.

The cluster generation unit 3 sets a cluster for at least one of the articles collected by the article collection unit 1 and stores it in the cluster recording unit 4 as cluster information.

The cluster reconstruction unit 5 refers to the article recording unit 2 and the cluster recording unit 4, determines the similarity between clusters, generates merged cluster information in which similar clusters are merged, and stores it in the merged cluster recording unit 6. The cluster reconstruction unit 5 includes an article analysis unit 9, an appearance frequency recording unit 10, a feature vector generation unit 11, a feature vector recording unit 12, a similarity determination unit 13, and a similarity recording unit 14.

The seasonality measuring unit 7 refers to the article recording unit 2 and the merged cluster recording unit 6 and measures the seasonality of each merged cluster. The seasonality measuring unit 7 includes a freshness calculation unit 15, a freshness recording unit 16, a popularity calculation unit 17, a popularity recording unit 18, a ripple degree calculation unit 19, a ripple degree recording unit 20, a seasonality calculation unit 21, and a seasonality recording unit 22.

The display unit 8 displays the measurement results of the seasonality measuring unit 7 on a display screen.

The seasonality analysis system realizes its functions by operating as described below. FIG. 2 is a flowchart showing the seasonality analysis method according to this embodiment. The method includes a step of collecting articles (S001), a step of clustering (S002), a step of determining similarity and reconstructing the clusters (S003), a step of measuring seasonality (S004), and a step of displaying the result (S005). The operation in each step is described in detail below.

(Step S001: Collecting articles)
FIG. 3 is a flowchart showing the operation of the article collection unit 1 when it collects articles posted on the Internet. This step is described using the example shown in FIG. 4, in which a plurality of articles (A to H) are posted on the Internet. In FIG. 4, the arrows indicate links created by trackbacks; for example, article A has received a trackback from article B, so a link runs from article A to article B.

Step S101
First, the article collection unit 1 selects an article to serve as the starting point (hereinafter, the starting article). Here, article A is assumed to be selected as the starting article. The starting article may be chosen as, for example, an article with many social bookmark registrations, an article with many accesses, or an article designated by the user. The article collection unit 1 accesses the selected starting article A, collects data about article A (article data), and stores it in the article recording unit 2.

FIG. 5 is a conceptual diagram showing the data recorded in the article recording unit 2. Of the items shown in the figure, the article collection unit 1 collects the article content (title and body), the URL of the article, the creation time of article A, the URLs of the trackback sources for article A, the times at which the trackback links were made, and the evaluation information (the number of social bookmarks in this embodiment). It records these in the article recording unit 2 in association with an article ID (article information; A) that identifies the article. At this point the "visited" item shown in FIG. 5 is set to NO, and the process proceeds to the next step (S102). Note that FIG. 5 also has the items for articles B, C, and D filled in and their "visited" items set to YES; the figure shows an example of the final state, not the state immediately after this step.

Step S102
Next, the article collection unit 1 refers to the article recording unit 2 and determines whether any unvisited article exists. If one exists, the process proceeds to step S103; if none exists, the article collection process ends. At this stage the starting article A is still marked as unvisited, so the process proceeds to step S103.

Steps S103, S104
Next, the article collection unit 1 selects the article with the lowest article ID from among the unvisited articles. Here, the starting article A is selected (S103). The unit then checks whether any trackback has been made to the selected article; if so, the process proceeds to step S105, and if not, to S106. Here, trackbacks from articles B and C have been made to the selected starting article A, so the process proceeds to S105 (S104).

Step S105
Next, the article collection unit 1 accesses the trackback source articles of the article selected in S103, collects data about them in the same way as in S101, and stores the data in the article recording unit 2. Here, it accesses and collects articles B and C. At this point articles B and C are left marked as unvisited. The process then proceeds to step S106.

Step S106
Next, the article collection unit 1 records in the article recording unit 2 that the article selected in S103 (article A) has been visited, and returns to S102. In S102, articles B and C are unvisited, so the process proceeds to S103, where article B, which has the lower article ID, is selected. In S104, article B has no trackback source, so the process proceeds to S106, and the article recording unit 2 records that article B has been visited. Repeating this processing collects and records data for every article that can be reached by following links from the starting article A. The same processing is repeated for the other starting articles (D and G) to collect and record their article data.
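The loop formed by steps S101 to S106 can be sketched as follows. This Python fragment is a simplified illustration under assumptions, not the patented implementation: the `trackback_sources` mapping stands in for actually fetching pages, article names are used as article IDs (so "lowest ID" becomes alphabetical order), and only the "visited" flag of each record is modeled.

```python
def collect_articles(start, trackback_sources):
    """Walk steps S101-S106 for one starting article.

    trackback_sources: hypothetical mapping {article: [trackback source articles]}.
    Returns {article: {"visited": bool}} as a stand-in for the article records.
    """
    records = {start: {"visited": False}}                  # S101: record the start
    while True:
        unvisited = sorted(a for a, r in records.items() if not r["visited"])
        if not unvisited:                                  # S102: nothing left
            break
        current = unvisited[0]                             # S103: lowest article ID
        for source in trackback_sources.get(current, []):  # S104: any trackbacks?
            # S105: record each new source, left unvisited for now
            records.setdefault(source, {"visited": False})
        records[current]["visited"] = True                 # S106: mark as visited
    return records


# Articles B and C sent trackbacks to the starting article A, as in the text.
records = collect_articles("A", {"A": ["B", "C"]})
```

Using `setdefault` keeps an already recorded article from being reset to unvisited, so the loop terminates even if trackbacks form a cycle.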

The above description covers the case in which article content is collected for every article reachable by links from the starting article. When article content is already stored in the article recording unit 2, for example in the second and subsequent runs, only updated and new articles need to be collected.

(Step S002: Clustering)
Next, for the plurality of articles collected by the article collection unit 1, the cluster generation unit 3 sets as a cluster each set consisting of a starting article and the group of articles that can be reached from it by following trackback links. The cluster generation unit 3 assigns a unique ID (hereinafter, cluster ID) to each cluster, generates cluster information by associating the cluster ID with the article IDs in the cluster, and records it in the cluster recording unit 4.

FIGS. 6 and 7 are conceptual diagrams showing an example of the cluster information recorded in the cluster recording unit 4, corresponding to the example in FIG. 4. In this example, the set consisting of the starting article A and articles B and C, which can be reached by following links from A, is set as cluster C1. Similarly, the set consisting of the starting article D and articles E and F is set as cluster C2, and the set consisting of the starting article G and article H is set as cluster C3. For ease of explanation FIG. 6 shows article names (A, B, ...), but in practice article IDs (B1, B2, ...) are associated with the cluster IDs that identify the clusters. The description below also sometimes uses article names (A, B, C, ...) in place of article IDs (B1, B2, ...).

The set of articles reachable by following trackback links can be determined, for example, by referring to the trackback source URLs recorded in the article recording unit 2.
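The clustering in step S002 amounts to collecting, for each starting article, everything reachable over trackback links. The Python sketch below illustrates this under assumptions: the function name `build_clusters` and the `trackback_sources` mapping are hypothetical, and the links for D and G are guessed from the cluster membership given in the text (FIG. 4 is not reproduced here).

```python
def build_clusters(starting_articles, trackback_sources):
    """Return {cluster_id: sorted list of article names}, one cluster per
    starting article, containing the start and every article reachable
    from it by following trackback links."""
    clusters = {}
    for number, start in enumerate(starting_articles, start=1):
        seen = set()
        stack = [start]
        while stack:
            article = stack.pop()
            if article in seen:
                continue
            seen.add(article)
            stack.extend(trackback_sources.get(article, []))
        clusters[f"C{number}"] = sorted(seen)
    return clusters


# A <- B, C is stated in the text; the links for D and G are assumed.
links = {"A": ["B", "C"], "D": ["E", "F"], "G": ["H"]}
clusters = build_clusters(["A", "D", "G"], links)
```

This yields the three clusters C1, C2, and C3 described for the example.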

(Step S003: Determining similarity and reconstructing clusters)
FIG. 8 is a flowchart showing the operation in this step, which is carried out by the cluster reconstruction unit 5.

Steps S201, S202: Article selection
First, the article analysis unit 9 acquires the content of all articles recorded in the article recording unit 2 (S201). It then performs morphological analysis on the acquired articles, breaking the article content into words and extracting the nouns (S202). The purpose of this processing is to determine how similar the themes of articles are, so parts of speech that are unsuitable for expressing a theme are not needed. Punctuation marks, verbs, adjectives, and the like are therefore removed from the result of the morphological analysis, and only nouns are extracted. The article analysis unit 9 counts the number of times each noun appears in an article, assigns a unique ID (W1, W2, ...) to each noun, generates the appearance frequency data illustrated in FIG. 9, and records it in the appearance frequency recording unit 10.
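The noun counting in steps S201 and S202 can be illustrated with a short Python sketch. It assumes the morphological analysis has already been done by some analyzer and takes its output as a list of (word, part of speech) pairs; that input format, the tag name "noun", and the function names are assumptions made for this example.

```python
from collections import Counter


def count_nouns(tagged_tokens):
    """Count nouns in one article, given (word, part_of_speech) pairs
    produced by a morphological analyzer (the format is assumed here)."""
    return Counter(word for word, pos in tagged_tokens if pos == "noun")


def assign_word_ids(counters):
    """Give each distinct noun a unique ID W1, W2, ... in order of first appearance."""
    word_ids = {}
    for counter in counters:
        for word in counter:
            word_ids.setdefault(word, f"W{len(word_ids) + 1}")
    return word_ids


# Made-up analyzer output for one article.
tokens = [("soccer", "noun"), ("is", "verb"), ("fun", "adjective"),
          ("soccer", "noun"), ("goal", "noun")]
counts = count_nouns(tokens)
word_ids = assign_word_ids([counts])
```

Verbs, adjectives, and punctuation are simply skipped, which mirrors the filtering described in the step.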

Step S203; Generation of feature vectors
Next, the feature vector generation unit 11 refers to the appearance frequency data and the cluster information and totals the number of appearances of each word for each cluster. Then, as shown in FIG. 10, it generates for each cluster a feature vector that associates word IDs (W1, W2, ...) with appearance counts, and records it in the feature vector recording unit 12. In doing so, the words with the highest appearance frequency across all clusters (for example, the top 10 words) are selected to form the feature vector. For convenience of explanation, the example of FIG. 10 does not correspond to the example shown in FIG. 8.
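The per-cluster aggregation of S203 might look like the following Python sketch. It is only one possible reading: the function name, the data layout, and the sample counts are assumptions, and the top words are chosen from the totals over all clusters as the text describes.

```python
from collections import Counter

def cluster_feature_vectors(frequency, clusters, top_n=10):
    """Build one feature vector per cluster over a shared top-N vocabulary.

    frequency: article ID -> {word ID: count}
    clusters:  cluster ID -> list of article IDs
    """
    # Total the word counts of the articles in each cluster.
    totals = {cid: Counter() for cid in clusters}
    for cid, article_ids in clusters.items():
        for article_id in article_ids:
            totals[cid].update(frequency.get(article_id, {}))

    # Pick the most frequent words across all clusters.
    overall = Counter()
    for counts in totals.values():
        overall.update(counts)
    vocabulary = [word for word, _ in overall.most_common(top_n)]

    # A Counter returns 0 for words that a cluster never uses.
    vectors = {cid: [totals[cid][w] for w in vocabulary] for cid in clusters}
    return vocabulary, vectors

# Hypothetical appearance frequency data.
frequency = {
    "A": {"W1": 2, "W2": 1},
    "B": {"W1": 1, "W3": 4},
    "D": {"W2": 4},
}
clusters = {"C1": ["A", "B"], "C2": ["D"]}
vocabulary, vectors = cluster_feature_vectors(frequency, clusters, top_n=2)
```

Using one shared vocabulary keeps the vectors of different clusters the same length, which the later similarity calculation requires.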

Steps S204, S205; Calculation and comparison of similarity
Next, the similarity determination unit 13 acquires the feature vectors of two different clusters from the feature vector recording unit 12 and calculates the similarity from the angle formed by the two vectors. In this embodiment, as shown in Equation 7 below, the similarity is defined as the cosine of the angle formed by the feature vectors of the two clusters (i and j).

$$\mathrm{similarity}(i, j) = \cos\theta_{ij} = \frac{\vec{v}_i \cdot \vec{v}_j}{|\vec{v}_i|\,|\vec{v}_j|} \qquad (7)$$

(where $\vec{v}_i$ denotes the feature vector of cluster i and $\vec{v}_j$ denotes the feature vector of cluster j)
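The cosine measure described above can be sketched in Python as follows. The function name and the choice to return 0.0 for an all-zero vector are assumptions made for this illustration; the specification does not say how an empty feature vector is handled.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumed convention: a cluster with no counted words matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)
```

Vectors that share no words give 0, and vectors whose counts are proportional to each other give 1, so larger values mean more similar clusters.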

The similarity calculation is described more concretely using the case of FIG. 10 as an example. Here, the similarity between cluster 1 and cluster 2 is calculated. Applying Equation 7 to clusters 1 and 2 gives the following calculation, and the result similarity = 0.01 is obtained.
Similarity = (3×2 + 0×2 + 9×0 + 7×0) / {√(3² + 0² + 0² + 9² + 7²) × √(2² + 3² + 2² + 0² + 0²)} = 0.01

The similarity determination unit 13 determines from the calculated similarity whether the two clusters are similar. The more words the feature vectors of two different clusters have in common, and the more words whose appearance counts are close, the closer the cosine of the angle between the feature vectors is to 1 and the higher the similarity. Therefore, a similarity threshold is set in advance, and clusters whose similarity is closer to 1 than the threshold are determined to be similar and to deal with the same theme. In the example above, if the threshold were set to 0.07, the similarity of 0.01 is not closer to 1 than the threshold, so the clusters are determined not to be similar (step S205). If the clusters are similar, the operation of the next step S206 is performed; if not, the operation of S207 is performed.

The similarity calculated in the above processing (S204, S205) is stored in the similarity recording unit 14, as shown in FIG. 11.

Step S206; Merging of similar clusters
If the two clusters are found to be similar in S205, the similarity determination unit 13 refers to the cluster information, adds together the appearance counts of each word, and merges the similar clusters. This produces a new feature vector. The similarity determination unit 13 also refers to the cluster information to generate post-merge cluster information, in which the cluster formed by merging the similar clusters is added, and stores it in the post-merge cluster recording unit 6. FIG. 12 is a conceptual diagram showing an example of post-merge cluster information. In the example of FIG. 12, clusters C1 and C2 are similar, and a cluster C4 combining clusters C1 and C2 has been added. For convenience of explanation, this example does not correspond to the earlier figures. The clusters that make up a merged feature vector are set so that they are not compared with each other again, and the processing of the next step S207 is performed.

Step S207; End determination
The above series of processing is repeated for every combination of clusters, and the processing ends once similarity has been determined for all combinations. Because the post-merge cluster information reconstructed in this way groups articles on the basis of their content, it can be regarded as grouping articles by theme. That is, articles in different clusters can be regarded as having different themes, and articles in the same cluster as having the same theme. In addition, the feature vector of a cluster merged in S206 can be said to indicate the keywords used frequently within that cluster. Therefore, if keywords are displayed for each cluster, the user can grasp what themes exist.
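The loop over cluster pairs in S204 to S207 can be sketched roughly as below. This is a simplified reading and not the procedure of the embodiment itself: it makes a single pass over the pairs, skips clusters that have already been merged, and does not compare newly created merged clusters again. The function names, the ID numbering, and the sample vectors are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def merge_similar_clusters(vectors, threshold):
    """Return merged clusters as {new ID: {"members": [...], "vector": [...]}}."""
    merged = {}
    already_merged = set()
    next_number = len(vectors) + 1
    for a, b in combinations(list(vectors), 2):
        if a in already_merged or b in already_merged:
            continue  # constituents of a merged cluster are not compared again
        if cosine(vectors[a], vectors[b]) >= threshold:
            # Add the word counts element by element to form the new vector.
            summed = [x + y for x, y in zip(vectors[a], vectors[b])]
            merged[f"C{next_number}"] = {"members": [a, b], "vector": summed}
            next_number += 1
            already_merged.update((a, b))
    return merged

# Hypothetical feature vectors over a shared three-word vocabulary.
vectors = {"C1": [3, 0, 1], "C2": [6, 0, 1], "C3": [0, 5, 0]}
merged = merge_similar_clusters(vectors, threshold=0.9)
```

The original clusters are left untouched and the merged cluster is added alongside them, which follows the way the post-merge cluster information is described.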

((Step S004)); Measurement of seasonality
Next, the seasonality measurement unit 7 measures the seasonality. FIG. 13 is a flowchart showing the operation for measuring seasonality. The seasonality measurement unit 7 calculates the freshness (S301), the popularity (S302), and the ripple degree (S303), and calculates the seasonality of each cluster from these values (S304). The freshness, popularity, and ripple degree may be calculated in any order. When a cluster formed by merging multiple clusters (a merged cluster) exists, the popularity, freshness, and ripple degree are calculated for each constituent cluster of the merged cluster at the time of seasonality calculation, and the values of the constituent clusters are summed to give the evaluation value of the merged cluster. The operation in each step is described in detail below.

Step S301; Calculation of freshness
The freshness calculation unit 15 refers to the article recording unit 2 and acquires the article ID, the information on the trackback source article (the trackback source URL), and the time at which the trackback was made. It also acquires the current time from a timer function unit (not shown). For each trackback, it calculates the time elapsed since the trackback was made (the difference between the current time and the time the trackback was made) and holds it in association with the information (article IDs) identifying the trackback source article and the trackback destination article (FIG. 14). In FIG. 14, article names (A, B, ...) are shown instead of article IDs to make the explanation easier to follow. This elapsed time is calculated for every trackback recorded in the article recording unit 2. The freshness of each trackback is then calculated from its elapsed time.

In this embodiment, the freshness of a trackback is set to decrease exponentially with the time elapsed since the trackback was made, as shown in FIG. 15, and the case where the freshness F(t) is obtained by Equation 8 is described as an example (where α, β, and γ are constants and t is the time elapsed since the trackback was made).

In the equation, t is the elapsed time and t >= 0. α is a constant relating to the range over which the freshness decreases. β indicates the slope of the freshness. γ is a value indicating the initial value of the freshness.

The more new trackbacks an article has received, the more attention that trackback destination article can be considered to be attracting. Conversely, if it has only old trackbacks, its freshness is low. Even so, a certain value remains after a certain time has passed. It is therefore preferable to use an exponential function that decreases over time, as in Equation 8.

As a concrete example, suppose that α = 10, β = 10, and γ = 0, and that the elapsed time of each trackback has been calculated as shown in FIG. 14. The freshness calculation unit 15 then calculates according to Equation 8 and obtains a freshness of 1.4 for the trackback between A and B, 3.7 for the trackback between C and A, and 6.1 for the trackback between D and B (the values are rounded in this example).
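Because the freshness equation itself is not reproduced in this text, the following Python sketch rests on an assumed form, F(t) = α·exp(−t/β) + γ, chosen only because it decreases exponentially and leaves a residual value γ. The elapsed times of 20, 10, and 5 are likewise assumptions (FIG. 14 is not available here); with α = 10, β = 10, and γ = 0 they happen to give the values 1.4, 3.7, and 6.1 quoted above. Neither the form nor the times should be read as the ones in the specification.

```python
import math

def freshness(elapsed, alpha=10.0, beta=10.0, gamma=0.0):
    """Assumed freshness model: alpha * exp(-elapsed / beta) + gamma."""
    if elapsed < 0:
        raise ValueError("elapsed time must be non-negative")
    return alpha * math.exp(-elapsed / beta) + gamma

# Hypothetical elapsed times for the three trackbacks named in the text.
elapsed_times = {"A-B": 20, "C-A": 10, "D-B": 5}
scores = {link: round(freshness(t), 1) for link, t in elapsed_times.items()}
```

Under this assumed form, γ acts as the floor that remains after a long time and α + γ is the value at t = 0.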

The freshness calculation unit 15 records the calculated freshness of each trackback in the freshness recording unit 16 in association with information identifying the trackback (see FIG. 16; this is only an example and does not correspond to the contents of the earlier figures).

This completes the processing for calculating freshness.

If the writer of an article B evaluates another article A and makes a trackback to it at a time close to the current time, the freshness of the trackback destination article A can be considered to be high. The trackback freshness calculated above can therefore also be regarded as the freshness of the trackback destination article, and freshness can be used as a parameter of how lively an article is. Furthermore, if the freshness of a cluster is regarded as proportional to the sum of the freshness of its trackbacks, the freshness of the cluster can be obtained by summing the freshness of the trackbacks within the cluster. It is therefore also useful to configure the freshness calculation unit 15 to calculate these values as needed and the display unit 8 to show them on the display screen.

Step S302; Calculation of popularity
The popularity calculation unit 17 refers to the article recording unit 2 and calculates the popularity of each article from its evaluation information (the number of social bookmarks in this embodiment). As shown in FIG. 17, if article A has 10 social bookmarks, article B has 5, article C has 3, and article D has 2, then the popularity of article A is 10, that of article B is 5, that of article C is 3, and that of article D is 2. The popularity calculation unit 17 records the calculated popularity in the popularity recording unit 18 in association with the article ID (see FIG. 18). The evaluation information is not limited to the number of social bookmarks; any information indicating the readers' evaluation of the article, such as the number of accesses, may be used.

The popularity of the articles in a cluster may also be summed to give the popularity of that cluster.

Step S303; Calculation of ripple degree
The ripple degree calculation unit 19 refers to the article recording unit 2 and calculates the ripple degree from the path length (depth) of each trackback from the starting article. Here, the path length is the number of links (trackbacks) between the starting article and the trackback destination article. The more articles there are far from the starting article (with long path lengths), and the more links (trackbacks) there are, the more lively the theme is considered to be in that cluster. Accordingly, the longer the path length, the higher the ripple degree. As a concrete illustration, suppose that, as shown in FIG. 19, trackbacks have been made to the starting article A from articles B and C, and a trackback has been made to article B from article D. In this case, the ripple degree calculation unit 19 calculates the path length of the trackback from article B to A and of the trackback from article C to A as 1, and the path length of the trackback from article D to B as 2.
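The path lengths in this example can be reproduced with a short Python sketch. It counts the trackback itself as one link, so that a trackback made directly to the starting article has path length 1, which matches the values given for FIG. 19. The function name and the `(source, target)` layout are assumptions for illustration.

```python
from collections import deque

def trackback_path_lengths(origin, trackbacks):
    """Return {(source, target): path length of that trackback from the origin}."""
    # Index: target article -> articles that sent a trackback to it.
    incoming = {}
    for source, target in trackbacks:
        incoming.setdefault(target, []).append(source)

    depth = {origin: 0}   # number of links between the origin and each article
    lengths = {}
    queue = deque([origin])
    while queue:
        target = queue.popleft()
        for source in incoming.get(target, []):
            lengths[(source, target)] = depth[target] + 1
            if source not in depth:  # visit each article once, even with cycles
                depth[source] = depth[target] + 1
                queue.append(source)
    return lengths

# Trackbacks from the example: B and C link to A, and D links to B.
lengths = trackback_path_lengths("A", [("B", "A"), ("C", "A"), ("D", "B")])
```

Trackbacks that cannot be reached from the starting article are simply absent from the result, since they belong to no path from it.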

The ripple degree may also be calculated with weighting that takes the reliability of articles and the like into account. As one example that takes article reliability into account, the ripple degree I(d) can be defined as in Equation 9, where d is the path length and α is a weight applied to the path length of the link. This weight may be made large when, for example, the link comes directly from a highly reliable article (for example, an article with many accesses) and small when the link comes directly from an article of low reliability.

The ripple degree calculation unit 19 records the ripple degree calculated in this way in the ripple degree recording unit 20 in association with the trackback source article ID and the trackback destination article ID (see FIG. 20). As explained for freshness, the ripple degree of a trackback may be regarded as the ripple degree of the trackback destination article.

Step S304; Measurement of seasonality
The seasonality calculation unit 21 calculates the seasonality from the ripple degree, freshness, and popularity calculated in S301 to S303. The seasonality is calculated for each cluster on the basis of the post-merge cluster information.

Equation 10 is an example of an expression for calculating seasonality (where Ii(d) denotes the ripple degree of trackback i, Fi(t) denotes the freshness of trackback i, and Pj(n) denotes the popularity of article j).

As a more concrete example, suppose that the articles in cluster C1 are linked by trackbacks as shown in FIG. 19, and that the results shown in FIG. 16 have been obtained for freshness, those in FIG. 20 for ripple degree, and those in FIG. 18 for popularity. In this case, the seasonality of the theme of cluster C1 is:
Seasonality of the theme of blog cluster C1
= (ripple degree of the A-B trackback) × (freshness of the A-B trackback) + (ripple degree of the A-C trackback) × (freshness of the A-C trackback) + (ripple degree of the B-D trackback) × (freshness of the B-D trackback) + (sum of the popularity of blog articles A to D)
= 1 × 1.4 + 1 × 3.7 + 2 × 6.1 + (10 + 5 + 3 + 2)
= 37.3
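The arithmetic of this worked example can be checked with a few lines of Python. The sketch follows the verbal form used in the example (the sum of ripple degree × freshness over the trackbacks, plus the sum of article popularity); the function name and the dictionary layout are hypothetical.

```python
def cluster_seasonality(trackbacks, popularity):
    """Sum of ripple * freshness over trackbacks plus total article popularity."""
    link_term = sum(tb["ripple"] * tb["freshness"] for tb in trackbacks)
    return link_term + sum(popularity.values())

# Values quoted in the worked example for cluster C1.
trackbacks = [
    {"link": "A-B", "ripple": 1, "freshness": 1.4},
    {"link": "A-C", "ripple": 1, "freshness": 3.7},
    {"link": "B-D", "ripple": 2, "freshness": 6.1},
]
popularity = {"A": 10, "B": 5, "C": 3, "D": 2}
score = cluster_seasonality(trackbacks, popularity)
```

Because the values are floating-point numbers, the result is best compared with 37.3 using a small tolerance rather than exact equality.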

The seasonality calculation unit 21 records the theme seasonality calculated in this way in the seasonality recording unit 22 in association with the cluster ID (see FIG. 21; clusters C2 and C3 there do not correspond to the examples described above).

As already noted, if the ripple degree of a trackback is regarded as the ripple degree of the trackback destination article and the freshness of a trackback as the freshness of the trackback destination article, a seasonality can also be calculated for each article. Specifically, it may be obtained as "seasonality of article" = ripple degree × freshness + popularity.

((Step S005; Display))
The display unit 8 refers to the article recording unit 2 and the post-merge cluster information and acquires the title of the starting article for each cluster. It also refers to the seasonality recording unit 22, sorts the clusters in descending order of seasonality, and, as shown in FIG. 22, displays the title of each starting article with the corresponding seasonality on the display screen. At this time it also refers to the cluster feature vectors (see FIG. 10) and displays the keywords of each cluster. Merged clusters may be displayed hierarchically. Articles other than the starting article may be displayed as related articles, and if a seasonality has been obtained for each article, it may be displayed alongside.
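A minimal Python sketch of the ordering performed for display is given below. The output format, the function name, and all sample titles, keywords, and scores other than 37.3 are invented for illustration; the actual layout of FIG. 22 is not reproduced here.

```python
def rank_clusters(seasonality, titles, keywords):
    """Return display lines for clusters in descending order of seasonality."""
    ranked = sorted(seasonality.items(), key=lambda item: item[1], reverse=True)
    return [
        f"{titles[cid]} | seasonality {score:.1f} | keywords: {', '.join(keywords[cid])}"
        for cid, score in ranked
    ]

# Hypothetical display data for three clusters.
seasonality = {"C1": 37.3, "C2": 12.0, "C3": 45.5}
titles = {"C1": "Title of A", "C2": "Title of D", "C3": "Title of G"}
keywords = {"C1": ["soccer", "goal"], "C2": ["economy"], "C3": ["election", "vote"]}
lines = rank_clusters(seasonality, titles, keywords)
```

Each line pairs the starting article's title with its cluster's seasonality and keywords, so the most seasonal theme appears first.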

As described above, according to this embodiment, a plurality of articles on the Internet are clustered on the basis of their content, and the titles of the starting articles, keywords, and the like are displayed in descending order of seasonality. The user can therefore learn what themes are currently in season without browsing every article, and can also grasp how lively each theme is.

In addition, because the ripple degree is used in calculating the seasonality, how many-sided and how widespread the attention to a theme is can be reflected in the evaluation of whether the theme is in season.

Furthermore, because freshness is used in calculating the seasonality, old evaluations and recent evaluations are not treated as equivalent; more recent evaluations can be given greater weight in the seasonality.

(Second Embodiment)
A second embodiment of the present invention will now be described. FIG. 23 is a block diagram schematically showing the configuration of the seasonality analysis system of this embodiment, and FIG. 24 is a flowchart showing its method of operation. Compared with the first embodiment, the seasonality analysis system of this embodiment adds a thesaurus analysis unit 23 to the article analysis unit 9 (FIG. 23), and its method of operation adds a step of performing thesaurus analysis of words (S202A). Other configurations and operations are given the same reference numerals and their description is omitted.

When the article analysis unit 9 has generated the appearance frequency data (S202), the thesaurus analysis unit 23 refers to a thesaurus dictionary (not shown) and performs thesaurus analysis on the generated appearance frequency data (S202A). That is, it determines whether any of the extracted words are synonyms of (similar to) one another, merges the appearance frequencies of words determined to be synonyms, and treats them as a single word. The thesaurus analysis unit 23 records the appearance frequency data with similar words merged in this way in the appearance frequency recording unit 10 as post-merge appearance frequency data (see FIG. 24). In step S203 and subsequent processing, the post-merge appearance frequency data is used in place of the appearance frequency data of the first embodiment.
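The merging of synonym counts in S202A might be sketched in Python as follows. The thesaurus is modeled here as a plain dictionary mapping each word to a representative word; this layout, the function name, and the sample words are assumptions, since the specification leaves the form of the thesaurus dictionary open.

```python
from collections import Counter

def merge_synonyms(frequency, thesaurus):
    """Fold the counts of synonymous words into one representative word.

    frequency: word -> appearance count
    thesaurus: word -> representative word (words not listed map to themselves)
    """
    merged = Counter()
    for word, count in frequency.items():
        merged[thesaurus.get(word, word)] += count
    return dict(merged)

# Hypothetical appearance frequency data and thesaurus entries.
frequency = {"soccer": 3, "football": 2, "goal": 1}
thesaurus = {"football": "soccer"}
merged_frequency = merge_synonyms(frequency, thesaurus)
```

After merging, two clusters that describe the same theme with different words share a vector component, which is what allows the similarity determination to become more accurate.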

According to this embodiment, thesaurus analysis of words is performed in addition to counting their appearance frequencies, and similar words are combined into one. This reduces the variation in the words used across articles and improves the accuracy of the similarity determination between clusters.

When the thesaurus analysis unit 23 performs thesaurus analysis, it may do so using a thesaurus dictionary that it stores as electronic data, or it may use a thesaurus dictionary available on the Internet.

FIG. 1 is a schematic block diagram of the seasonality analysis system according to the first embodiment.
FIG. 2 is a flowchart showing the seasonality analysis method according to the first embodiment.
FIG. 3 is a flowchart showing the operation of the article collection unit.
FIG. 4 is an explanatory diagram showing an example of how articles are linked by trackbacks.
FIG. 5 is a conceptual diagram showing the data recorded in the article recording unit.
FIG. 6 is a conceptual diagram showing the data recorded in the cluster recording unit.
FIG. 7 is an explanatory diagram for explaining the operation of the cluster generation unit.
FIG. 8 is a flowchart showing the operation of the cluster reconfiguration unit.
FIG. 9 is a conceptual diagram showing appearance frequency data.
FIG. 10 is a conceptual diagram showing cluster feature vectors.
FIG. 11 is a conceptual diagram showing the data recorded in the similarity recording unit.
FIG. 12 is an explanatory diagram for explaining post-merge cluster information.
FIG. 13 is a flowchart showing the operation of the seasonality measurement unit.
FIG. 14 is a conceptual diagram showing the relationship between trackbacks and trackback times.
FIG. 15 is an explanatory diagram for explaining the relationship between freshness and elapsed time.
FIG. 16 is a conceptual diagram showing the data recorded in the freshness recording unit.
FIG. 17 is an explanatory diagram for explaining the relationship between articles and evaluation information.
FIG. 18 is a conceptual diagram showing the data recorded in the popularity recording unit.
FIG. 19 is an explanatory diagram for explaining the relationship between trackbacks and path length.
FIG. 20 is a conceptual diagram showing the data recorded in the ripple degree recording unit.
FIG. 21 is a conceptual diagram showing the data recorded in the seasonality recording unit.
FIG. 22 is an example of the content displayed on the display screen.
FIG. 23 is a schematic block diagram showing the configuration of the seasonality analysis system of the second embodiment.
FIG. 24 is a flowchart showing the operation of the cluster reconfiguration unit in the seasonality analysis method of the second embodiment.

Explanation of symbols

1 Article collection unit
2 Article recording unit
3 Cluster generation unit
4 Cluster recording unit
5 Cluster reconfiguration unit
6 Post-merge cluster recording unit
7 Seasonality measurement unit
8 Display unit
9 Article analysis unit
10 Appearance frequency recording unit
11 Feature vector generation unit
12 Feature vector recording unit
13 Similarity determination unit
14 Similarity recording unit
15 Freshness calculation unit
16 Freshness recording unit
17 Popularity calculation unit
18 Popularity recording unit
19 Ripple degree calculation unit
20 Ripple degree recording unit
21 Seasonality calculation unit
22 Seasonality recording unit
23 Thesaurus analysis unit

Claims

1. A seasonality analysis system comprising:
an article collection unit that selects a plurality of starting articles from a plurality of articles posted on the Internet, collects the article contents of each of the starting articles and of the group of articles that can be reached from each starting article by following links, and generates article data in which the article contents are associated with article information identifying each article;
a cluster generation unit that, for each starting article, sets the starting article and the set of articles that can be reached from it as a cluster, and generates cluster information associating the cluster with the article information;
a cluster reconfiguration unit that, on the basis of the article data and the cluster information, determines the similarity between different clusters from the article contents of the articles included in the clusters, merges similar clusters according to the determination result, and generates post-merge cluster information;
a seasonality measurement unit that refers to the post-merge cluster information and the article data and measures the seasonality of each cluster; and
an output unit that causes an output device to output the result measured by the seasonality measurement unit.

2. The seasonality analysis system according to claim 1, wherein the cluster reconfiguration unit comprises:
an article analysis unit that analyzes the article contents on the basis of the article data and generates appearance frequency data in which words are associated with their appearance frequencies;
a feature vector generation unit that, on the basis of the cluster information and the appearance frequency data, generates for each cluster a cluster feature vector in which words are associated with their appearance frequencies; and
a similarity determination unit that calculates the similarity between different clusters from the cluster feature vectors, merges similar clusters according to the determination result, and generates the post-merge cluster information.

3. The seasonality analysis system according to claim 2, further comprising:
a thesaurus analysis unit that performs thesaurus analysis on the appearance frequency data generated by the article analysis unit and generates post-merge appearance frequency data in which similar words are merged,
wherein the feature vector generation unit generates the cluster feature vectors with reference to the post-merge appearance frequency data.

4. The seasonality analysis system according to any one of claims 1 to 3, wherein the similarity determination unit calculates the similarity between cluster i and cluster j according to Formula 1 (where →vi denotes the feature vector of cluster i and →vj denotes the feature vector of cluster j).

5. The seasonality analysis system according to claim 1, wherein
in generating the article data, the article collection unit further collects, for each collected article, trackback information identifying the source article of a trackback (a link attached to the referenced article by the referencing side) and the time at which the trackback was made, and associates them with the article information in the article data,
the seasonality measurement unit comprises a freshness calculation unit that calculates freshness and a seasonality calculation unit that calculates the seasonality of each cluster from the calculated freshness, and
the freshness calculation unit calculates the freshness of each trackback from the trackback information and the time at which the trackback was made.

6. The seasonality analysis system according to claim 5, wherein, in calculating the freshness for each cluster, the freshness calculation unit calculates the freshness F(t) according to Formula 2 (where α, β, and γ are constants and t is the time elapsed since the trackback was made).

The seasonal analysis system according to claim 1,
wherein, in generating the article data, the article collection unit further collects trackback information for identifying the original article of each trackback applied to a collected article, and associates it with the article information,
the seasonality measuring unit comprises:
a ripple degree calculation unit that calculates a ripple degree for each cluster; and
a seasonality calculation unit that calculates the seasonality based on the ripple degree result,
and the ripple degree calculation unit calculates the depth of each trackback from the starting article based on the trackback information, and calculates the ripple degree based on the number of trackbacks included in the cluster and the depth of each trackback from the starting article.
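
As a non-limiting illustration, the depth of each trackback from the starting article could be obtained with a breadth-first search over the trackback information. The function name `trackback_depths` and the representation of the trackback information as (referring article, referenced article) pairs are assumptions made for this sketch.

```python
from collections import defaultdict, deque

def trackback_depths(start, trackbacks):
    """Return {article: depth}, counting trackback hops from `start`.

    trackbacks: list of (referring_article, referenced_article) pairs
                (hypothetical format for the trackback information).
    """
    referrers = defaultdict(list)
    for referring, referenced in trackbacks:
        referrers[referenced].append(referring)

    depths = {start: 0}
    queue = deque([start])
    while queue:
        article = queue.popleft()
        for referring in referrers[article]:
            if referring not in depths:  # first visit gives the shortest depth
                depths[referring] = depths[article] + 1
                queue.append(referring)
    return depths

trackbacks = [("B", "A"), ("C", "A"), ("D", "B"), ("E", "D")]
depths = trackback_depths("A", trackbacks)
```

Here articles B and C refer directly to the starting article A and have depth 1, D refers to B and has depth 2, and E refers to D and has depth 3.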

The seasonal analysis system according to claim 7,
wherein the ripple degree calculation unit calculates the ripple degree I(d) according to the following formula 3.

(where α indicates the weight of the trackback-source article, and d indicates the depth of the trackback from the starting article)

The seasonal analysis system according to claim 1,
wherein, in generating the article data, the article collection unit further collects evaluation information indicating readers' evaluation of the collected articles, and associates it with the article information,
the seasonality measuring unit comprises:
a popularity calculation unit that calculates a popularity for each cluster; and
a seasonality calculation unit that calculates the seasonality based on the popularity result,
and the popularity calculation unit calculates the popularity based on the number of articles included in the cluster and the evaluation information of the articles.

The seasonal analysis system according to claim 9,
wherein the evaluation information is the number of social bookmarks of, or the number of accesses to, the collected articles.
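
As a non-limiting illustration, one simple reading of the popularity calculation is the per-cluster total of the evaluation counts of its articles. The function name `cluster_popularity` and the dictionary formats are assumptions made for this sketch.

```python
def cluster_popularity(cluster_articles, evaluation):
    """Total evaluation count (e.g. social bookmarks) over a cluster's articles.

    cluster_articles: iterable of article identifiers in the cluster
    evaluation:       {article identifier: bookmark or access count}
                      (hypothetical format for the evaluation information)
    """
    # Articles with no recorded evaluation contribute zero.
    return sum(evaluation.get(article, 0) for article in cluster_articles)

bookmarks = {"a1": 12, "a2": 3, "a3": 7}
popularity = cluster_popularity(["a1", "a2", "a4"], bookmarks)
```

In this example the cluster contains a1 and a2 with 12 and 3 bookmarks, plus a4 with no recorded bookmarks, so its popularity is 15.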

The seasonal analysis system according to claim 1,
wherein, in generating the article data, the article collection unit further collects trackback information for identifying the original article of each trackback applied to a collected article, the time at which the trackback was applied, and evaluation information indicating readers' evaluation of the collected articles, and associates them with the article information,
and the seasonality measuring unit comprises:
a popularity calculation unit that calculates, for each cluster, a popularity based on the number of articles included in the cluster and the evaluation information of the articles;
a ripple degree calculation unit that calculates, for each cluster, the depth of each trackback from the starting article based on the trackback information, and calculates a ripple degree based on the number of trackbacks included in the cluster and the depth of each trackback from the starting article;
a freshness calculation unit that calculates, for each cluster, a freshness based on the number of trackbacks included in the cluster and the time at which each trackback was applied; and
a seasonality calculation unit that calculates the seasonality based on the popularity, ripple degree, and freshness results.
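
The following sketch shows one way the three quantities might be combined into a seasonality score for a cluster. The formulas used by the invention are not reproduced in this text, so every functional form here is an assumption chosen only for illustration: an exponential decay for the freshness, a per-level weight for the ripple degree, and an additive combination for the seasonality. All function names and default constants are hypothetical.

```python
import math

def freshness(t, alpha=1.0, beta=0.1, gamma=0.0):
    # ASSUMED form: exponential decay in the elapsed time t since the
    # trackback was applied. Not the patent's formula.
    return alpha * math.exp(-beta * t) + gamma

def ripple_degree(d, alpha=0.5):
    # ASSUMED form: the weight alpha is applied once per level of depth d.
    return alpha ** d

def seasonality(trackbacks, article_popularity):
    """Combine ripple degree, freshness, and popularity for one cluster.

    trackbacks:         list of (depth, elapsed_time) pairs
    article_popularity: list of per-article evaluation counts
    """
    # ASSUMED combination: sum of ripple * freshness over the trackbacks,
    # plus the total popularity of the cluster's articles.
    trackback_term = sum(ripple_degree(d) * freshness(t) for d, t in trackbacks)
    return trackback_term + sum(article_popularity)

score = seasonality([(1, 0.0), (2, 0.0)], [3, 2])
```

With these assumed forms, recent and shallow trackbacks contribute most to the score, and old or deeply nested trackbacks contribute little. A different decay or weighting would change the ranking of clusters, so the constants should be regarded as tuning parameters.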

The seasonal analysis system according to claim 11,
wherein, in calculating the freshness for each cluster, the freshness calculation unit calculates the freshness F(t) according to the following formula 4,

(where α, β, and γ are constants, and t is the elapsed time since the trackback was applied)
the ripple degree calculation unit calculates the ripple degree I(d) according to the following formula 5,

(where α indicates the weight of the trackback-source article, and d indicates the depth of the trackback from the starting article)
the popularity calculation unit obtains, as the evaluation information, the number of social bookmarks of or the number of accesses to each collected article, and calculates the total number of social bookmarks or the total number of accesses for each cluster as the popularity P(n),
and the seasonality calculation unit calculates the seasonality according to the following formula 6.

(where $I_i(d)$ indicates the ripple degree of trackback i, $F_i(t)$ indicates the freshness of trackback i, and $P_j(n)$ indicates the popularity of article j)

A seasonal analysis method comprising:
an article collection step of collecting, from among a plurality of articles posted on the web, the article contents of a plurality of starting articles and of the group of articles that can be traced from each of the starting articles, and generating article data in which the article contents are associated with article information identifying each article;
a cluster generation step of defining, for each starting article collected in the article collection step, a set consisting of the starting article and the group of articles that can be traced from it by links as a cluster, and generating cluster information indicating the correspondence between each cluster and the article information;
a cluster reconfiguration step of calculating the similarity of article contents between different clusters based on the article data and the cluster information, merging similar clusters based on the calculation result, and generating cluster information after merging;
a seasonality measurement step of measuring the seasonality for each cluster based on the cluster information after merging and the contents of the articles collected in the article collection step; and
an output step of causing an output device to output the measurement result of the seasonality measurement step.
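
As a non-limiting illustration, the cluster generation step can be sketched as collecting every article reachable by links from a starting article. The function name `build_cluster` and the representation of the links as a dictionary from an article to the articles it leads to are assumptions made for this example.

```python
from collections import deque

def build_cluster(start, links):
    """Return the set of articles reachable from `start`, including `start`.

    links: {article: [articles that can be traced from it by a link]}
           (hypothetical format)
    """
    cluster = {start}
    queue = deque([start])
    while queue:
        article = queue.popleft()
        for linked in links.get(article, []):
            if linked not in cluster:  # visit each article only once
                cluster.add(linked)
                queue.append(linked)
    return cluster

links = {"A": ["B", "C"], "B": ["D"], "X": ["Y"]}
clusters = {start: build_cluster(start, links) for start in ["A", "X"]}
```

Each starting article yields one cluster, so the mapping `clusters` plays the role of the cluster information: it records which articles belong to the cluster of each starting article.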

The seasonal analysis method according to claim 13,
wherein the cluster reconfiguration step includes:
an appearance frequency data generation step of analyzing the contents of the articles collected in the article collection step and generating appearance frequency data in which words are associated with their appearance frequencies;
a cluster feature vector generation step of generating, for each cluster, a cluster feature vector in which words are associated with their appearance frequencies, based on the cluster information and the appearance frequency data; and
a similarity determination step of calculating the similarity between different clusters based on the cluster feature vectors, merging similar clusters based on the determination result, and generating the cluster information after merging.
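
As a non-limiting illustration, a cluster feature vector might be built by summing the word frequencies of the articles in the cluster. The sketch assumes that the appearance frequency data is held per article; the function name `cluster_feature_vector` and the data layout are hypothetical.

```python
from collections import Counter

def cluster_feature_vector(cluster_articles, appearance_frequency):
    """Sum per-article word frequencies into one vector for the cluster.

    cluster_articles:     iterable of article identifiers in the cluster
    appearance_frequency: {article identifier: {word: frequency}}
                          (ASSUMED per-article layout)
    """
    vector = Counter()
    for article in cluster_articles:
        # Counter.update adds the counts of a mapping to the existing counts.
        vector.update(appearance_frequency.get(article, {}))
    return vector

appearance_frequency = {
    "a1": {"baseball": 2, "pitcher": 1},
    "a2": {"baseball": 1},
}
vector = cluster_feature_vector(["a1", "a2"], appearance_frequency)
```

The resulting vector associates each word with its total appearance frequency in the cluster and can then be compared with the vectors of other clusters.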

The seasonal analysis method according to claim 14,
wherein, in the appearance frequency data generation step, the appearance frequency data is generated with reference to a thesaurus dictionary so that similar words are merged.

The seasonal analysis method according to any one of claims 13 to 15,
wherein, in calculating the similarity between different clusters in the similarity determination step, the similarity is calculated using the following formula 7.

(where $\vec{v}_i$ denotes the feature vector of cluster i, and $\vec{v}_j$ denotes the feature vector of cluster j)

The seasonal analysis method according to claim 13,
wherein, in generating the article data in the article collection step, trackback information for identifying the original article of each trackback applied to a collected article and the time at which the trackback was applied are further collected and associated with the article information as the article data,
the seasonality measurement step includes:
a freshness calculation step of calculating a freshness for each cluster; and
a seasonality calculation step of calculating the seasonality based on the freshness result,
and, in the freshness calculation step, the freshness is calculated based on the number of trackbacks included in the cluster and the time at which each trackback was applied.

The seasonal analysis method according to claim 17,
wherein, in the freshness calculation step, the freshness F(t) is calculated according to the following formula 8 when calculating the freshness for each cluster.

(where α, β, and γ are constants, and t is the elapsed time since the trackback was applied)

The seasonal analysis method according to claim 13,
wherein, in generating the article data in the article collection step, trackback information for identifying the original article of each trackback applied to a collected article is further collected and associated with the article information as the article data,
the seasonality measurement step includes:
a ripple degree calculation step of calculating a ripple degree for each cluster; and
a seasonality calculation step of calculating the seasonality based on the ripple degree result,
and, in the ripple degree calculation step, the depth of each trackback from the starting article is calculated based on the trackback information, and the ripple degree is calculated based on the number of trackbacks included in the cluster and the depth of each trackback from the starting article.

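The seasonal analysis method according to claim 19,
wherein, in the ripple degree calculation step, the ripple degree I(d) is calculated according to the following formula 9.

(where α indicates the weight of the trackback-source article, and d indicates the depth of the trackback from the starting article)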

The seasonal analysis method according to claim 13,
wherein, in generating the article data in the article collection step, evaluation information indicating readers' evaluation of the collected articles is further collected and associated with the article information as the article data,
the seasonality measurement step includes:
a popularity calculation step of calculating a popularity for each cluster; and
a seasonality calculation step of calculating the seasonality based on the popularity result,
and, in the popularity calculation step, the popularity is calculated based on the number of articles included in the cluster and the evaluation information of the articles.

The seasonal analysis method according to claim 21,
wherein the evaluation information is the number of social bookmarks of, or the number of accesses to, the collected articles.

The seasonal analysis method according to claim 13,
wherein, in generating the article data in the article collection step, trackback information for identifying the original article of each trackback applied to a collected article, the time at which the trackback was applied, and evaluation information indicating readers' evaluation of the collected articles are further collected and associated with the article information as the article data,
and the seasonality measurement step includes:
a popularity calculation step of calculating, for each cluster, a popularity based on the number of articles included in the cluster and the evaluation information of the articles;
a ripple degree calculation step of calculating, for each cluster, the depth of each trackback from the starting article based on the trackback information, and calculating a ripple degree based on the number of trackbacks included in the cluster and the depth of each trackback from the starting article;
a freshness calculation step of calculating, for each cluster, a freshness based on the number of trackbacks included in the cluster and the time at which each trackback was applied; and
a seasonality calculation step of calculating the seasonality based on the popularity, ripple degree, and freshness results.

The seasonal analysis method according to claim 23,
wherein, in the freshness calculation step, the freshness F(t) is calculated according to the following formula 10,

(where α, β, and γ are constants, and t is the elapsed time since the trackback was applied)
in the ripple degree calculation step, the ripple degree I(d) is calculated according to the following formula 11,

(where α indicates the weight of the trackback-source article, and d indicates the depth of the trackback from the starting article)
in the popularity calculation step, the number of social bookmarks of or the number of accesses to each collected article is obtained as the evaluation information, and the total number of social bookmarks or the total number of accesses for each cluster is calculated as the popularity P(n),
and, in the seasonality calculation step, the seasonality is calculated according to the following formula 12.

(where $I_i(d)$ indicates the ripple degree of trackback i, $F_i(t)$ indicates the freshness of trackback i, and $P_j(n)$ indicates the popularity of article j)

A seasonality analysis program for causing a computer to execute the seasonality analysis method according to any one of claims 13 to 24.