JP2011108053A

JP2011108053A - System for evaluating news article

Info

Publication number: JP2011108053A
Application number: JP2009263398A
Authority: JP
Inventors: Shohei Abe; 昌平阿部; Yoshio Ozawa; 良男小澤
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2009-11-18
Filing date: 2009-11-18
Publication date: 2011-06-02

Abstract

PROBLEM TO BE SOLVED: To provide a technique capable of quantitatively calculating the importance of individual news articles.

SOLUTION: A system 10 for evaluating news articles includes: a user setting storage part 24 for setting a keyword for each analytical matter; an article collection part 12 for obtaining news articles and blog articles including the keyword from a news server 30 and a blog server 28 arranged on the Internet 26 and storing the obtained news articles and blog articles in a news article storage part 14 and a blog article storage part 16, respectively; a correspondence relation determination part 18 for specifying the news articles having a correspondence relation with the respective blog articles; and an influence analysis part 20 for totalizing, for each news article, the number of blog articles having a correspondence relation and storing the total number of blog articles in an analytical result storage part 22 in correlation with each news article. The correspondence relation determination part 18 compares link information set in each blog article with the URL of each news article and, when the link information coincides with the URL, sets the correspondence relation between the blog article and the news article.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a news article evaluation system, and more particularly to a technique for evaluating, on the basis of concrete numerical values, the influence of news articles provided by news sites on the Internet.

Today, with the spread of the Internet, a growing number of companies publish their new-product and advertising information on the Internet, instead of in conventional paper media such as newspapers and magazines, in order to raise awareness of it.
Likewise, a growing number of individuals have opened simple diary sites called web logs (hereinafter "blogs") and publish their day-to-day impressions on the Internet. Popular blogs with many subscribers have come to exert considerable influence on product awareness and sales.

Information posted on blogs reflects the unfiltered voice of customers and can be regarded as word-of-mouth information made visible. Accordingly, as shown in Non-Patent Document 1, services have already appeared that collect and analyze blog articles and feed the results back into corporate marketing activities.
Non-Patent Document 1: Online Buzz Analysis (BuzzSeeQer), Internet URL: http://www.nifty.com/buzz/seeqer/index.htm, search date: October 16, 2009

The analysis service described in Non-Patent Document 1 automatically analyzes the change in the number of blog articles before and after a campaign or TV commercial broadcast, together with their content (favorable or critical), and reports the results to corporate users. The service also analyzes and reports the attributes of the blog owners (hereinafter "bloggers") who write about a specific company's products or services.

By using this service, corporate users can therefore check whether their advertising and public relations activities are working well and obtain useful guidance for planning their next steps.

However, conventional analysis services merely judge the suitability of a company's advertising and public relations activities themselves. They cannot answer which of the many news sites publishes the news articles that most influence bloggers, or on which news site a company should place its information in order to foster the intended word of mouth.
Corporate public relations and advertising staff are obliged to achieve the maximum effect within a limited budget, and selecting the optimal news site is extremely important for that purpose. Nevertheless, because no mechanism existed for quantitatively measuring the influence of individual news articles, sites have been selected simply on the basis of the size of the news site, its number of regular subscribers, its page views, its brand image, and the like.

The present invention has been devised to solve this conventional problem, and its object is to provide a technique that makes it possible to quantitatively calculate the influence of individual news articles.

To achieve the above object, the news article evaluation system according to claim 1 comprises: storage means in which a keyword is set for each analysis case; means for acquiring news articles containing the keyword from news servers on the Internet and storing them in news article storage means; means for acquiring blog articles containing the keyword from blog servers on the Internet and storing them in blog article storage means; correspondence determination means for identifying the news article that corresponds to each blog article; and influence analysis means for counting, for each news article, the number of corresponding blog articles and storing the total in analysis result storage means in association with that news article. The correspondence determination means compares the link information set in each blog article with the URL of each news article and recognizes a correspondence between the blog article and the news article when the two match.

The news article evaluation system according to claim 2 comprises: storage means in which a keyword is set; means for acquiring news articles containing the keyword from news servers on the Internet and storing them in news article storage means; means for acquiring blog articles containing the keyword from blog servers on the Internet and storing them in blog article storage means; correspondence determination means for identifying the news article that corresponds to each blog article; and influence analysis means for counting, for each news article, the number of corresponding blog articles and storing the total in analysis result storage means in association with that news article. The correspondence determination means calculates the length of the longest common character string (that is, the number of quoted characters) between each blog article and each news article, and recognizes a correspondence for the combination of blog article and news article for which this length is greatest and exceeds a predetermined threshold.
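The quotation-based matching of claim 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the "longest common character string" is a contiguous substring, and the function names, sample texts, and threshold value are all hypothetical.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def match_by_quotation(blog_text, news_articles, threshold):
    """Return the id of the news article with the longest shared run, or None.

    news_articles maps a (hypothetical) article id to its text. A match is
    recognized only when the longest shared run exceeds the threshold.
    """
    best_id, best_len = None, 0
    for news_id, news_text in news_articles.items():
        n = longest_common_substring_length(blog_text, news_text)
        if n > best_len:
            best_id, best_len = news_id, n
    return best_id if best_len > threshold else None


# Hypothetical sample data.
news = {
    "n1": "The ePhone 3GS supports video recording at 30 frames per second.",
    "n2": "Sales of the new handset begin on Friday.",
}
blog = 'I read that it "supports video recording at 30 frames" and I am sold.'
matched = match_by_quotation(blog, news, threshold=20)
```

With the sample data, the blog shares a long quoted run only with `n1`, so `matched` identifies that article; raising the threshold above the length of the quoted run yields no match.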

The news article evaluation system according to claim 3 comprises: storage means in which a keyword is set; means for acquiring news articles containing the keyword from news servers on the Internet and storing them in news article storage means; means for acquiring blog articles containing the keyword from blog servers on the Internet and storing them in blog article storage means; correspondence determination means for identifying the news article that corresponds to each blog article; and influence analysis means for counting, for each news article, the number of corresponding blog articles and storing the total in analysis result storage means in association with that news article. The correspondence determination means calculates the similarity between each blog article and each news article and recognizes a correspondence for the combination of blog article and news article with the highest similarity.

The news article evaluation system according to claim 4 is the system according to claim 3, wherein the correspondence determination means further executes: a process of decomposing each news article and blog article into morphemes and extracting from each article the morphemes belonging to predetermined parts of speech; a process of calculating the TF-IDF value of each extracted morpheme; a process of vectorizing each article on the basis of the TF-IDF values of its morphemes; a process of obtaining the inner product of each news article vector and each blog article vector; and a process of recognizing a correspondence for the combination of news article and blog article whose inner product is closest to a predetermined threshold.
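The vectorization and inner-product steps can be sketched as below. Several points are assumptions rather than details from the source: morphological analysis is taken as already done (each article arrives as a list of tokens), the TF-IDF weighting is the common `tf * log(N / df)` form, and all names and sample tokens are hypothetical. The final selection uses the highest-score rule of claim 3; the threshold-based wording of claim 4 is not modeled here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: {doc_id: [token, ...]} -> {doc_id: {token: tf-idf weight}}."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each token once per doc
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        vectors[doc_id] = {
            t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()
        } if tokens else {}
    return vectors


def inner_product(u, v):
    """Dot product of two sparse vectors stored as {token: weight} dicts."""
    return sum(w * v[t] for t, w in u.items() if t in v)


# Hypothetical, already-tokenized articles.
docs = {
    "news1": ["ephone", "video", "camera", "release"],
    "news2": ["ephone", "network", "speed", "release"],
    "blog1": ["ephone", "video", "camera", "fun"],
}
vecs = tfidf_vectors(docs)
scores = {n: inner_product(vecs["blog1"], vecs[n]) for n in ("news1", "news2")}
best_news = max(scores, key=scores.get)
```

Because "ephone" occurs in every article, its IDF is zero and it contributes nothing; the blog is matched to `news1` through the shared tokens "video" and "camera".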

The news article evaluation system according to claim 5 is the system according to any of claims 1 to 4, further comprising storage means in which at least one sub-keyword is set for each analysis case, wherein the influence analysis means counts, among the blog articles associated with each news article, those that contain the sub-keyword, and stores the total number of blog articles for each sub-keyword in the analysis result storage means in association with that news article.
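A minimal sketch of the sub-keyword tally follows. It assumes that the blog-to-news correspondence has already been determined and is available as a mapping, and that a sub-keyword "is contained" when it occurs as a plain substring; the function name and the sample data are hypothetical.

```python
from collections import Counter


def count_subkeywords(matches, blog_texts, subkeywords):
    """Count, per news article, the matched blogs that contain each sub-keyword.

    matches:     {blog_id: news_id} correspondences (assumed precomputed)
    blog_texts:  {blog_id: text}
    subkeywords: iterable of sub-keyword strings
    """
    result = {}
    for blog_id, news_id in matches.items():
        counts = result.setdefault(news_id, Counter())
        for kw in subkeywords:
            if kw in blog_texts[blog_id]:
                counts[kw] += 1
    return result


# Hypothetical sample data.
matches = {"b1": "n1", "b2": "n1", "b3": "n2"}
blog_texts = {
    "b1": "video shooting is great, and 7.2Mbps too",
    "b2": "love the video shooting",
    "b3": "7.2Mbps at last",
}
totals = count_subkeywords(matches, blog_texts, ["video shooting", "7.2Mbps"])
```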

The news article evaluation system according to claim 6 is the system according to any of claims 1 to 5, further comprising: an evaluation word dictionary that stores combinations of a plurality of evaluation words, used to determine whether the content of a blog article is positive or negative, and points set according to the strength of each evaluation word's positive or negative connotation; means for decomposing each blog article into morphemes and extracting the morphemes having predetermined parts of speech; means for comparing each morpheme with the evaluation words and assigning the corresponding points to each morpheme that matches an evaluation word; and means for totaling the points for each blog article and recognizing the content of the blog article as positive when the total is equal to or greater than a set value and as negative when it is less than the set value. The influence analysis means counts, within the total number of blog articles, the number recognized as positive and the number recognized as negative, and stores these counts in the analysis result storage means.
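The point-based classification can be illustrated with the sketch below. The dictionary entries, point values, and set value of 0 are invented for illustration, and the blogs are assumed to be already reduced to lists of extracted morphemes.

```python
# Hypothetical evaluation word dictionary: word -> points.
EVALUATION_WORDS = {"great": 2, "good": 1, "bad": -1, "terrible": -2}


def blog_points(tokens, dictionary):
    """Sum the points of every token that matches an evaluation word."""
    return sum(dictionary.get(t, 0) for t in tokens)


def classify(tokens, dictionary, set_value=0):
    """Positive when the total is at or above the set value, negative otherwise."""
    return "positive" if blog_points(tokens, dictionary) >= set_value else "negative"


def tally(blogs, dictionary, set_value=0):
    """Count positive and negative blogs among {blog_id: [token, ...]}."""
    counts = {"positive": 0, "negative": 0}
    for tokens in blogs.values():
        counts[classify(tokens, dictionary, set_value)] += 1
    return counts


# Hypothetical, already-tokenized blogs.
blogs = {
    "b1": ["camera", "great"],
    "b2": ["battery", "terrible", "bad"],
    "b3": ["phone"],
}
summary = tally(blogs, EVALUATION_WORDS)
```

Note that, under the "equal to or greater than" rule, a blog with no evaluation words scores 0 and is counted as positive when the set value is 0.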

According to the news article evaluation system of claim 1, the number of blog articles linking to each news article is calculated for every news article that contains the predetermined keyword and therefore reports on the same event. Comparing these totals with one another makes it possible to identify the most influential news article.
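The counting and comparison step can be sketched as follows, assuming the correspondences are already available as a mapping from blog article to news article. The function name and identifiers are hypothetical.

```python
from collections import Counter


def rank_news_by_blog_count(matches):
    """matches: {blog_id: news_id} -> [(news_id, blog_count), ...], largest first."""
    return Counter(matches.values()).most_common()


# Hypothetical correspondences: three blogs link to n1, one links to n2.
matches = {"b1": "n1", "b2": "n1", "b3": "n2", "b4": "n1"}
ranking = rank_news_by_blog_count(matches)
```

The first entry of `ranking` is the news article with the most corresponding blog articles, that is, the one regarded as most influential under this measure.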

According to the news article evaluation system of claim 2, the number of blog articles quoting each news article is calculated for every news article that contains the predetermined keyword and therefore reports on the same event. Comparing these totals with one another makes it possible to identify the most influential news article.

According to the news article evaluation systems of claims 3 and 4, the number of blog articles similar in content to each news article is calculated for every news article that contains the predetermined keyword and therefore reports on the same event. Comparing these totals with one another makes it possible to identify the most influential news article.

The news article evaluation system of claim 5 has a function of counting, among the blog articles associated with each news article, those that contain a predetermined sub-keyword. This makes it possible to evaluate how well each news article covers that sub-keyword.

According to the news article evaluation system of claim 6, the ratio of positive to negative content among the blog articles associated with each news article is known, which makes it possible to gauge how well each news article is written.

FIG. 1 is a block diagram showing the overall configuration of a news article evaluation system 10 according to the present invention. The system comprises an article collection unit 12, a news article storage unit 14, a blog article storage unit 16, a correspondence determination unit 18, an influence analysis unit 20, an evaluation word dictionary 21, an analysis result storage unit 22, and a user setting storage unit 24.

The article collection unit 12, the correspondence determination unit 18, and the influence analysis unit 20 are realized by the CPU of a computer executing the necessary processing in accordance with an OS and application programs.
The news article storage unit 14, the blog article storage unit 16, the evaluation word dictionary 21, the analysis result storage unit 22, and the user setting storage unit 24 are provided on the hard disk of the same computer.

The article collection unit 12 is connected to a plurality of blog servers 28 and a plurality of news servers 30 via the Internet 26.
Each blog server 28 is a Web server that provides client terminals 32 connected via the Internet with functions for accepting blog article postings and for publishing blog articles.
Each news server 30 is a Web server that provides client terminals 32 connected via the Internet with a function for publishing news articles.

A Web server 34 is connected to the analysis result storage unit 22 and the user setting storage unit 24 via a communication network.
The Web server 34 provides a news article evaluation service to a plurality of client terminals 36 connected via a communication network such as the Internet.

The news article evaluation system 10 is used mainly by advertising planning companies and advertising agencies to verify, for an advertised product (including a service) in which they were involved, which news site's article had a large influence on bloggers. It can of course also be used effectively by manufacturers to perform the same verification for their own products.

First, a person in charge at an advertising planning company or other user of the news article evaluation system 10 (hereinafter "user company") accesses the dedicated site on the Web server 34 from a client terminal 36 and logs in by entering an account and password.
FIG. 2 shows the analysis case list screen 40 displayed in the Web browser of the client terminal 36, which lists the analysis cases that the user company has registered for analysis.
The analysis case list 42 has the following display items: case name, execution type, execution interval, first execution date, last execution date, settings, and results.

When the person in charge clicks the "Add case" button 44, the analysis case addition screen 46 is displayed, as shown in FIG. 3.
The person in charge first fills in the case name field 48, the keyword field 49, and the sub-keyword fields 50. In the figure, the same string "ePhone 3GS" is set as both the case name and the keyword, but the two can of course differ.

As described in detail later, the article collection unit 12 uses the keyword specified here to extract news articles and blog articles from the news sites and blog sites, and stores them in the news article storage unit 14 and the blog article storage unit 16, respectively. The sub-keywords, by contrast, are used to check whether each blog article extracted on the basis of the keyword contains them.
Three sub-keyword fields 50 are provided by default, but the person in charge can set more sub-keywords by clicking the "Add" button 51.

Next, in the reference date field 52, the person in charge sets the reference date (year, month, and day) used to filter the news articles and blog articles to be analyzed.
The current date is set by default, so the person in charge selects a different date only when a change is needed.

Next, in the news site field 54, the person in charge specifies the news sites from which articles are to be collected.
Checking the default news sites check box 55 selects, as a group, the news sites prepared in advance by the system. When the person in charge presses the "Confirm" button 56, a list screen of the default news sites is displayed (not shown). By unchecking individual news sites on this list, the person in charge can exclude from collection any sites considered unnecessary.

If a news site is missing from the defaults, the person in charge presses the "Register specific news site" button 57. A screen for specifying the news site is then displayed (not shown), and the person in charge enters the name and URL of the news site and presses the registration button. The specified news site is thereby added as a collection source for the case.

In addition, by pressing the "Add specific article" button 58 to display a screen for specifying an article (not shown) and entering the URL of a news article, the person in charge can include that specific news article in the collection for the case.

Next, in the execution type field 60, the person in charge selects one of the two radio buttons, "one-time execution" or "periodic execution".
A person in charge who selects "periodic execution" also sets the execution interval, for example by selecting "week" in the basic interval field 61 and "every 1 week" and "Monday" in the detailed interval field 62.

Although not illustrated, when "day" is selected as the basic interval, the detailed interval field is redisplayed so that the number of days between executions can be specified. When "month" is selected as the basic interval, the detailed interval field is redisplayed so that the number of months between executions and the day of the month on which to execute can be specified.

When the person in charge has entered the required items on the analysis case addition screen 46 and clicks the "Register" button 63, the input data is sent from the client terminal 36 to the Web server 34, which stores it in the user setting storage unit 24.
The person in charge can freely change registered settings by clicking the "Confirm/Change" button 64 under the "Settings" item on the analysis case list screen 40 of FIG. 2.

Thereafter, in accordance with the settings made by the person in charge, the news article evaluation system 10 automatically executes the news article collection process, the blog article collection process, the process of associating each blog article with a news article, and the influence analysis process for each news article, and accumulates the analysis results in the analysis result storage unit 22.
The person in charge can access the Web server 34 from the client terminal 36 and refer to the analysis results at any time.
Specifically, when the "Display" button 65 under the "Results" item is clicked on the analysis case list screen 40 of FIG. 2, the Web server 34 sends the analysis result list screen to the client terminal 36.

FIG. 4 illustrates the analysis result list screen 68 displayed in the Web browser of the client terminal 36. For each news article, it lists, among other things, the total number of blogs presumed to have been influenced by that article.
For example, for No. 1, the article about "ePhone 3GS" published on the "Responde" news site, the screen shows a publication date of January 25, 2009 and a total of 120 related blogs, of which 95 were positive and 25 negative, for a positive rate of 79%.
The string "video shooting", set as a sub-keyword, appeared in 115 related blogs, of which 92 were positive and 23 negative, for a positive rate of 80%.
The string "7.2Mbps", also set as a sub-keyword, appeared in 40 related blogs, of which 35 were positive and 5 negative, for a positive rate of 88%.
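The positive rates quoted for the Responde article follow from the positive and negative counts. The sketch below reproduces them; rounding to the nearest whole percent is an assumption (the source does not state the rounding rule), and the function name is hypothetical.

```python
def positive_rate(positive, negative):
    """Share of positive blogs as a whole percent, rounded half up.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    total = positive + negative
    return (200 * positive + total) // (2 * total)


# Counts taken from the Responde row described above.
overall = positive_rate(95, 25)        # all 120 related blogs
video = positive_rate(92, 23)          # 115 blogs containing "video shooting"
speed = positive_rate(35, 5)           # 40 blogs containing "7.2Mbps"
```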

Similarly, for No. 2, the article published on the "BNET Japan" news site, the screen shows a publication date of January 25, 2009 and a total of 90 related blogs, of which 70 were positive and 20 negative, for a positive rate of 78%.
The string "video shooting", set as a sub-keyword, appeared in 88 related blogs, of which 67 were positive and 21 negative, for a positive rate of 77%.
The string "7.2Mbps", also set as a sub-keyword, appeared in 70 related blogs, of which 63 were positive and 7 negative, for a positive rate of 90%.

By studying the analysis result list screen 68, the person in charge can gain many insights.
At the simplest level, it shows that placing an advertising article on a highly ranked news site such as Responde or BNET Japan is likely to attract wide public attention next time as well.
Looking more closely, the Responde article drew a large response on a theme with general appeal such as "video shooting" (115 of 120 blogs) but a small response on a relatively specialist theme such as "7.2Mbps" (40 of 120). The BNET Japan article, by contrast, drew a relatively large response on the "7.2Mbps" sub-keyword (70 of 90), which supports the hypothesis that BNET Japan has more enthusiast readers than Responde, or more technically oriented reporters.

The processing procedure of the system 10 is described below with reference to the flowchart of FIG. 5.
First, at regular intervals (for example, once a day), the article collection unit 12 checks the execution interval of each analysis case stored in the user setting storage unit 24 (S10). If there is an analysis case whose execution time has arrived (S12), the unit reads the keyword set for that case (S14).
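The check in S10 and S12 can be sketched as a function that decides whether a case is due on a given day. The source does not describe how the stored interval is evaluated, so the `Schedule` fields and the rules below (intervals counted from the first execution date) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Schedule:
    """Hypothetical representation of a periodic-execution setting."""
    unit: str              # "day", "week" or "month"
    every: int             # run every N units
    first_run: date        # first execution date
    weekday: int = 0       # for "week": 0 = Monday ... 6 = Sunday
    day_of_month: int = 1  # for "month"


def is_due(s: Schedule, today: date) -> bool:
    """Return True when the case should run on `today`."""
    if today < s.first_run:
        return False
    if s.unit == "day":
        return (today - s.first_run).days % s.every == 0
    if s.unit == "week":
        weeks = (today - s.first_run).days // 7
        return today.weekday() == s.weekday and weeks % s.every == 0
    if s.unit == "month":
        months = (today.year - s.first_run.year) * 12 + today.month - s.first_run.month
        return today.day == s.day_of_month and months % s.every == 0
    raise ValueError(f"unknown unit: {s.unit}")


# "Every 1 week, on Monday", starting Monday 16 November 2009.
weekly = Schedule(unit="week", every=1, first_run=date(2009, 11, 16), weekday=0)
```

A daily job would call `is_due` for each stored case and read the keyword of every case for which it returns `True`.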

Next, the article collection unit 12 accesses each configured news site and searches for the relevant news articles by entering the keyword (for example, "ePhone 3GS") in the search box provided on the site (S16).
The article collection unit 12 then extracts, from the news articles acquired from the news site, those dated on or after the configured reference date and stores them in the news article storage unit 14 (S17).

Next, the article collection unit 12 accesses each predetermined blog site and searches for the relevant blog articles by entering the same keyword in the search box provided on the site (S18).
The article collection unit 12 then extracts, from the blog articles acquired from the blog site, those dated on or after the configured reference date and stores them in the blog article storage unit 16 (S19).
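The reference-date filtering in S17 and S19 can be sketched as below. The record layout (a dict with `url` and `date` keys) and the sample URLs are hypothetical; the source only states that articles dated on or after the reference date are kept.

```python
from datetime import date


def filter_by_reference_date(articles, reference_date):
    """Keep only the articles dated on or after the reference date."""
    return [a for a in articles if a["date"] >= reference_date]


# Hypothetical search results.
fetched = [
    {"url": "http://news.example.com/a1", "date": date(2009, 1, 20)},
    {"url": "http://news.example.com/a2", "date": date(2009, 1, 25)},
    {"url": "http://news.example.com/a3", "date": date(2009, 2, 1)},
]
kept = filter_by_reference_date(fetched, date(2009, 1, 25))
```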

The news article collection process (S16 and S17) and the blog article collection process (S18 and S19) may be performed in either order: the blog article collection may be executed first, or the two may be executed at the same time.
Also, instead of accessing multiple news sites and blog sites individually as described above, the keyword may be entered into the search box of a search site such as Google (registered trademark) or Yahoo! (registered trademark), and the articles containing the URLs of the required news sites and blog sites may be extracted together from the returned result list.

Next, the correspondence determination unit 18 starts and, for each collected news article, identifies the corresponding blog articles written on the basis of that news article (S20).
The procedure for associating news articles with blog articles is described below with reference to the flowchart of FIG. 6.

First, for each blog article stored in the blog article storage unit 16, the correspondence determination unit 18 checks whether the article contains link information (S20-01). If it does (S20-02), the unit compares the link information with the URL of each news article stored in the news article storage unit 14 (S20-03). A news article whose URL matches the link information is recognized as having a "link relationship" with that blog article (S20-04).

FIG. 7 shows a specific example: the link information set on the "Click here for details" button in the blog article matches the URL of the news article, so a "link relationship" is recognized between the two.
This "link relationship" is not exclusive. If a blog article contains link information for several news articles, a "link relationship" is recognized with each of those news articles.
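The link comparison of S20-01 to S20-04 can be sketched as below. The sketch assumes that blog articles are available as HTML and that link information means the `href` values of anchor tags; URLs are compared as exact strings with no normalization, which is a simplification. `LinkCollector` and `linked_news` are hypothetical names.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a blog article."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def linked_news(blog_html, news_urls):
    """Return every news URL that the blog article links to (non-exclusive)."""
    parser = LinkCollector()
    parser.feed(blog_html)
    links = set(parser.links)
    return [url for url in news_urls if url in links]

# Hypothetical blog article that links to two news articles.
blog_html = (
    '<p>Nice phone. <a href="http://news.example/a1">Click here for details</a>'
    ' and <a href="http://news.example/a2">another report</a>.</p>'
)
news_urls = [
    "http://news.example/a1",
    "http://news.example/a2",
    "http://news.example/a3",
]
print(linked_news(blog_html, news_urls))
```

Because the relationship is not exclusive, the function returns a list rather than a single best match.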

Next, the correspondence determination unit 18 compares each news article with each blog article using an LCS (Longest Common Subsequence) algorithm (S20-05), and recognizes as a citation candidate each combination of blog article and news article whose longest common character string length (= number of quoted characters) is at or above a predetermined threshold (for example, 20 characters or more) (S20-06).
When several news articles are recognized as citation candidates for one blog article, a "citation relationship" is recognized with the news article that has the largest number of quoted characters (S20-07).
FIG. 8 shows a specific example: part of the text of the news article is embedded almost verbatim in the blog article, so a citation relationship is recognized between the two.
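The candidate selection in S20-06 and S20-07 can be illustrated with the short sketch below. It assumes the quoted-character counts have already been computed for one blog article; the function name `pick_citation` and the sample values are hypothetical, and how ties are resolved is not stated in the source (here the first candidate encountered wins).

```python
def pick_citation(quoted_lengths, threshold=20):
    """Pick the news article cited by one blog article, or None.

    quoted_lengths maps a news article ID to the number of quoted
    characters shared with the blog article.
    """
    # S20-06: keep only combinations at or above the threshold.
    candidates = {news: n for news, n in quoted_lengths.items() if n >= threshold}
    if not candidates:
        return None
    # S20-07: among the candidates, take the one with the most quoted characters.
    return max(candidates, key=candidates.get)

print(pick_citation({"A": 35, "B": 22, "C": 10}))
```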

The LCS algorithm itself is a known technique, but its basic principle is explained here with reference to FIG. 9.
First, the correspondence determination unit 18 breaks the given text into morphemes, extracts the morphemes belonging to specific parts of speech (for example, nouns, verbs, and adjectives), and then assigns a unique ID to each distinct morpheme.

For example, from the sentence in FIG. 9(a), "The weather is nice today. So I'll play baseball today.", the morphemes "today", "nice", "weather", "today", "baseball", and "do" (suru) are extracted, and the IDs (1) to (5) are assigned: "today" → (1), "nice" → (2), "weather" → (3), "baseball" → (4), "do" → (5).
Likewise, from the sentence in FIG. 9(b), "The weather is nice today. I will play soccer today.", the morphemes "today", "nice", "weather", "today", "soccer", and "do" (the dictionary form of "shimasu") are extracted, and the IDs (1) to (3), (5), and (6) are assigned: "today" → (1), "nice" → (2), "weather" → (3), "soccer" → (6), "do" → (5).

Next, the correspondence determination unit 18 compares the ID sequences of sentences (a) and (b) and recognizes (1)(2)(3)(1), the run that matches consecutively in both, as the longest common character string. In this case, the longest common character string length is 4.
Assigning a unique ID to each shared morpheme, instead of comparing the character strings directly, speeds up the processing.
In addition, extracting and comparing only the strings with specific parts of speech from both texts makes it possible to absorb minor differences in expression (changes in phrasing).
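The ID assignment and comparison can be reproduced with a small sketch. Note one assumption: although the text names the technique LCS (Longest Common Subsequence), the worked example counts a run that matches consecutively, so the sketch computes the longest common consecutive run of IDs. Morphological analysis is not performed here; the morpheme lists are given by hand as English glosses, and `assign_ids` and `longest_common_run` are hypothetical names.

```python
def assign_ids(morphemes, table):
    """Map each morpheme to a shared integer ID, extending the table as needed."""
    return [table.setdefault(m, len(table) + 1) for m in morphemes]

def longest_common_run(a, b):
    """Length of the longest run of IDs that appears consecutively in both a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous position in both sequences.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

table = {}
ids_a = assign_ids(["today", "nice", "weather", "today", "baseball", "do"], table)
ids_b = assign_ids(["today", "nice", "weather", "today", "soccer", "do"], table)

print(ids_a)                             # [1, 2, 3, 1, 4, 5]
print(ids_b)                             # [1, 2, 3, 1, 6, 5]
print(longest_common_run(ids_a, ids_b))  # 4
```

Comparing small integers instead of strings is what gives the speed-up described above; the dynamic-programming table keeps only two rows, so memory use grows with the length of one article rather than with the product of both.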

Next, the correspondence determination unit 18 calculates the similarity between each news article and each blog article using TF-IDF and the vector space model (S20-08).
The procedure for calculating this similarity is described below with reference to the flowchart of FIG. 10 and the explanatory diagrams of FIGS. 11 and 12.

First, the correspondence determination unit 18 performs morphological analysis on each news article and blog article (S20-08-01) and extracts the terms belonging to a specific part of speech (for example, nouns) from each article (S20-08-02).
In the example of FIG. 11, the terms "today / deadline / today / all-nighter" are extracted from document A (a blog article), "Today is the deadline. I wonder if it's another all-nighter today."; the terms "today / dried sardines" from document B (a blog article), "Dried sardines again today. I'm sick of them."; the terms "today / weather / baseball" from document C (a news article), "The weather is nice today. Let's play baseball."; and the terms "weather / soccer" from document D (a news article), "The weather is nice. Let's play soccer."

Next, the correspondence determination unit 18 calculates the frequency of each term in each article (TF, Term Frequency) (S20-08-03). For example, the frequency of "today" in document A is 2.

Next, for each term, the correspondence determination unit 18 calculates the number of articles that contain the term (DF, Document Frequency) (S20-08-04) and stores it in the DF dictionary 70 (S20-08-05). For example, the number of articles among documents A to D that contain "today" is 3.
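The TF and DF counts for documents A to D can be checked with the following sketch. The term lists are English glosses of the terms extracted in FIG. 11, entered by hand; the plain dictionary `df` is only a stand-in for the DF dictionary 70.

```python
from collections import Counter

# Terms extracted from each document (English glosses of the FIG. 11 example).
docs = {
    "A": ["today", "deadline", "today", "all-nighter"],
    "B": ["today", "dried sardines"],
    "C": ["today", "weather", "baseball"],
    "D": ["weather", "soccer"],
}

# TF: how often each term occurs within one document.
tf = {name: Counter(terms) for name, terms in docs.items()}

# DF: in how many documents each term occurs (each document counted once).
df = Counter(term for terms in docs.values() for term in set(terms))

print(tf["A"]["today"])  # 2
print(df["today"])       # 3
```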

Next, the correspondence determination unit 18 vectorizes each document on the basis of the DF dictionary 70.
For example, document A contains three of the terms recorded in the DF dictionary 70, namely "today", "deadline", and "all-nighter", so the correspondence determination unit 18 derives the IDF (Inverse Document Frequency) and the TF-IDF from the DF of these terms.

First, the correspondence determination unit 18 calculates the IDF of each term as follows (S20-08-06).
IDF(today) = log(number of documents / DF) = log(4/3)

Next, the correspondence determination unit 18 calculates the TF-IDF of each term as follows (S20-08-07).
TF-IDF(today) = TF(today) × IDF(today) = 2 × log(4/3) = 0.25
By the same process, the correspondence determination unit 18 calculates TF-IDF = 0.5 for "deadline" and TF-IDF = 0.5 for "all-nighter".
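The TF-IDF figure for "today" can be verified numerically. The source does not state the base of the logarithm; base 10 is assumed here because it reproduces the stated value of 0.25. Only the "today" figure is checked, and the other figures in the text are taken as given rather than recomputed. The function names are hypothetical.

```python
import math

def idf(num_docs, doc_freq):
    """Inverse document frequency, assuming a base-10 logarithm."""
    return math.log10(num_docs / doc_freq)

def tf_idf(term_freq, num_docs, doc_freq):
    """TF-IDF as the product of term frequency and IDF."""
    return term_freq * idf(num_docs, doc_freq)

# "today" in document A: TF = 2, four documents in total, DF = 3.
value = tf_idf(term_freq=2, num_docs=4, doc_freq=3)
print(round(value, 2))  # 0.25
```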

Here, the three terms contained in document A, "today", "deadline", and "all-nighter", are listed first to third in the DF dictionary 70. As shown in FIG. 12, the correspondence determination unit 18 therefore generates a vector in which rows 1 to 3 hold the values 0.33, 0.62, and 0.43 and the rows corresponding to the positions of the other terms hold 0.0, and uses it as the vector of document A (S20-08-08).

Note that the TF-IDF of "today" is 0.25 and its frequency in document A is 2, but normalization to make the vector length equal to 1 yields the value 0.33. Likewise, the TF-IDF of 0.5 for "deadline" and the TF-IDF of 0.5 for "all-nighter" are converted to 0.62 and 0.43, respectively, by the same normalization. The same applies to document B and the subsequent documents.

Document B contains two of the terms recorded in the DF dictionary 70, "today" and "dried sardines", which are listed first and fourth in the DF dictionary 70. Rows 1 and 4 of its vector therefore hold the values 0.16 and 0.43, and the rows corresponding to the positions of the other terms hold 0.0.

Document C contains three of the terms recorded in the DF dictionary 70, "today", "weather", and "baseball", which are listed first, fifth, and sixth in the DF dictionary 70. Rows 1, 5, and 6 of its vector therefore hold the values 0.16, 0.43, and 0.43, respectively, and the rows corresponding to the positions of the other terms hold 0.0.

Document D contains two of the terms recorded in the DF dictionary 70, "weather" and "soccer", which are listed sixth and seventh in the DF dictionary 70. Rows 6 and 7 of its vector therefore hold the values 0.43 and 0.22, respectively, and the rows corresponding to the positions of the other terms hold 0.0.

Next, the correspondence determination unit 18 computes the inner product (distance) between the vector of each news article and the vector of each blog article (S20-08-09). This inner product represents the similarity between the two articles.

Next, the correspondence determination unit 18 recognizes a "similarity relationship" for the combination of blog article and news article with the highest similarity (S20-09 in FIG. 6). Specifically, the combination whose inner product between vectors is closest to 1.0 is rated as having the highest similarity. This threshold is a value determined from findings obtained in separate experiments.
FIG. 13 shows a specific example: the blog article is written in the author's own words, but a similarity relationship with the news article is recognized from the shared combination of terms that appear in it.
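The normalization, inner product, and selection of the most similar news article can be sketched as follows. With vectors scaled to length 1, the inner product is the cosine similarity, so a value closer to 1.0 means a closer match. The vectors below are hypothetical values chosen for illustration, not the figures of FIG. 12, and the function names are introduced here.

```python
import math

def normalize(vec):
    """Scale a vector to length 1 (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)

def inner_product(u, v):
    """Sum of the element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def most_similar(blog_vec, news_vecs):
    """Return (news_id, similarity) for the news article closest to the blog article."""
    b = normalize(blog_vec)
    scores = {name: inner_product(b, normalize(v)) for name, v in news_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical TF-IDF vectors over a three-term dictionary.
blog = [0.3, 0.0, 0.3]
news = {"C": [0.6, 0.0, 0.6], "D": [0.0, 0.5, 0.0]}

best, score = most_similar(blog, news)
print(best, round(score, 2))
```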

Next, the correspondence determination unit 18 recognizes a correspondence between a blog article and a news article for which any of the link, citation, or similarity relationships has been recognized (S20-10 in FIG. 6).
In the example of FIG. 14, (a) news articles A and B are found to correspond to blog α under the "link relationship", (b) news article A is found to correspond to blog α under the "citation relationship", and (c) news article B is found to correspond to blog α under the "similarity relationship"; as a result, a correspondence is finally recognized between blog article α and news articles A and B.

The above describes an example in which the correspondence determination unit 18 determines the correspondence between news articles and blog articles on the basis of the presence or absence of a link relationship, a citation relationship, and a similarity relationship. However, the correspondence may be determined using at least one of these.
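Combining the three relationships amounts to a set union, as the sketch below shows for the FIG. 14 example. The representation of each relationship as a set of (news article, blog article) pairs is an assumption made here for illustration, and blog α is written as "alpha".

```python
def correspondences(link=frozenset(), citation=frozenset(), similarity=frozenset()):
    """Union of the (news_id, blog_id) pairs recognized by any of the relationships.

    Each argument is optional, reflecting that at least one of the three
    determinations may be used on its own.
    """
    return set(link) | set(citation) | set(similarity)

# FIG. 14 example.
link = {("A", "alpha"), ("B", "alpha")}
citation = {("A", "alpha")}
similarity = {("B", "alpha")}

print(sorted(correspondences(link, citation, similarity)))
```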

When the correspondence determination unit 18 has finished associating blog articles with news articles as described above, the influence analysis unit 20 starts and calculates the influence of each news article for each analysis case (S22 in FIG. 5).
The procedure for calculating this influence is described below with reference to the flowchart of FIG. 15.

First, the influence analysis unit 20 analyzes the content of each blog article with reference to the evaluation word dictionary 21 and determines whether the content is positive or negative (S22-01).
Specifically, as shown in FIG. 16, the evaluation word dictionary 21 stores in advance a large number of evaluation words useful for judging the content of blog articles, each assigned positive or negative points according to the strength of its positive or negative connotation.
The influence analysis unit 20 breaks the blog article into morphemes, extracts the specific parts of speech (nouns, adjectives, and so on), and compares them with the evaluation words stored in the evaluation word dictionary, adding the corresponding points each time an evaluation word is found in the blog article. If the final score is positive, the blog article is judged positive; if it is zero or negative, it is judged negative.
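The dictionary-based scoring of S22-01 can be sketched as below. The evaluation words and their points are invented for illustration and are not the records of FIG. 16; morphological analysis is skipped and the morphemes are passed in as a list. The name `classify` is hypothetical.

```python
def classify(morphemes, dictionary):
    """Sum the points of every evaluation word found and label the article.

    Following the description, a score above zero is positive and a score
    of zero or below is negative.
    """
    score = sum(dictionary.get(m, 0) for m in morphemes)
    label = "positive" if score > 0 else "negative"
    return label, score

# Hypothetical evaluation word dictionary: word -> points.
evaluation_words = {"great": 2, "good": 1, "bad": -1, "terrible": -2}

print(classify(["camera", "great", "battery", "bad"], evaluation_words))
print(classify(["camera", "battery"], evaluation_words))
```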

Next, the influence analysis unit 20 counts the number of blog articles associated with each news article (S22-02) and takes the total as the influence of that news article, because a large total means that the news article influenced many blog articles.
Next, among the blog articles associated with the news article, the influence analysis unit 20 counts, for each sub-keyword, the number that contain that preset sub-keyword (S22-03).
Finally, for the blog articles associated with the news article, the influence analysis unit 20 calculates the number with positive content, the number with negative content, and the proportion that are positive, both for the associated blog articles as a whole and for the blog articles containing each sub-keyword (S22-04).
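The aggregation in S22-02 to S22-04 can be sketched as follows. The sketch assumes each associated blog article is a record with its text and an already determined positive or negative label, and that a sub-keyword match is a plain substring test; the record layout, sample data, and function names are hypothetical.

```python
def summarize(blogs):
    """Count the articles, the positive and negative ones, and the positive ratio."""
    positive = sum(1 for b in blogs if b["label"] == "positive")
    negative = len(blogs) - positive
    ratio = positive / len(blogs) if blogs else 0.0
    return {"count": len(blogs), "positive": positive,
            "negative": negative, "positive_ratio": ratio}

def analyze(news_id, blogs, sub_keywords):
    """Influence of one news article, overall and per sub-keyword."""
    return {
        "news_id": news_id,
        # S22-02: the number of associated blog articles is the influence.
        "overall": summarize(blogs),
        # S22-03 and S22-04: the same figures for each sub-keyword.
        "by_sub_keyword": {
            kw: summarize([b for b in blogs if kw in b["text"]])
            for kw in sub_keywords
        },
    }

# Hypothetical blog articles associated with one news article.
blogs = [
    {"text": "video shooting is fun", "label": "positive"},
    {"text": "7.2Mbps is fast, video shooting too", "label": "positive"},
    {"text": "7.2Mbps is not available here", "label": "negative"},
]

result = analyze("news-1", blogs, ["video shooting", "7.2Mbps"])
print(result["overall"])
print(result["by_sub_keyword"]["7.2Mbps"])
```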

These results are stored in the analysis result storage unit 22 (S24 in FIG. 5) and, as described above, are displayed on the analysis result list screen 68 sent to the client terminal 36 via the Web server 34 (see FIG. 4).

FIG. 1 is a block diagram showing the functional configuration of the news article evaluation system according to the present invention.
FIG. 2 is a diagram showing the analysis case list screen.
FIG. 3 is a diagram showing the analysis case addition screen.
FIG. 4 is a diagram showing the analysis result list screen.
FIG. 5 is a flowchart showing the overall processing procedure of the news article evaluation system.
FIG. 6 is a flowchart showing the procedure for associating news articles with blog articles.
FIG. 7 is an explanatory diagram showing an example in which a "link relationship" is recognized between a news article and a blog article.
FIG. 8 is an explanatory diagram showing an example in which a "citation relationship" is recognized between a news article and a blog article.
FIG. 9 is an explanatory diagram showing the basic principle of the LCS algorithm.
FIG. 10 is a flowchart showing the procedure for calculating the similarity between news articles and blog articles.
FIG. 11 is an explanatory diagram showing the processing involved in calculating the similarity between news articles and blog articles.
FIG. 12 is an explanatory diagram showing the processing involved in calculating the similarity between news articles and blog articles.
FIG. 13 is an explanatory diagram showing an example in which a "similarity relationship" is recognized between a news article and a blog article.
FIG. 14 is an explanatory diagram showing the method of associating news articles with blog articles.
FIG. 15 is a flowchart showing the procedure for calculating the influence of a news article.
FIG. 16 is a diagram illustrating registered records of the evaluation word dictionary.

10 News article evaluation system
12 Article collection unit
14 News article storage unit
16 Blog article storage unit
18 Correspondence determination unit
20 Influence analysis unit
21 Evaluation word dictionary
22 Analysis result storage unit
24 User setting storage unit
26 Internet
28 Blog server
30 News server
32 Client terminal
34 Web server
36 Client terminal
40 Analysis case list screen
42 Analysis case list
44 "Add case" button
46 Analysis case addition screen
48 Case name setting field
49 Keyword specification field
50 Sub-keyword specification field
51 "Add" button
52 Reference date specification field
54 News site specification field
55 Check box
56 "Confirm" button
57 "Register specific news site" button
58 "Add specific article" button
60 Execution type specification field
61 Basic interval setting field
62 Detailed interval setting field
63 "Register" button
64 "Confirm/Change" button
65 "Display" button
68 Analysis result list screen
70 DF dictionary

Claims

A news article evaluation system comprising:
storage means for setting a keyword for each analysis case;
means for acquiring news articles containing the keyword from a news server installed on the Internet and storing them in news article storage means;
means for acquiring blog articles containing the keyword from a blog server installed on the Internet and storing them in blog article storage means;
correspondence determination means for identifying the news articles that correspond to each blog article; and
influence analysis means for counting, for each news article, the number of blog articles that correspond to it, and storing the total number of blog articles in analysis result storage means in association with that news article,
wherein the correspondence determination means compares the link information set in each blog article with the URL of each news article and, when the two match, recognizes a correspondence between that blog article and that news article.

A news article evaluation system comprising:
storage means for setting a keyword;
means for acquiring news articles containing the keyword from a news server installed on the Internet and storing them in news article storage means;
means for acquiring blog articles containing the keyword from a blog server installed on the Internet and storing them in blog article storage means;
correspondence determination means for identifying the news articles that correspond to each blog article; and
influence analysis means for counting, for each news article, the number of blog articles that correspond to it, and storing the total number of blog articles in analysis result storage means in association with that news article,
wherein the correspondence determination means calculates the longest common character string length between each blog article and each news article, and recognizes a correspondence for the combination of blog article and news article for which that length is the largest and exceeds a predetermined threshold.

A news article evaluation system comprising:
storage means for setting a keyword;
means for acquiring news articles containing the keyword from a news server installed on the Internet and storing them in news article storage means;
means for acquiring blog articles containing the keyword from a blog server installed on the Internet and storing them in blog article storage means;
correspondence determination means for identifying the news articles that correspond to each blog article; and
influence analysis means for counting, for each news article, the number of blog articles that correspond to it, and storing the total number of blog articles in analysis result storage means in association with that news article,
wherein the correspondence determination means calculates the similarity between each blog article and each news article, and recognizes a correspondence for the combination of blog article and news article with the highest similarity.

The news article evaluation system according to claim 3, wherein the correspondence determination means executes:
a process of breaking each news article and blog article into morphemes and extracting the morphemes of a predetermined part of speech from each article;
a process of calculating the TF-IDF value of each extracted morpheme;
a process of vectorizing each article on the basis of the TF-IDF value of each morpheme;
a process of computing the inner product between the vector of each news article and the vector of each blog article; and
a process of recognizing a correspondence for the combination of news article and blog article whose inner product is closest to a predetermined threshold.

The news article evaluation system according to any one of claims 1 to 4, further comprising storage means for setting at least one sub-keyword for each analysis case,
wherein the influence analysis means executes a process of counting, among the blog articles associated with each news article, the number that contain the sub-keyword, and storing the total number of blog articles for each sub-keyword in the analysis result storage means in association with that news article.

The news article evaluation system according to claim 1, further comprising:
an evaluation word dictionary that stores combinations of evaluation words used to judge whether the content of a blog article is positive or negative and points set according to the strength of the positive or negative connotation of each evaluation word;
means for breaking each blog article into morphemes and extracting the morphemes of predetermined parts of speech;
means for comparing each morpheme with the evaluation words and assigning the corresponding points to each morpheme that matches an evaluation word; and
means for totaling the points for each blog article, judging the content of the blog article to be positive when the total is greater than or equal to a set value, and judging it to be negative when the total is less than the set value,
wherein the influence analysis means executes a process of counting, among the total number of blog articles, the number judged positive and the number judged negative, and storing the result in the analysis result storage means.