JP2008015694A

JP2008015694A - Document taste learning system, method, and program

Info

Publication number: JP2008015694A
Application number: JP2006184767A
Authority: JP
Inventors: Takashi Nakamura (中村 隆史)
Original assignee: Oki Networks Co Ltd
Current assignee: Oki Networks Co Ltd
Priority date: 2006-07-04
Filing date: 2006-07-04
Publication date: 2008-01-24

Abstract

PROBLEM TO BE SOLVED: To reduce or eliminate the operations (work) that a user must deliberately perform so that the user's document preferences can be learned.
SOLUTION: One document preference learning system counts registering a document as a bookmark, or saving it as a file, as one condition for judging that the document is of interest to the user. Another document preference learning system counts a browsing (dwell) time that exceeds a threshold as one condition for judging that the document is of interest to the user.
COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a document preference learning system, method, and program, and can be applied, for example, to efficiently searching for documents a user wants.

In recent years, the spread of computers and the Internet has led to the creation of a great many Web pages. Because the number of Web pages has become enormous, there is a growing demand for efficiently finding the Web pages a user is looking for. As one solution, techniques such as Web mining using Bayesian estimation are being studied (see Non-Patent Documents 1 and 2).

This conventional method aims to realize a system that learns the characteristics of a user's favorite Web pages (Web pages of interest) and, without the user having to think up search information such as keywords, collects Web pages the user would likewise be interested in (Web pages matching the same tastes). To that end, it proposes a search technique that includes a method for learning favorite Web pages.

Non-Patent Document 1: T. Amano and H. Nakazato, "Web Mining Using Bayesian Estimation," Proceedings of the 2004 IEICE Society Conference, No. B-19-9, p. 404, September 2004.
Non-Patent Document 2: T. Amano, H. Nakazato, and T. Nakamura, "Web Mining Using Bayesian Estimation," IEICE Technical Report, IEICE, March 15, 2005.

However, the conventional method requires the user to manually collect, and classify, pages related to and unrelated to his or her favorite pages. That is, for the system to learn the user's document preferences (documents of interest), the user must personally mark each Web page (URL) as liked (of interest) or disliked (not of interest), which makes the prior learning very laborious. For example, "like" and "dislike" icons are displayed in part of the Web browser screen showing a Web page, and the user teaches the system his or her preferences by clicking one of the icons.

Improving search accuracy calls for learning many Web pages, but the more pages are to be learned, the heavier the burden of manual work on the user becomes.

The present invention has been made in view of the above problem, and its object is to provide a document preference learning system, method, and program that can reduce or eliminate the operations (work) a user must deliberately perform so that the user's document preferences can be learned.

The document preference learning system according to a first aspect of the present invention comprises: feature-quantity acquisition means for obtaining a feature quantity of an input document; interest determination means for determining whether an act of saving information relating to the input document for future use of that document has been performed, and for determining whether the user is interested in the input document, with the performance of such a saving act serving as one condition for judging the document to be of interest to the user; and document feature storage means for storing the feature quantity of the input document, sorted according to whether or not the document is of interest to the user.

The document preference learning system according to a second aspect of the present invention comprises: feature-quantity acquisition means for obtaining a feature quantity of an input document; interest determination means for determining whether the user is interested in the input document, with the time for which the user browses the input document exceeding a threshold serving as one condition for judging the document to be of interest to the user; and document feature storage means for storing the feature quantity of the input document, sorted according to whether or not the document is of interest to the user.

The document preference learning method according to a third aspect of the present invention uses feature-quantity acquisition means, interest determination means, and document feature storage means. The feature-quantity acquisition means obtains a feature quantity of an input document. The interest determination means determines whether an act of saving information relating to the input document for future use of that document has been performed, and determines whether the user is interested in the input document, with the performance of such a saving act serving as one condition for judging the document to be of interest to the user. The document feature storage means stores the feature quantity of the input document, sorted according to whether or not the document is of interest to the user.

The document preference learning method according to a fourth aspect of the present invention uses feature-quantity acquisition means, interest determination means, and document feature storage means. The feature-quantity acquisition means obtains a feature quantity of an input document. The interest determination means determines whether the user is interested in the input document, with the time for which the user browses the input document exceeding a threshold serving as one condition for judging the document to be of interest to the user. The document feature storage means stores the feature quantity of the input document, sorted according to whether or not the document is of interest to the user.

The document preference learning program according to a fifth aspect of the present invention is written so as to cause a computer to function as: feature-quantity acquisition means for obtaining a feature quantity of an input document; interest determination means for determining whether an act of saving information relating to the input document for future use of that document has been performed, and for determining whether the user is interested in the input document, with the performance of such a saving act serving as one condition for judging the document to be of interest to the user; and document feature storage means for storing the feature quantity of the input document, sorted according to whether or not the document is of interest to the user.

The document preference learning program according to a sixth aspect of the present invention is written so as to cause a computer to function as: feature-quantity acquisition means for obtaining a feature quantity of an input document; interest determination means for determining whether the user is interested in the input document, with the time for which the user browses the input document exceeding a threshold serving as one condition for judging the document to be of interest to the user; and document feature storage means for storing the feature quantity of the input document, sorted according to whether or not the document is of interest to the user.

According to the present invention, the operations (work) that a user must deliberately perform so that the user's document preferences can be learned can be reduced or eliminated.

(A) First Embodiment
A first embodiment, in which the document preference learning system, method, and program according to the present invention are applied to a Web page browsing system, method, and program, will now be described in detail with reference to the drawings.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the functional configuration of the Web page browsing system according to the first embodiment. The system is built by installing the Web page browsing program of the first embodiment (including fixed data; a so-called Web browser) on an information processing apparatus such as a personal computer, a mobile phone terminal, or a PDA; functionally, it can be represented as shown in FIG. 1.

In FIG. 1, the Web page browsing system 1 of the first embodiment includes an input unit 10, a display unit 11, a communication unit 12, a browsing control unit 13, a page feature forming unit 14, a page feature storage unit 15, a preference document determination unit 16, and a bookmark storage unit 17.

The input unit 10 corresponds in hardware to a keyboard, a mouse, or the like, and is used to instruct the start of Web page browsing, a search for documents matching the user's preferences, registration of a URL as a bookmark (favorite URL), and so on.

The display unit 11 corresponds in hardware to a display, and shows Web pages for browsing as well as various operation icons.

The communication unit 12 corresponds to a communication circuit, communication software, and the like, and performs the communication needed to retrieve from the Internet 20 the Web page designated by the browsing control unit 13.

The browsing control unit 13 controls the various operations involved in browsing Web pages, according to signals from the input unit 10 and the current processing stage.

The page feature forming unit 14 forms the feature quantity of the Web page designated by the browsing control unit 13. Feature quantities are formed both when learning the user's document preferences and when determining whether a retrieved Web page matches those preferences.

The page feature storage unit 15 stores page feature quantities that reflect the user's document preferences, keeping the feature quantities of Web pages the user likes (is interested in) separate from those of Web pages the user dislikes (is not interested in). The page feature storage unit 15 may be implemented wholly or partly on an auxiliary storage device.

The preference document determination unit 16 determines whether the Web page designated by the browsing control unit 13 matches the user's document preferences. For this determination, it obtains the feature quantity of the Web page from the page feature forming unit 14 and also refers to the contents of the page feature storage unit 15.

The bookmark storage unit 17 stores, under the control of the browsing control unit 13, the URLs the user has designated as bookmarks.

(A-2) Operation of the First Embodiment
Next, the operation of the Web page browsing system of the first embodiment (the Web page browsing method) will be described.

First, the operation for learning the user's document preferences will be described with reference to the flowcharts of FIGS. 2 and 3. FIG. 2 shows the overall flow of learning the user's document preference for a given Web page, and FIG. 3 shows the flow of creating the page feature quantity. While the processing of FIGS. 2 and 3 is running, other processing may cut in, for example in response to key input; such interrupt handling is omitted from FIGS. 2 and 3.

In the Web page browsing system of the first embodiment, when a new Web page is acquired from the Internet 20 for the user to browse, the browsing control unit 13 starts the processing shown in FIG. 2 and first displays the acquired Web page for browsing (step 101). This display processing may include redisplaying a Web page that was already acquired from the Internet 20 and held in cache memory while other Web pages were displayed; conversely, such redisplays may be excluded.

The browsing control unit 13 then causes the page feature forming unit 14 to form the feature quantity of the Web page, independently of the display (browsing) (step 102). As noted above, FIG. 3 shows the specific process for forming this feature quantity.

Next, the browsing control unit 13 determines whether the user has performed an operation to add the URL of the target Web page to the bookmarks (step 103).

The browsing control unit 13 makes this determination at, for example, the moment browsing (display) of the Web page ends, such as when the display switches to another Web page or when the browser is closed. Alternatively, an add operation may be detected at the moment it is actually performed, while the absence of such an operation is determined when browsing (display) of the Web page ends.

If the URL has been added to the bookmarks, the browsing control unit 13 stores the page feature quantity in the page feature storage unit 15 as that of a Web page the user likes (is interested in); if not, it stores the page feature quantity as that of a Web page the user dislikes (is not interested in) (step 104). The series of processing shown in FIG. 2 then ends.
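As an illustration only, the bookmark-based labeling of FIG. 2 (steps 102 to 104) can be sketched in Python as follows. The class and function names are hypothetical, whitespace tokenization stands in for the feature-forming process of FIG. 3, and the check is assumed to run when browsing of the page ends.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PageFeatureStore:
    """Hypothetical stand-in for the page feature storage unit 15: feature
    quantities of liked and disliked pages are kept in separate lists."""
    liked: list = field(default_factory=list)
    disliked: list = field(default_factory=list)

    def store(self, features, interested):
        (self.liked if interested else self.disliked).append(features)


def extract_features(text):
    """Placeholder for step 102: whitespace tokens instead of nouns/noun phrases."""
    return Counter(text.split())


def learn_from_visit(url, text, bookmarked_urls, store):
    """Steps 102-104 of FIG. 2, assumed to run when browsing of the page ends.

    Returns True if the page was stored as liked, False if stored as disliked."""
    features = extract_features(text)    # step 102: form the page feature quantity
    interested = url in bookmarked_urls  # step 103: was the URL added to the bookmarks?
    store.store(features, interested)    # step 104: store under liked or disliked
    return interested
```

For example, with `store = PageFeatureStore()`, calling `learn_from_visit("http://a.example/", "web mining", {"http://a.example/"}, store)` returns `True` and appends that page's token counts to `store.liked`.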

Next, the specific process for forming the feature quantity of a Web page will be described with reference to FIG. 3.

In the first embodiment, the feature quantity of a Web page is the set of occurrence counts or occurrence rates of tokens such as words (e.g., nouns) or word strings (e.g., noun phrases); FIG. 3 shows the process of extracting this feature quantity.

When the process starts, an attempt is made to extract one token, proceeding from the beginning of the Web page (HTML document) being processed, and it is determined whether a token was extracted (step 201).

If a token was extracted, it is determined whether it is being extracted for the first time (step 202); in other words, whether the token is already registered as an element of the page's feature quantity. If it is already registered, its occurrence count is incremented by 1 (step 203) and the process returns to step 201. If it is being extracted for the first time, it is added as a new token element of the page's feature quantity with an occurrence count of 1 (step 204), and the process returns to step 201.

As steps 201 to 204 are repeated from the beginning of the Web page (HTML document), eventually no more tokens can be extracted, at which point the feature-quantity creation process ends. If the frequency information for each token is to be an occurrence rate rather than an occurrence count, then once no more tokens can be extracted, the total of the occurrence counts of all tokens is computed and each token's occurrence rate is derived from it.
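The token-counting loop of FIG. 3 can be illustrated with the following sketch. It assumes that HTML tags are stripped with a regular expression and that tokens are lower-cased runs of word characters; the embodiment instead extracts words (e.g., nouns) or word strings (e.g., noun phrases), which would require a morphological analyzer, particularly for Japanese text. The function name `page_features` is hypothetical.

```python
import re


def page_features(html, as_rate=False):
    """Token-frequency feature quantity following steps 201-204 of FIG. 3.

    Assumption: regex tag stripping and \\w+ tokens replace real noun extraction."""
    text = re.sub(r"<[^>]+>", " ", html)             # crude HTML tag removal
    counts = {}
    for token in re.findall(r"\w+", text.lower()):   # step 201: next token from the head
        if token in counts:                           # step 202: already registered?
            counts[token] += 1                        # step 203: increment its count
        else:
            counts[token] = 1                         # step 204: add with count 1
    if as_rate:
        total = sum(counts.values())                  # sum of all occurrence counts
        return {t: c / total for t, c in counts.items()} if total else {}
    return counts
```

With `as_rate=True`, each count is divided by the total number of token occurrences, matching the occurrence-rate variant described above.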

The user preference learning operation described above runs in parallel with ordinary Web page browsing.

This learning data is used when searching for Web pages, for example to judge whether a Web page that has become a search candidate is of interest to the user. A feature quantity is first created for the Web page being judged. It is then determined statistically whether that feature quantity is closer to the group of feature quantities of the Web pages the user likes (is interested in) or to the group of feature quantities of the Web pages the user dislikes (is not interested in), both stored in the page feature storage unit 15, and this decides whether the page is of interest to the user. The method described in Non-Patent Document 2 can be applied as the statistical technique for this determination, so a detailed description is omitted here.
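The statistical method itself is deferred to Non-Patent Document 2 and is not specified here. Purely as an assumed stand-in, a multinomial naive Bayes classifier with add-one (Laplace) smoothing can compare a candidate page's token counts against the stored "liked" and "disliked" groups. The function names and data layout below are hypothetical.

```python
import math
from collections import Counter


def train(groups):
    """Sum token counts per class.

    groups maps a class name ("liked", "disliked") to a list of Counters,
    i.e. the stored feature quantities of that class."""
    model = {}
    for label, docs in groups.items():
        totals = Counter()
        for doc in docs:
            totals.update(doc)
        model[label] = (totals, len(docs))
    return model


def classify(model, features):
    """Return the class with the highest multinomial naive Bayes log-score."""
    vocab = set()
    for totals, _ in model.values():
        vocab.update(totals)
    n_docs = sum(n for _, n in model.values())
    best, best_score = None, -math.inf
    for label, (totals, n) in model.items():
        if n == 0:
            continue                                   # no training pages for this class
        denom = sum(totals.values()) + len(vocab)      # add-one smoothing denominator
        score = math.log(n / n_docs)                   # log prior from page counts
        for token, count in features.items():
            if token in vocab:                         # tokens never seen are ignored
                score += count * math.log((totals[token] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best
```

Naive Bayes is chosen here only because Non-Patent Documents 1 and 2 concern Bayesian estimation; the actual statistic used there may differ.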

(A-3) Effects of the First Embodiment
According to the first embodiment, when user preferences are learned from Web pages, the user's like or dislike (interest or lack of it) for the target Web page is judged from whether its URL has been registered as a bookmark. This eliminates operations (work) that the user would otherwise perform solely to tell the system which Web pages he or she likes or dislikes.

(B) Second Embodiment
Next, a second embodiment, in which the document preference learning system, method, and program according to the present invention are applied to a Web page browsing system, method, and program, will be described in detail with reference to the drawings.

FIG. 4 is a block diagram showing the functional configuration of the Web page browsing system according to the second embodiment; parts identical or corresponding to those in FIG. 1 of the first embodiment are given identical or corresponding reference numerals.

In FIG. 4, the Web page browsing system 1A of the second embodiment includes an input unit 10, a display unit 11, a communication unit 12, a browsing control unit 13A, a page feature forming unit 14, a page feature storage unit 15, a preference document determination unit 16, and a browsing time measuring unit 18. A bookmark storage unit 17 is also present but is omitted from FIG. 4 because it is not related to the features of the second embodiment.

Under the control of the browsing control unit 13A, the browsing time measuring unit 18 measures how long a given Web page is being browsed by the user (the dwell time on the URL). For example, it measures as the browsing time the interval from when the Web page is displayed on the display unit 11 until its display ends, that is, until the display switches to another Web page or the browser is closed.
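The measurement performed by the browsing time measuring unit 18 might look like the following sketch, which assumes that the browser reports when a page is displayed and when its display ends (switch to another page or browser close). The class and method names are hypothetical; the clock is injectable so that the timing can be tested deterministically.

```python
import time


class BrowsingTimer:
    """Sketch of the browsing time measuring unit 18: one page is timed at a time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._url = None
        self._start = None

    def page_displayed(self, url):
        """Call when a page is shown; this also ends timing of the previous page.

        Returns (previous_url, seconds), or None if no page was being timed."""
        finished = self.display_ended()
        self._url, self._start = url, self._clock()
        return finished

    def display_ended(self):
        """Call when the display switches away or the browser is closed."""
        if self._url is None:
            return None
        result = (self._url, self._clock() - self._start)
        self._url = self._start = None
        return result
```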

The second embodiment adopts the browsing time of a Web page (dwell time on the URL) as an indicator for automatically capturing the user's likes and dislikes (interest or lack of it), based on the premise that users browse pages of high interest for longer than pages of low interest.

FIG. 5 shows the overall flow of learning the user's document preference for a given Web page in the second embodiment; steps identical to those in FIG. 2 of the first embodiment are given the same reference numerals.

In the second embodiment, as shown in FIG. 5, step 303, which judges the user's like or dislike, is based on whether the browsing time exceeds a threshold rather than on whether the URL has been bookmarked (step 103). The other processing steps are the same as in the first embodiment.

If the browsing time exceeds the threshold, the browsing control unit 13A of the second embodiment stores the feature quantity of the target Web page in the page feature storage unit 15 as that of a Web page the user likes (is interested in); if the browsing time is at or below the threshold, it stores the feature quantity as that of a Web page the user dislikes (is not interested in) (step 104). The series of processing shown in FIG. 5 then ends.

According to the second embodiment, when user preferences are learned from Web pages, the user's like or dislike (interest or lack of it) for the target Web page is judged from the length of the dwell time on its URL. This eliminates operations (work) that the user would otherwise perform solely to tell the system which Web pages he or she likes or dislikes.

(C) Other Embodiments
The methods for automatically judging user preferences in the above embodiments need not be applied to a Web page browsing system on their own, as in those embodiments; they may also be applied in combination with other methods.

For example, an automatic judgment mode may be provided: when it is on, the judgment methods of the above embodiments are applied; when it is off, "like" and "dislike" icons are displayed as in the conventional method and the user designates his or her preference.

As another example, the method of displaying "like" and "dislike" icons may be used until the total number of liked and disliked pages whose feature quantities are stored in the page feature storage unit 15 reaches a predetermined number, after which the judgment methods of the above embodiments are applied.
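The two switching examples above can be expressed together as a small selection function. The mode flag and the bootstrap page count (`bootstrap_pages`, default 50) are illustrative assumptions, since the embodiments leave the predetermined number unspecified; setting `bootstrap_pages=0` reduces the sketch to the mode-flag example alone.

```python
def choose_labeling_method(auto_mode, stored_pages, bootstrap_pages=50):
    """Decide how the next page is labeled.

    "manual": show the "like"/"dislike" icons and let the user choose.
    "automatic": use the bookmark or browsing-time judgment.
    stored_pages is the number of liked plus disliked pages already stored."""
    if not auto_mode:
        return "manual"          # automatic judgment mode is off
    if stored_pages < bootstrap_pages:
        return "manual"          # predetermined number not yet reached
    return "automatic"
```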

The methods of the first and second embodiments may also be combined. For example, a Web page whose URL has been bookmarked may be judged as liked immediately, while for pages that were not bookmarked, the browsing time is further checked to judge like or dislike.
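The combined judgment can be sketched as below. The 30-second default threshold is an assumption for illustration; as in the second embodiment, a browsing time equal to the threshold counts as dislike.

```python
def judge_interest(bookmarked, browsing_seconds, threshold=30.0):
    """Combined first/second-embodiment judgment.

    A bookmarked page is liked immediately; otherwise the page is liked only
    if its browsing time exceeds the threshold (equal counts as dislike)."""
    if bookmarked:
        return True
    return browsing_seconds > threshold
```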

The above embodiments judge pages into two classes, like and dislike, but more classes may be used. For example, if bookmarks can be registered in any of several folders, the "like" class may be divided into several sub-classes. Likewise, browsing time may be divided into three or more ranges.
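The finer classification could be realized, for example, as follows. The folder-based label format and the time boundaries (10 s and 60 s) are illustrative assumptions; a browsing time equal to a boundary is placed in the lower class, consistent with the two-class rule of the second embodiment.

```python
import bisect


def liked_subclass(bookmark_folder):
    """Map the bookmark folder a URL was filed into to a "liked" sub-class."""
    return f"liked/{bookmark_folder}"


def time_class(browsing_seconds, edges=(10.0, 60.0)):
    """Split browsing time into len(edges) + 1 ordered classes (0 = least interest).

    bisect_left puts a time equal to an edge into the lower class."""
    return bisect.bisect_left(edges, browsing_seconds)
```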

In the above embodiments, all types of Web pages are subject to like/dislike judgment in learning mode, but certain types may be excluded. For example, Web pages in which images (including moving images) occupy more than a predetermined proportion of the total area may be excluded, as may Web pages whose total number of tokens is at or below a predetermined number.
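A possible form of this exclusion filter is sketched below. Both limits (`max_image_ratio` and `min_tokens`) are hypothetical values chosen only for illustration, and how the image and page areas are measured is left open.

```python
def eligible_for_learning(image_area, page_area, token_count,
                          max_image_ratio=0.5, min_tokens=20):
    """Return False for pages excluded from like/dislike judgment.

    Excluded: pages whose image area (including moving images) exceeds
    max_image_ratio of the page area, or whose total token count is
    min_tokens or fewer."""
    if page_area > 0 and image_area / page_area > max_image_ratio:
        return False
    return token_count > min_tokens
```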

In the first embodiment, a Web page whose URL has been registered as a bookmark is judged as liked. In addition, when the user saves the Web page being browsed as a file (file registration) for later processing, that Web page may likewise be treated as liked. In other words, a Web page is judged as liked when the user has performed an act of saving at least part of the information relating to it for future processing and use.

In the second embodiment, the threshold for judging like or dislike from browsing time is a fixed value, but it may be made user-configurable. The system may also select the threshold from several candidates according to the type of Web page. For example, the threshold may be switched automatically between the top page of a company or other organization and the Web pages below it in the tree hierarchy descending from that top page. Whether a page is a top page can be determined from the presence of predetermined keywords, the structure of the URL, and the like.
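One way to realize the threshold switching is sketched below. The top-page rules (an empty or root path, `/index.html` or `/index.htm`, or the presence of a caller-supplied keyword) and the two threshold values are assumptions for illustration; the embodiment states only that keywords and URL structure can be used, not which threshold applies to which page type.

```python
from urllib.parse import urlparse

TOP_PAGE_PATHS = {"", "/", "/index.html", "/index.htm"}  # assumed URL rule


def is_top_page(url, text, keywords=()):
    """Heuristic: root-level URL path, or any caller-supplied keyword in the text."""
    if urlparse(url).path in TOP_PAGE_PATHS:
        return True
    return any(k in text for k in keywords)


def browsing_threshold(url, text, top_threshold=10.0, lower_threshold=30.0,
                       keywords=()):
    """Select the browsing-time threshold (seconds) according to the page type."""
    return top_threshold if is_top_page(url, text, keywords) else lower_threshold
```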

Regarding the second embodiment, when the user browses a Web page, moves on to another Web page, and then returns to the first Web page (redisplaying the copy stored in the cache), the situation may be handled as follows. If the page was judged liked on the first viewing, the later viewing is excluded from the determination. If the page was judged disliked on the first viewing, it is re-determined either (1) by adding the browsing time after the return to the original browsing time, or (2) by using only the browsing time of the redisplay, independently of the original viewing. In both cases (1) and (2), if the page is judged liked once the redisplay is taken into account, the original dislike information is deleted from the page feature amount storage unit.
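The revisit rule can be sketched as a function called when the user returns to a cached page. `handle_return` and its parameters are hypothetical, and keying the stores by URL while recording only the browsing time is a simplification for illustration; the page feature amount storage unit itself holds feature amounts.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-ins for the page feature amount storage unit:
# URL -> browsing time (s) on which the stored judgement was based.
Store = Dict[str, float]


def handle_return(url: str, revisit_s: float, threshold_s: float,
                  liked: Store, disliked: Store,
                  cumulative: bool = True) -> Optional[bool]:
    """Re-judge a cached page the user has returned to.

    cumulative=True  -> option (1): add the revisit time to the first viewing.
    cumulative=False -> option (2): judge the redisplay on its own.
    Returns None if the revisit is excluded, else the new judgement.
    """
    if url in liked:
        return None  # liked on first viewing: later viewing is not judged
    if url not in disliked:
        return None  # no earlier judgement exists to revisit
    seconds = disliked[url] + revisit_s if cumulative else revisit_s
    if seconds > threshold_s:
        del disliked[url]  # delete the original dislike information
        liked[url] = seconds
        return True
    # The description does not say what to store when the page is still
    # disliked; this sketch leaves the original record unchanged.
    return False
```

For example, a page first viewed for 20 seconds and disliked under a 30-second threshold becomes liked after a 15-second revisit under option (1), since 35 seconds exceeds the threshold, but stays disliked under option (2).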

The methods described in the above embodiments can be used from any communication network that connects personal computers, mobile terminal devices (for example, mobile phones and PDAs), and the like to the Internet, whether that network is built from LANs, telephone lines, dedicated lines, wireless networks, or other lines. They can thus provide any system with the information collection needed for optimal preference determination.

In the above embodiments, the Web has been used as an example. However, the present invention is not limited to the Web, and any communication application can be used; for example, the invention can also be applied to electronic mail.

The present invention can also be applied to any kind of document. Communication over a network is therefore not a requirement; for example, a document read from a storage medium such as a CD-ROM can also be targeted.

A block diagram showing the functional configuration of the Web page browsing system in the first embodiment.
A flowchart showing the overall flow of operations in the first embodiment when learning a user's document preference for a given Web page.
A flowchart showing the flow of the page feature amount creation operation in the first embodiment.
A block diagram showing the functional configuration of the Web page browsing system in the second embodiment.
A flowchart showing the overall flow of operations in the second embodiment when learning a user's document preference for a given Web page.

Explanation of symbols

1, 1A: Web page browsing system
10: Input unit
11: Display unit
12: Communication unit
13, 13A: Browsing control unit
14: Page feature amount formation unit
15: Page feature amount storage unit
16: Preference document determination unit
17: Bookmark storage unit
18: Browsing time measurement unit
20: Internet

Claims

A document preference learning system comprising:
feature amount acquisition means for obtaining a feature amount of an input document;
interest determination means for determining whether an act of saving information related to the input document for future processing or use of the input document has been performed, and for determining whether the user is interested in the input document, using the performance of the saving act as one condition for determining that the input document is a document in which the user is interested; and
document feature amount storage means for storing the feature amount of the input document according to whether or not the input document is a document in which the user is interested.

A document preference learning system comprising:
feature amount acquisition means for obtaining a feature amount of an input document;
interest determination means for determining whether the user is interested in the input document, using the user's browsing time of the input document exceeding a threshold as one condition for determining that the input document is a document in which the user is interested; and
document feature amount storage means for storing the feature amount of the input document according to whether or not the input document is a document in which the user is interested.

A document preference learning method in a document preference learning system having feature amount acquisition means, interest determination means, and document feature amount storage means, wherein:
the feature amount acquisition means obtains a feature amount of an input document;
the interest determination means determines whether an act of saving information related to the input document for future processing or use of the input document has been performed, and determines whether the user is interested in the input document, using the performance of the saving act as one condition for determining that the input document is a document in which the user is interested; and
the document feature amount storage means stores the feature amount of the input document according to whether or not the input document is a document in which the user is interested.

A document preference learning method in a document preference learning system having feature amount acquisition means, interest determination means, and document feature amount storage means, wherein:
the feature amount acquisition means obtains a feature amount of an input document;
the interest determination means determines whether the user is interested in the input document, using the user's browsing time of the input document exceeding a threshold as one condition for determining that the input document is a document in which the user is interested; and
the document feature amount storage means stores the feature amount of the input document according to whether or not the input document is a document in which the user is interested.

A document preference learning program that causes a computer to function as:
feature amount acquisition means for obtaining a feature amount of an input document;
interest determination means for determining whether an act of saving information related to the input document for future processing or use of the input document has been performed, and for determining whether the user is interested in the input document, using the performance of the saving act as one condition for determining that the input document is a document in which the user is interested; and
document feature amount storage means for storing the feature amount of the input document according to whether or not the input document is a document in which the user is interested.

A document preference learning program that causes a computer to function as:
feature amount acquisition means for obtaining a feature amount of an input document;
interest determination means for determining whether the user is interested in the input document, using the user's browsing time of the input document exceeding a threshold as one condition for determining that the input document is a document in which the user is interested; and
document feature amount storage means for storing the feature amount of the input document according to whether or not the input document is a document in which the user is interested.