JP2003281182A - Information retrieval device, information retrieval method, program and recording medium

Info

Publication number: JP2003281182A
Application number: JP2002076923A
Authority: JP
Inventors: Takashige Tanaka; 敬重田中
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2002-03-19
Filing date: 2002-03-19
Publication date: 2003-10-03
Anticipated expiration: 2022-03-19
Also published as: JP3945282B2

Abstract

PROBLEM TO BE SOLVED: To shorten the time required to extract information matching a search condition from the information stored in a database.

SOLUTION: A data collection unit 102 obtains, from the related information of each piece of document data stored in a groupware database 20a, the related information corresponding to the data items specified by a setting file 200, and outputs it to an indexing unit 104. The indexing unit 104 indexes the related information for each piece of document data and registers the result in an index file stored in a search database 10a.

COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to an information retrieval device, an information retrieval method, a program, and a recording medium for searching information in a database.

[0002]

[Prior Art] In companies and similar organizations, computer networks such as a LAN (Local Area Network) (hereinafter simply referred to as a "network") are set up, and work efficiency is improved by sharing various data within the network. Specifically, software called groupware, collaboration software, or the like (hereinafter referred to as "groupware") is installed on one of the computers forming the network, so that the various data held by this computer (hereinafter referred to as a "groupware server"), such as shared documents and each user's schedule, can be accessed from each computer connected to the network (hereinafter referred to as a "client terminal").

[0003] Groupware also provides a function for searching the stored document data for matching document data in response to a request from a client terminal. This makes it easy for a user at a client terminal to find desired document data among the large amount of document data managed by the groupware server.

[0004]

[Problem to Be Solved by the Invention] However, when a groupware server searches for document data, it generally runs the search over all of the document data, so the search time grows in proportion to the number of pieces of document data and the size of each piece. The problem is particularly serious in a call center that handles customer inquiries, because the groupware server must quickly find and retrieve the document data relevant to each inquiry.

[0005] The present invention has been made in view of the above circumstances, and its object is to provide an information retrieval device, an information retrieval method, a program, and a recording medium capable of shortening the time required to identify, among the information stored in a database, the information that matches a search condition.

[0006]

[Means for Solving the Problem] To achieve the above object, the present invention provides an information retrieval device that searches a database which associates text data containing at least a text passage with identification information of that text data, and which also associates a plurality of pieces of related information relating to the text passage, items that classify the plurality of pieces of related information, and the identification information of the text data corresponding to the text passage. The information retrieval device comprises: first storage means for storing item designation information that designates, among the items, the items that can be search targets; related information acquisition means for acquiring from the database the related information classified under the items designated by the item designation information; second storage means for storing the related information acquired by the related information acquisition means in association with the identification information corresponding to that related information; search condition acquisition means for acquiring a search condition conforming to the items designated by the item designation information; and search means for identifying, among the related information stored in the second storage means, the related information that matches the search condition, and identifying the identification information corresponding to that related information.

[0007] To achieve the above object, the present invention also provides an information retrieval method for searching a database which associates text data containing at least a text passage with identification information of that text data, and which also associates a plurality of pieces of related information relating to the text passage, items that classify the plurality of pieces of related information, and the identification information of the text data corresponding to the text passage. The information retrieval method comprises: a first step of storing, in a storage device, item designation information that designates, among the items, the items that can be search targets; a second step of acquiring from the database the related information classified under the items designated by the item designation information; a third step of storing the related information acquired in the second step in the storage device in association with the identification information corresponding to that related information; a fourth step of acquiring a search condition conforming to the items designated by the item designation information; and a fifth step of identifying, among the related information stored in the storage device, the related information that matches the search condition, and identifying the identification information corresponding to that related information.

[0008] According to the information retrieval device and information retrieval method described above, only the items that can be search targets are extracted in advance from the plurality of items stored in the database, and the search is performed on the extracted items. Therefore, according to the present invention, the time required to identify the matching document data is shorter than when the search is run over all items in the database. In addition, a user can change the items to be searched simply by changing the items designated by the item designation information.

[0009] Here, the information retrieval device described above preferably comprises body extraction means for extracting the text passage from the text data, morphological analysis means for dividing the extracted text passage into a plurality of words, and occurrence frequency counting means for counting the number of times each of the plurality of words appears in the text passage, with the second storage means storing each word and its count in association with the identification information of the text data corresponding to the text passage. With this configuration, when a word is acquired as the search condition, the identification information of the text data can, for example, be identified in descending order of how many times the text contains that word.
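
As a rough illustration of the ordering described above, the following Python sketch ranks document identifiers by how often a query word occurs in each document. The in-memory `index` layout and the function name `rank_documents` are assumptions made for this example; the specification does not prescribe them.

```python
def rank_documents(index, word):
    """Return the IDs of documents containing `word`, most occurrences first.

    `index` maps word -> {document_id: occurrence_count}. This layout is a
    hypothetical stand-in for what the second storage means holds.
    """
    counts = index.get(word, {})
    # Sort by descending count; break ties by document ID for stable output.
    return sorted(counts, key=lambda doc_id: (-counts[doc_id], doc_id))


# Hypothetical counts, for illustration only.
index = {
    "printer": {101: 2, 102: 5, 103: 2},
    "cable": {103: 1},
}
print(rank_documents(index, "printer"))
```

A word that is absent from the index simply yields an empty list, so the caller does not need a separate existence check.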

[0010] To achieve the above object, the present invention also provides a program for a computer that searches a database which associates text data containing at least a text passage with identification information of that text data, and which also associates a plurality of pieces of related information relating to the text passage, items that classify the plurality of pieces of related information, and the identification information of the text data corresponding to the text passage. The program causes the computer to function as: first storage means for storing item designation information that designates, among the items, the items that can be search targets; related information acquisition means for acquiring from the database the related information classified under the items designated by the item designation information; second storage means for storing the related information acquired by the related information acquisition means in association with the identification information corresponding to that related information; search condition acquisition means for acquiring a search condition conforming to the items designated by the item designation information; and search means for identifying, among the related information stored in the second storage means, the related information that matches the search condition, and identifying the identification information corresponding to that related information. The program may of course be recorded on a computer-readable recording medium such as an optical disk or a magnetic disk.

[0011]

[Embodiments of the Invention] An embodiment of the present invention will be described below with reference to the drawings.

[0012] FIG. 1 is a diagram showing the configuration of an information retrieval system according to an embodiment of the present invention. In this figure, a groupware server 20 has a groupware database 20a stored in a storage device such as a magnetic disk. The groupware database 20a stores document data shared among a large number of client terminals 30 connected via a network 2. Here, document data means data that contains a text passage. In practice, besides the database in which the shared document data is stored (that is, the groupware database 20a described above), the groupware server 20 also has various other databases, such as a database storing e-mail data for each user and a database storing schedule data for each user.

[0013] In FIG. 1, the information retrieval device 10 is constituted by a personal computer or the like. It acquires a document data search request from a client terminal 30 via the network 2 and transmits candidates for the document data matching the search request to that client terminal 30. More specifically, the information retrieval device 10 has a storage device such as a magnetic disk, in which a search database 10a is stored. The information retrieval device 10 stores information related to each piece of document data held in the groupware database 20a in the search database 10a, and when it acquires a search request from a client terminal 30, it searches the information stored in the search database 10a.

[0014] FIG. 2 is a functional block diagram showing the configuration of the information retrieval device 10 according to this embodiment. In this figure, a setting file analysis unit 100 follows the instructions given in a setting file 200 to identify, among the information related to the document data, the information to be stored in the search database 10a (hereinafter referred to as "search information"), and outputs it to a data collection unit 102. The setting file 200 is a data file created by, for example, the administrator of the groupware server 20, and its structure is shown in FIG. 3. As shown in that figure, the setting file 200 specifies acquisition items, weighted words, a storage destination address, and a storage source address.

[0015] The acquisition items specify which of the data items managed by the groupware server 20 are to be acquired. In detail, the groupware server 20 holds, for each piece of document data, a groupware file 22 in which the related information about the document data is recorded separately for each data item. FIG. 4 shows an example of this groupware file. In this figure, the character string "ITEM_NAME" denotes a data item, and the character string joined to "ITEM_NAME" by an equals sign (=) is the data item name. For example, for "ITEM_NAME=Classification", the data item name is "Classification". The line following the data item name (that is, the line following the "ITEM_NAME" string) holds the related information of the document data for that data item name. Specifically, the character string "TYPE_TEXT=テクニカルノート" ("technical note") on the line following "ITEM_NAME=Classification" indicates that the related information of the document data for the data item name "Classification" is "technical note". The acquisition items thus specify which of the data item names contained in the groupware file 22 (the data item names indicated by the "ITEM_NAME" strings) are to be acquired. Although not illustrated, the groupware file 22 also indicates which document data it corresponds to.
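
The line-pair layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: only the "ITEM_NAME" and "TYPE_TEXT" strings and the "Classification" item come from the text; the other item names, the sample values, and the function names are hypothetical.

```python
def parse_groupware_file(text):
    """Map each data item name to the related information on the next line."""
    items = {}
    current_name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "ITEM_NAME":
            current_name = value          # remember the data item name
        elif current_name is not None:
            items[current_name] = value   # the next line carries its value
            current_name = None
    return items


def select_items(items, acquisition_items):
    """Keep only the data items named as acquisition items."""
    return {name: items[name] for name in acquisition_items if name in items}


# Hypothetical file content, for illustration only.
sample = """ITEM_NAME=Classification
TYPE_TEXT=Technical Note
ITEM_NAME=Title
TYPE_TEXT=Connecting interface device YYY
ITEM_NAME=Author
TYPE_TEXT=Not collected
"""
parsed = parse_groupware_file(sample)
print(select_items(parsed, ["Classification", "Title"]))
```

Items that are not listed as acquisition items (here the hypothetical "Author") never reach the search database, which is what keeps the later search small.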

[0016] The weighted words in the setting file 200 specify words that are frequently used as search terms. The storage source address indicates the address at which the database to be searched is stored. In detail, as described above, the groupware server 20 generally has many databases, so it is necessary to specify which database is to be searched. Specifying the address identifies that database. The storage destination address indicates the address used when the information extracted, in accordance with the search information, from each piece of data in the database identified by the storage source address is stored in the search database 10a. Because a different storage destination address is specified for each database to be searched, many sets of information, each extracted from one of those databases, can be stored in the search database 10a.
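
To make the four kinds of entries concrete, here is a small Python sketch of one possible in-memory form of the setting file. The actual layout in FIG. 3 is not reproduced in the text, so the class name, field names, addresses, and the uniqueness check are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchSetting:
    """One setting file: the four kinds of entries named in the text."""
    acquisition_items: tuple   # data item names to collect
    weighted_words: tuple      # words frequently used as search terms
    source_address: str        # where the database to be searched is stored
    destination_address: str   # where its extracted information is stored


def destinations_by_source(settings):
    """Map each source database to its own destination in the search database."""
    mapping = {}
    for setting in settings:
        if setting.destination_address in mapping.values():
            raise ValueError("each source database needs its own destination")
        mapping[setting.source_address] = setting.destination_address
    return mapping


# Hypothetical addresses and item names, for illustration only.
settings = [
    SearchSetting(("Classification", "Title"), ("interface device YYY",),
                  "http://groupware.example/shared-docs", "index/shared-docs"),
    SearchSetting(("Title",), (),
                  "http://groupware.example/manuals", "index/manuals"),
]
print(destinations_by_source(settings))
```

Rejecting a repeated destination is a design choice of this sketch; it mirrors the statement that each database to be searched is given a different storage destination address.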

[0017] In FIG. 2, the data collection unit 102 receives from the groupware server 20, via the network 2, the acquisition items indicated by the search information from the setting file analysis unit 100, and performs the following processing. Among the items acquired from the document data and the groupware file 22, the data collection unit 102 generates a body data file 202 from those corresponding to the body of the document data and an information data file 204 from those other than the body, and outputs both to an indexing unit 104. As shown in FIG. 5, the words specified as weighted words ("インターフェースデバイスＹＹＹ" ("interface device YYY") and so on in the illustrated example) are appended to the end of the body data in the body data file 202 (details are given later). As shown in FIG. 6, the information contained in the information data file 204 includes, for example, the title (TITLE) given to the document data and the storage source address (URL: Uniform Resource Locator) of the document data in the groupware database 20a. The function by which the data collection unit 102 acquires document data from the groupware server 20 is implemented through an API (Application Program Interface) provided by the groupware manufacturer.
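
The split into a body data file and an information data file might look like the following Python sketch. It assumes the collected items arrive as a dictionary, that the body is held under a hypothetical item name "Body", and that every weighted word is appended once on its own line; none of these details is fixed by the text.

```python
def build_data_files(items, weighted_words, body_item="Body"):
    """Split collected items into (body data file, information data file).

    The body text gets the weighted words appended at its end; every other
    item goes into the information data file unchanged.
    """
    body = items.get(body_item, "")
    body_file = "\n".join([body, *weighted_words])
    info_file = {name: value for name, value in items.items()
                 if name != body_item}
    return body_file, info_file


# Hypothetical collected items, for illustration only.
items = {
    "Body": "How to connect the device.",
    "TITLE": "Connection guide",
    "URL": "http://groupware.example/doc/1",
}
body_file, info_file = build_data_files(items, ["interface device YYY"])
print(body_file)
print(info_file)
```

Because the appended words become part of the text that is later tokenized, they raise the occurrence counts of those words in the document.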

[0018] The indexing unit 104 performs morphological analysis on the body data file 202 received from the data collection unit 102, then performs indexing and registers the result in an index file 206; it corresponds to the CPU of a computer. The index file 206 is stored in the search database 10a and contains a page table 206a, a keyword table 206c, and a word table 206b (see FIG. 7). Each data table is described later.

[0019] Here, the morphological analysis performed by the indexing unit 104 means decomposing a Japanese sentence written in a mixture of kanji and kana into words (morphemes) and identifying the reading, part of speech, and so on of each word. A morphological analysis dictionary 106 is the dictionary used for morphological analysis in the indexing unit 104 and contains a wide variety of words. More specifically, the indexing unit 104 decomposes a sentence into words (morphemes) by repeatedly extracting from the morphological analysis dictionary 106 the word that gives the longest match with the remaining part of the sentence being analyzed. Needless to say, morphological analysis is unnecessary when the body of the body data file is written in a language in which words are separated by spaces (for example, English).
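
The longest-match procedure described above can be sketched in a few lines of Python. This is a simplified illustration: the tiny dictionary is hypothetical, readings and parts of speech are not handled, and emitting an unknown character as a one-character word is an assumption of this sketch rather than something the text specifies.

```python
def segment(text, dictionary):
    """Split `text` into words by repeated longest match against `dictionary`."""
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in dictionary:
                break
        else:
            # Assumption: a character not covered by the dictionary
            # becomes a one-character word.
            candidate = text[pos]
        words.append(candidate)
        pos += len(candidate)
    return words


# Hypothetical dictionary entries, for illustration only.
dictionary = {"情報", "検索", "情報検索", "装置", "の", "構成"}
print(segment("情報検索装置の構成", dictionary))
```

Note that "情報検索" is chosen over the shorter entry "情報" because the longer match is tried first.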

[0020] FIG. 8 shows an example of the page table mentioned above. The page table 206a manages information outlining each piece of document data. Each record of the page table 206a contains document identification information, server identification information, a storage source address, last update date and time information, title information, body information, classification information, total word count information, software-specific document identification information, and reference level information.

[0021] The document identification information is identification information that the information retrieval device 10 assigns uniquely to each piece of document data acquired from the groupware database 20a. The server identification information identifies the groupware server 20 from which the document data was acquired; in this embodiment, as shown in FIG. 8, it is a number that the information retrieval device 10 assigns uniquely to each server. The storage source address indicates the address at which the document data is stored in the groupware database 20a and, as shown in FIG. 8, is specified as a URL. The last update date and time information indicates the date and time at which the information retrieval device 10 last updated the information on the document data. The title information indicates the title (TITLE) of the document data and is given as a character string of a predetermined number of bytes, for example 256 bytes. The body information holds the text from the beginning of the body of the document data up to a predetermined length (for example, 256 bytes).

[0022] The classification information indicates the classification of the document in the document data. More specifically, when the document data is shared over a network within a call center, for example, the classification information records whether the document data is a technical support document for a product, a product manual, and so on. The total word count information indicates the total number of words in the body of the document data. The software-specific document identification information is the unique identification information that the groupware server 20 assigned to the document data. The reference level information indicates, for example, whether viewing of the document data is restricted to the client terminals connected to the network or is also permitted for terminals outside the network. The page table 206a contains both the server identification information and the software-specific document identification information so that, when the same groupware is installed on many servers and different servers assign the same identification information to document data, it remains possible to identify uniquely which document data on which server is meant.
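
The ten fields of a page table record, and the reason for keeping both the server identification information and the software-specific document identification information, can be illustrated with the following Python sketch. The field names, types, and sample values are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    """One page table record, with one field per kind of information listed."""
    document_id: int            # assigned by the information retrieval device
    server_id: int              # number assigned to the source groupware server
    source_address: str         # URL of the document in the groupware database
    last_updated: str           # last update date and time
    title: str
    body_excerpt: str           # beginning of the body, fixed maximum length
    classification: str
    total_words: int
    software_document_id: str   # ID assigned by the groupware server itself
    reference_level: str


def origin_key(record):
    """Identify a document by which server it is on and its ID on that server."""
    return (record.server_id, record.software_document_id)


# Hypothetical records: two servers happened to assign the same document ID.
first = PageRecord(1, 1, "http://server-a.example/doc/A100", "2002-03-01 10:00",
                   "Connection guide", "How to connect...", "Technical Note",
                   120, "A100", "internal")
second = PageRecord(2, 2, "http://server-b.example/doc/A100", "2002-03-02 09:30",
                    "Printer manual", "This manual...", "Manual",
                    300, "A100", "internal")
print(origin_key(first), origin_key(second))
```

Even though both servers used the ID "A100", the pair of server ID and software-specific ID still distinguishes the two documents.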

[0023] FIG. 9 shows an example of the word table mentioned above. The word table 206b manages the words contained in the body of each piece of document data. More specifically, as shown in FIG. 9, each record of the word table 206b contains a word, word identification information that the information retrieval device 10 assigns uniquely to each word, and a word-using document count indicating how many pieces of document data, among all the document data stored in the groupware database 20a, contain the word in their body. The word-using document count is calculated from the results of the morphological analysis that the indexing unit 104 performs on the body data file 202 of the document data. Specifically, the indexing unit 104 performs morphological analysis on one body data file 202 to decompose the body into words (morphemes), then assigns unique identification information to each word and registers it in the word table 206b. The indexing unit 104 then increments by 1 the word-using document count corresponding to the registered word identification information. Performing this processing for all the document data stored in the groupware database 20a yields the word-using document count for each word.
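
The registration and counting steps above can be sketched in Python as follows. The dictionary-based table, the sequential word IDs, and the field names are assumptions of this sketch; the input is taken to be documents that have already been split into words.

```python
def build_word_table(tokenized_documents):
    """Build {word: {"word_id": ..., "document_count": ...}}.

    `tokenized_documents` is an iterable of word lists, one per document.
    """
    word_table = {}
    for tokens in tokenized_documents:
        # dict.fromkeys keeps each distinct word once, in first-seen order,
        # so a word repeated inside one document is counted only once.
        for word in dict.fromkeys(tokens):
            entry = word_table.setdefault(
                word, {"word_id": len(word_table) + 1, "document_count": 0})
            entry["document_count"] += 1
    return word_table


# Hypothetical tokenized bodies, for illustration only.
documents = [
    ["printer", "driver", "printer"],
    ["printer", "cable"],
]
print(build_word_table(documents))
```

Here "printer" ends with a document count of 2 even though it appears three times in total, because the count tracks documents, not occurrences.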

[0024] FIG. 10 shows an example of the keyword table mentioned above. The keyword table 206c manages, for each word contained in the body of each piece of document data, information such as how many times the word appears. Specifically, as shown in FIG. 10, each record of the keyword table 206c contains the word identification information held in the word table 206b, the document identification information held in the page table 206a, an occurrence count, and an importance. The occurrence count indicates how many times the word appears in the body of the document data identified by the document identification information, and is obtained through the morphological analysis performed by the indexing unit 104. More specifically, after decomposing the body of the body data file 202 of the document data into words (morphemes), the indexing unit 104 calculates the occurrence frequency by counting how many instances of the word indicated by the word identification information the body contains. The importance reflects how frequently the word occurs across the bodies of all the document data, and is calculated by the indexing unit 104 using the following formula.

(importance) = S × log(N / n)

Here, S is the occurrence count, N is the number of pieces of document data stored in the groupware database 20a, and n is the word-using document count described above. As the formula shows, the more pieces of document data contain the same word in their body, the lower the importance of that word, and the more often the same word appears in the body of a single piece of document data, the higher its importance. As described above, the data collection unit 102 has appended the weighted words to the end of the body data file 202 of the document data, so the importance of these weighted words becomes relatively high. In particular, since the title (TITLE) of document data often contains words that strongly reflect the content of its body, the title may also be added to the body data file 202 as weighting.
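
The importance formula can be checked with a short Python sketch. The text does not state the base of the logarithm, so the natural logarithm is assumed here; the row layout and function names are likewise hypothetical.

```python
import math
from collections import Counter


def importance(occurrences, total_documents, documents_with_word):
    """Importance = S * log(N / n); natural log is an assumption."""
    return occurrences * math.log(total_documents / documents_with_word)


def build_keyword_rows(tokenized_documents):
    """Return one row per (word, document) pair with count and importance.

    `tokenized_documents` maps document ID -> list of words in its body.
    """
    total = len(tokenized_documents)
    document_counts = Counter()
    for tokens in tokenized_documents.values():
        document_counts.update(set(tokens))  # each document counts once
    rows = []
    for document_id, tokens in tokenized_documents.items():
        for word, count in Counter(tokens).items():
            rows.append({
                "word": word,
                "document_id": document_id,
                "occurrences": count,
                "importance": importance(count, total, document_counts[word]),
            })
    return rows


# Hypothetical tokenized bodies, for illustration only.
documents = {1: ["printer", "driver", "printer"], 2: ["printer", "cable"]}
for row in build_keyword_rows(documents):
    print(row)
```

A word that appears in every document gets an importance of zero under this formula, because log(N / n) is zero when n equals N.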

[0025] In FIG. 2, a search request acquisition and response unit 108 receives a search request from a client terminal 30 via the network 2 and outputs it to a search unit 110. The search request acquisition and response unit 108 corresponds to the network interface device of a computer. The search unit 110 searches the index file 206 stored in the search database 10a in accordance with the search request from the search request acquisition and response unit 108, and outputs the search result to the search request acquisition and response unit 108. On receiving the search result from the search unit 110, the search request acquisition and response unit 108 transmits it to the client terminal 30 via the network 2.

[0026] Next, the operation of the information search device 10 according to the present embodiment will be described. The programs defining the processing procedures described below are stored in a recording medium of the information search device 10, such as a ROM or a magnetic disk. These programs may have been installed in the information search device 10 from a portable recording medium such as an optical disc, a magneto-optical disc, or a magnetic disk, or may have been installed in the information search device 10 via the network 2.

[0027] The information search device 10 executes a registration process that registers, in the index file 206, information on each document data item accumulated in the groupware database 20a. Specifically, as shown in FIG. 11, the setting file analysis unit 100 first reads the setting file 200, identifies the acquisition items, weighting words, storage source address, and storage destination address specified by the setting file 200, and outputs the identified information to the data collection unit 102 as search information (step Sa1).

[0028] Next, the data collection unit 102 receives, from the groupware server 20 via the network 2, the acquisition items indicated by the search information supplied by the setting file analysis unit 100, generates the body data file 202 (see FIG. 5) and the information data file 204 (see FIG. 6), and outputs each of them to the indexing unit 104 (step Sa2).

[0029] The indexing unit 104 then performs morphological analysis on the body data file 202 received from the data collection unit 102, executes indexing, and registers the result in the index file 206, which comprises three data tables (step Sa3). The information on one document data item is thereby registered in the index file 206. Next, the data collection unit 102 determines whether any unprocessed document data remains in the groupware database 20a (step Sa4). If the determination result is YES, the procedure returns to step Sa2 so that the information on the remaining document data is registered in the index file 206. If the determination result in step Sa4 is NO, the data collection unit 102 ends the process. As a result, the information on all document data accumulated in the groupware database 20a is registered in the index file 206.
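The registration loop of steps Sa2 to Sa4 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented implementation: documents are plain dictionaries, the function name `register_all` and the field name `body` are hypothetical, whitespace splitting stands in for morphological analysis, and only two of the three tables are modelled.

```python
from collections import Counter


def register_all(documents, acquisition_items, tokenize=str.split):
    """Index every document once (steps Sa2 to Sa4, simplified)."""
    page_table = {}     # document id -> related information kept for search
    keyword_table = {}  # (word, document id) -> number of appearances
    for doc_id, doc in documents.items():   # Sa4: repeat until none remain
        # Sa2: keep only the items named in the setting file.
        page_table[doc_id] = {k: doc[k] for k in acquisition_items if k in doc}
        # Sa3: split the body into words and count each word.
        for word, count in Counter(tokenize(doc.get("body", ""))).items():
            keyword_table[(word, doc_id)] = count
    return page_table, keyword_table


docs = {"d1": {"title": "Plan", "author": "A",
               "body": "budget plan budget", "draft_flag": "x"}}
page_table, keyword_table = register_all(docs, ["title", "author"])
```

Items that are not listed as acquisition items (here the hypothetical `draft_flag`) never reach the page table, which is what keeps the index smaller than the source database.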

[0030] The document data accumulated in the groupware database 20a is edited frequently: document data items are added or deleted, and individual document data items are modified. The information search device 10 therefore performs the following index file correction process at regular intervals so that the information registered in the index file 206 stays consistent with the document data in the groupware database 20a.

[0031] That is, as shown in FIG. 12, the data collection unit 102 first receives, from the groupware server 20 via the network 2, the acquisition items indicated by the search information supplied by the setting file analysis unit 100, generates the body data file 202 and the information data file 204, and outputs each of them to the indexing unit 104 (step Sb1). From the body data file 202, the information data file 204, and the information registered in the index file 206, the indexing unit 104 determines whether the document data has been added, has been modified, or has not been edited (step Sb2).

[0032] More specifically, if no entry corresponding to the server identification information and the software-specific document identification information contained in the information data file 204 is registered in the page table 206a of the index file 206, the indexing unit 104 determines that the document data has been added. If such an entry is already registered in the page table 206a of the index file 206 but the last update date/time information differs between the information data file 204 and the index file 206, the indexing unit 104 determines that the document data has been modified. If the server identification information, the software-specific document identification information, and the last update date/time information are all the same in the information data file 204 and the index file 206, the indexing unit 104 determines that no editing has been performed on the document data.
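The three-way determination of step Sb2 can be written as a short sketch. The field names `server_id`, `software_doc_id`, and `last_updated`, and the choice to key the page table by the pair of identifiers, are assumptions made for illustration; the description only states which pieces of information are compared.

```python
from enum import Enum


class Change(Enum):
    ADDED = "added"
    MODIFIED = "modified"
    UNCHANGED = "unchanged"


def classify(info, page_table):
    """Compare one information data file entry with the page table."""
    key = (info["server_id"], info["software_doc_id"])
    registered = page_table.get(key)
    if registered is None:
        return Change.ADDED        # no matching entry in the page table
    if registered["last_updated"] != info["last_updated"]:
        return Change.MODIFIED     # same document, different update time
    return Change.UNCHANGED        # all three pieces of information match


page_table = {("srv1", "42"): {"last_updated": "2002-03-01T10:00"}}
```

Only the last update date and time is compared for a document that is already registered, so the body does not have to be re-read to detect that nothing changed.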

[0033] If the result of the determination in step Sb2 is that the document data has been added, the indexing unit 104 executes the same processing as step Sa3 of the registration process described above and registers the information on the document data in the index file 206 (step Sb3). Next, the data collection unit 102 determines whether any unprocessed document data remains in the groupware database 20a (step Sb4). If the determination result is YES, the procedure returns to step Sb1 to process the remaining document data. As a result, the information on document data added to the groupware database 20a is newly registered in the index file 206.

[0034] If, on the other hand, the result of the determination in step Sb2 is that the document data has been modified, the indexing unit 104 first deletes the information in the index file 206 that corresponds to the document data, then generates that information anew and registers it in the index file 206. More specifically, the indexing unit 104 first identifies the document identification information (see FIG. 8) corresponding to the document data (step Sb5) and deletes, in a single operation, the information relating to the identified document identification information from each of the page table 206a, the word table 206b, and the keyword table 206c contained in the index file 206 (step Sb6). The indexing unit 104 then generates the information corresponding to the document data by the indexing process described above and registers it in the index file 206 (step Sb7). Next, the data collection unit 102 determines whether any unprocessed document data remains in the groupware database 20a (step Sb4). If the determination result is YES, the procedure returns to step Sb1 to process the remaining document data. The modification made to the document data is thereby reflected in the index file 206. If the result of the determination in step Sb2 is that the document data has not been edited, the indexing unit 104 likewise advances the processing to step Sb4.

[0035] If the determination result in step Sb4 is NO, all document data in the groupware database 20a has been processed. Consequently, any document data whose document identification information in the index file 206 (page table 206a) was never referenced during the series of processes described above no longer exists in the groupware database 20a. The indexing unit 104 therefore extracts all unreferenced document identification information from the page table 206a of the index file 206 (step Sb8), deletes the information corresponding to the extracted document identification information from all tables contained in the index file 206 (step Sb9), and ends the process. As a result, the information corresponding to document data deleted from the groupware database 20a is deleted from the index file 206. Moreover, when document data is deleted, only the information corresponding to its document identification information needs to be deleted from the index file 206, which shortens the time required to correct the index file 206.
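The whole correction pass of FIG. 12, including the removal of entries that were never referenced, can be sketched as below. The sketch collapses the three tables into a single dictionary and uses a hypothetical `last_updated` field; `correction_pass` is an illustrative name, not one taken from the description.

```python
def correction_pass(current_docs, index):
    """Bring `index` in line with `current_docs`; return the removed keys.

    Both arguments map a document key to a dict with a 'last_updated' field.
    `index` is modified in place.
    """
    referenced = set()
    for key, info in current_docs.items():   # Sb1 to Sb4: visit every document
        referenced.add(key)
        old = index.get(key)
        if old is None or old["last_updated"] != info["last_updated"]:
            # Added (Sb3) or modified (Sb5 to Sb7): the old entry, if any,
            # is discarded and a fresh one is registered.
            index[key] = dict(info)
    # Sb8: entries never referenced during the pass belong to deleted documents.
    stale = [key for key in index if key not in referenced]
    for key in stale:                        # Sb9: remove them from the index
        del index[key]
    return stale


index = {"a": {"last_updated": 1}, "b": {"last_updated": 1}, "c": {"last_updated": 1}}
current = {"a": {"last_updated": 1}, "b": {"last_updated": 2}, "d": {"last_updated": 5}}
removed = correction_pass(current, index)
```

Deletions are detected without asking the groupware database which documents were removed: anything the pass did not touch is treated as deleted.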

[0036] In this way, the information on each document data item accumulated in the groupware database 20a is registered in the index file 206. Even when document data is edited by addition, deletion, or modification, the index file correction process described above is repeated at regular intervals, so the information changed by the editing is promptly reflected in the index file 206.

[0037] When the search request acquisition response unit 108 of the information search device 10 receives a search request from the client terminal 30 via the network 2, it outputs the search request to the search unit 110. The search unit 110 searches the index file 206 in accordance with the received search request and extracts the information on the matching document data. More specifically, the search request contains, as a search term, either a word to search for or a data item designated by the setting file 200. For example, when the search request contains a word as the search term, the search unit 110 refers to the keyword table 206c and extracts document identification information in descending order of the importance of that word (more precisely, of its word identification information). The search unit 110 then extracts from the page table 206a the title information, body information, storage source address (URL), and the like corresponding to a predetermined number (for example, 20) of document identification information entries taken from the top of the importance ranking, and transmits them to the client terminal 30 via the search request acquisition response unit 108. Candidates for the document data matching the search term are thereby transmitted to the client terminal 30. When the search request contains, for example, a last edit date and time as the search term, the search unit 110 searches each record of the page table 206a and transmits the title information, body information, and storage source address (URL) corresponding to the matching document identification information to the client terminal 30 via the search request acquisition response unit 108. The search request may, of course, contain both a word and a data item as search terms.
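The word search described here amounts to ranking by importance and cutting off at a fixed count. A minimal sketch follows, assuming a keyword table keyed by (word, document id) that holds the importance, and a page table keyed by document id; `search_by_word` and the sample values are illustrative only.

```python
def search_by_word(word, keyword_table, page_table, limit=20):
    """Return page table entries for `word`, most important first."""
    hits = [(imp, doc_id)
            for (w, doc_id), imp in keyword_table.items() if w == word]
    hits.sort(key=lambda hit: hit[0], reverse=True)  # descending importance
    return [page_table[doc_id] for _, doc_id in hits[:limit]]


keyword_table = {("budget", "d1"): 1.4, ("budget", "d2"): 0.7, ("meeting", "d1"): 0.0}
page_table = {"d1": {"title": "Q1 budget"}, "d2": {"title": "Minutes"}}
results = search_by_word("budget", keyword_table, page_table)
```

The default `limit` of 20 mirrors the example count given in the text; the linear scan over the keyword table is for brevity and is not a claim about how the search database is organized.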

[0038] As described above, according to the present embodiment, only the information that can serve as a search condition is registered in advance in the index file 206 for each document data item accumulated in the groupware database 20a. On receiving a search request, the information search device 10 only needs to search this index file 206. Because the index file 206 holds less data than the document data accumulated in the groupware database 20a, the search is faster than one that targets the document data in the groupware database 20a directly. Furthermore, a user or the like can change the data items registered in the index file 206 by changing the acquisition items designated in the setting file 200, so the index file 206 can be configured to suit the purpose of the search. The information search device 10 described in the present embodiment can also be used generically across a plurality of groupware products. More specifically, an index file 206 is built for each groupware product simply by changing the acquisition items described in the setting file 200 for that product. With this configuration, even when the setting file 200 is changed to build the index file 206 for a given groupware product, the information search program according to the present embodiment does not need to be recompiled in order for the information search device 10 to operate according to the changed setting file 200.

[0039] <Modifications> The embodiment described above is merely an example showing one aspect of the present invention and can be modified as desired within the scope of the present invention. Various modifications are described below.

[0040] For example, the embodiment described above illustrates a configuration in which only one groupware server 20 is connected to the network 2, but a plurality of groupware servers 20 may be connected instead. Moreover, different groupware products may be installed on the respective groupware servers 20. More specifically, searching the databases of a plurality of different groupware servers in a unified manner is generally difficult because the data management format (for example, the number and names of data items) differs from one groupware product to another. In this modification, by contrast, only the information of data items that can be search targets is registered in the page table 206a of the index file 206. Searching the page table 206a with the information search device 10 is therefore equivalent to searching the database of each of the groupware servers, and a search across the databases of the plurality of groupware servers is thereby realized.

[0041] Also, for example, when performing morphological analysis on the body data file 202, the indexing unit 104 may treat words that refer to the same thing, such as "PC", "パーソナルコンピュータ" (personal computer), and "パソコン" (a common abbreviation of the same term), as a single word. Then, even when the search request contains "パソコン" as the search term, document data containing words such as "PC" or "パーソナルコンピュータ" is also extracted as matching document data, which improves the accuracy of the search.
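One simple way to treat such words as a single word is to map each variant to a canonical form before counting and before looking up a search term. The sketch below assumes a hand-written synonym map; the names `SYNONYMS`, `normalize`, and `normalize_all` are hypothetical, and the description does not say how the variants are actually recognized.

```python
# Hypothetical synonym map: every variant points at one canonical word.
SYNONYMS = {
    "ＰＣ": "PC",
    "パーソナルコンピュータ": "PC",
    "パソコン": "PC",
}


def normalize(word):
    """Return the canonical form of a word, or the word itself."""
    return SYNONYMS.get(word, word)


def normalize_all(words):
    return [normalize(w) for w in words]


indexed = normalize_all(["パソコン", "会議", "PC"])
query = normalize("パーソナルコンピュータ")
```

Applying the same mapping at indexing time and at query time is what lets a request for one variant match documents that contain another.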

[0042]

[Effects of the Invention] According to the present invention, there are provided an information search device, an information search method, a program, and a recording medium capable of shortening the time required to identify, among the information accumulated in a database, the information that matches a search condition.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an information search system according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the functional configuration of the information search device.

FIG. 3 is a diagram showing an example of the setting file.

FIG. 4 is a diagram showing an example of the groupware file.

FIG. 5 is a diagram showing an example of the body data file.

FIG. 6 is a diagram showing an example of the information data file.

FIG. 7 is a conceptual diagram showing the data structure of the index file.

FIG. 8 is a diagram showing an example of the page table.

FIG. 9 is a diagram showing an example of the word table.

FIG. 10 is a diagram showing an example of the keyword table.

FIG. 11 is a flowchart showing the procedure of the registration process executed by the information search device.

FIG. 12 is a flowchart showing the procedure of the index file correction process executed by the information search device.

[Explanation of symbols]

10 ... information search device, 10a ... search database, 20 ... groupware server, 20a ... groupware database, 30 ... client terminal, 100 ... setting file analysis unit, 102 ... data collection unit, 104 ... indexing unit, 106 ... morphological analysis dictionary, 108 ... search request acquisition response unit, 110 ... search unit, 200 ... setting file, 206 ... index file.

Claims

[Claims]

1. An information search device for searching a database in which text data including at least a body text is associated with identification information of the text data, and in which a plurality of pieces of related information relating to the body text, items for classifying the related information, and the identification information of the text data corresponding to the body text are associated with one another, the information search device comprising: first storage means for storing item designation information that designates, among the items, an item that can be a search target; related information acquisition means for acquiring, from the database, the related information classified under the item designated by the item designation information; second storage means for storing the related information acquired by the related information acquisition means in association with the identification information corresponding to that related information; search condition acquisition means for acquiring a search condition according to the item designated by the item designation information; and search means for identifying, from the related information stored in the second storage means, the related information that matches the search condition, and identifying the identification information corresponding to that related information.

2. The information search device according to claim 1, further comprising: body extraction means for extracting the body text from the text data; morphological analysis means for dividing the extracted body text into a plurality of words; and appearance counting means for counting the number of times each of the plurality of words appears in the body text, wherein the second storage means stores each word and the count value of the word in association with the identification information of the text data corresponding to the body text.

3. An information search method for searching a database in which text data including at least a body text is associated with identification information of the text data, and in which a plurality of pieces of related information relating to the body text, items for classifying the related information, and the identification information of the text data corresponding to the body text are associated with one another, the method comprising: a first step of storing, in a storage device, item designation information that designates, among the items, an item that can be a search target; a second step of acquiring, from the database, the related information classified under the item designated by the item designation information; a third step of storing, in the storage device, the related information acquired in the second step in association with the identification information corresponding to that related information; a fourth step of acquiring a search condition according to the item designated by the item designation information; and a fifth step of identifying, from the related information stored in the storage device, the related information that matches the search condition, and identifying the identification information corresponding to that related information.

4. A program for causing a computer that searches a database, in which text data including at least a body text is associated with identification information of the text data, and in which a plurality of pieces of related information relating to the body text, items for classifying the related information, and the identification information of the text data corresponding to the body text are associated with one another, to function as: first storage means for storing item designation information that designates, among the items, an item that can be a search target; related information acquisition means for acquiring, from the database, the related information classified under the item designated by the item designation information; second storage means for storing the related information acquired by the related information acquisition means in association with the identification information corresponding to that related information; search condition acquisition means for acquiring a search condition according to the item designated by the item designation information; and search means for identifying, from the related information stored in the second storage means, the related information that matches the search condition, and identifying the identification information corresponding to that related information.

5. A computer-readable recording medium on which is recorded a program for causing a computer that searches a database, in which text data including at least a body text is associated with identification information of the text data, and in which a plurality of pieces of related information relating to the body text, items for classifying the related information, and the identification information of the text data corresponding to the body text are associated with one another, to function as: first storage means for storing item designation information that designates, among the items, an item that can be a search target; related information acquisition means for acquiring, from the database, the related information classified under the item designated by the item designation information; second storage means for storing the related information acquired by the related information acquisition means in association with the identification information corresponding to that related information; search condition acquisition means for acquiring a search condition according to the item designated by the item designation information; and search means for identifying, from the related information stored in the second storage means, the related information that matches the search condition, and identifying the identification information corresponding to that related information.