JP5398077B2 - Importance determination method, storage system, and program for data stored in storage device

Info

Publication number: JP5398077B2
Application number: JP2010026445A
Authority: JP
Inventors: 理森永
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2010-02-09
Filing date: 2010-02-09
Publication date: 2014-01-29
Anticipated expiration: 2030-02-09
Also published as: JP2011164891A

Description

The present invention relates to a method for determining the importance of file data stored in a storage device according to preset conditions, and to a method, system, and program for storing, based on the determination result, a copy of highly important data in a storage device in which data cannot be updated or deleted.

As available storage devices have grown in capacity and fallen in cost, companies digitize large amounts of information and accumulate it in storage devices.
Recently, enterprise search systems, which search the file data stored in a company's storage devices so that it can be used effectively, have been becoming widespread.
Some storage systems used in companies have a WORM (Write Once Read Many) function, which allows file data to be read but prevents it from being modified or deleted.
Furthermore, Web applications called Wikis, which handle hypertext and allow multiple users to view and edit the same electronic document simultaneously, are used to organize and collect file data.

FIG. 9 is a block diagram showing an overview of an enterprise search system. The system comprises an index creation unit 91 that extracts keywords from a storage device 90 and builds a search index, an index database 92 that stores the index data, a search request input unit 93 that accepts search requests from users, a search processing unit 94 that creates search results from a search request by referring to the index database 92, and a search result display unit 95 that displays the search results.

In an enterprise search system configured in this way, the index creation unit 91 acquires the information registered in the storage device 90 and registers the keywords contained in that information, together with the location of the source information, in the index database 92.
A search request entered through the search request input unit 93 is processed by the search processing unit 94 with reference to the index database 92, and the search result display unit 95 displays the search results. The displayed search results include the name and storage location of each item of information that matches the search request.
Patent Document 1 below is a known technical document related to the present invention.

JP 2006-72957 A

With the development of e-Discovery legislation, legal requirements obliging companies to retain data such as business documents for long periods have been strengthened.
If a storage device with a WORM function is used, data cannot be deleted or updated, so important data can be retained for a long period without being deleted or updated. However, if all data is stored as is, the required storage capacity becomes enormous.
It is therefore desirable to store copies of only the highly important data in the storage device with the WORM function.
Conventionally, however, there has been no appropriate method for determining the importance of data stored in a storage device.

An object of the present invention is to provide a method for appropriately determining the importance of data stored in a storage device, and a system and program that, when the data is determined to be important, store a copy of it in a storage device having a WORM function so that the original content can be obtained even if the original is lost or altered.

To achieve the above object, the method of the present invention is carried out by a computer that determines the importance of data stored in a storage device, and comprises:
a step of determining, by means of a hash value, whether the keywords of file data to be searched that is stored in the storage device have already been registered; if they are unregistered, acquiring the keywords contained in the file data and its hash value, initializing a value indicating the importance of the file, and registering them in an index database; and if they are already registered and the file data has been updated, acquiring the keywords contained in the file data and its hash value, initializing the value indicating the importance of the file, and registering them in the index database;
a step of registering in advance, in a condition database, a value for judging the importance of file data stored in the storage device and keywords for judging that file data is an important document;
a step of storing, in a hypertext database, hypertext documents containing reference information to the storage destinations of the file data, updating, for each referenced item of file data, the number of references to it contained in the stored hypertext documents, and storing that number in the index database as the value indicating the importance of the file data; and
a step of comparing the value indicating the importance of each item of file data stored in the index database with the value for judging importance registered in the condition database, and, when the value indicating the importance of the file data is larger and the keywords of the file data acquired from the index database include a keyword registered in the condition database, treating the file data as highly important document data and storing a copy of the file data stored in the storage device in a storage device in which data cannot be updated or deleted.
The method further comprises a step of, when updating the number of references to file data contained in the hypertext documents stored in the hypertext database for each referenced item of file data, subtracting from the number of references to the file data if a hyperlink has been deleted, and adding to it if a hyperlink has been added.

The system according to the present invention comprises:
means for determining, by means of a hash value, whether the keywords of file data to be searched that is stored in a storage device have already been registered; if they are unregistered, acquiring the keywords contained in the file data and its hash value, initializing a value indicating the importance of the file, and registering them in an index database; and if they are already registered and the file data has been updated, acquiring the keywords contained in the file data and its hash value, initializing the value indicating the importance of the file, and registering them in the index database;
means for registering in advance, in a condition database, a value for judging the importance of file data stored in the storage device and keywords for judging that file data is an important document;
means for storing, in a hypertext database, hypertext documents containing reference information to the storage destinations of the file data, updating, for each referenced item of file data, the number of references to it contained in the stored hypertext documents, and storing that number in the index database as the value indicating the importance of the file data; and
means for comparing the value indicating the importance of each item of file data stored in the index database with the value for judging importance registered in the condition database, and, when the value indicating the importance of the file data is larger and the keywords of the file data acquired from the index database include a keyword registered in the condition database, treating the file data as highly important document data and storing a copy of the file data stored in the storage device in a storage device in which data cannot be updated or deleted.
The system further comprises means for, when updating the number of references to file data contained in the hypertext documents stored in the hypertext database for each referenced item of file data, subtracting from the number of references to the file data if a hyperlink has been deleted, and adding to it if a hyperlink has been added.

The program according to the present invention causes a computer that determines the importance of data stored in a storage device to function as:
means for determining, by means of a hash value, whether the keywords of file data to be searched that is stored in the storage device have already been registered; if they are unregistered, acquiring the keywords contained in the file data and its hash value, initializing a value indicating the importance of the file, and registering them in an index database; and if they are already registered and the file data has been updated, acquiring the keywords contained in the file data and its hash value, initializing the value indicating the importance of the file, and registering them in the index database;
means for registering in advance, in a condition database, a value for judging the importance of file data stored in the storage device and keywords for judging that file data is an important document;
means for storing, in a hypertext database, hypertext documents containing reference information to the storage destinations of the file data, updating, for each referenced item of file data, the number of references to it contained in the stored hypertext documents, and storing that number in the index database as the value indicating the importance of the file data; and
means for comparing the value indicating the importance of each item of file data stored in the index database with the value for judging importance registered in the condition database, and, when the value indicating the importance of the file data is larger and the keywords of the file data acquired from the index database include a keyword registered in the condition database, treating the file data as highly important document data and storing a copy of the file data stored in the storage device in a storage device in which data cannot be updated or deleted.
The program further provides means for, when updating the number of references to file data contained in the hypertext documents stored in the hypertext database for each referenced item of file data, subtracting from the number of references to the file data if a hyperlink has been deleted, and adding to it if a hyperlink has been added.

According to the present invention, the importance of file data is determined by the number of links pointing to it, and a copy of any file data whose link count exceeds the number specified in the set conditions is stored in a storage device in which data cannot be deleted or updated. The original content can therefore be obtained even if the original is lost or altered.

FIG. 1 is a system configuration diagram showing an embodiment of the present invention.
FIG. 2 is a diagram showing the structure of the data stored in the index database.
FIG. 3 is a diagram showing the structure of the data stored in the hypertext database.
FIG. 4 is a diagram showing the structure of the data stored in the condition database.
FIG. 5 is a flowchart outlining the index information update process.
FIG. 6 is a flowchart outlining the process of saving a document containing hyperlinks.
FIG. 7 is a flowchart outlining the file data migration process.
FIG. 8 is a flowchart outlining the process of registering information in the condition database.
FIG. 9 is a block diagram outlining a conventional enterprise search system.

An embodiment of a storage system to which the present invention is applied is described below.
FIG. 1 is a system configuration diagram showing an example of an embodiment of the present invention.
The system is configured by connecting a host system 2, a storage device 3, a storage device 4 having a WORM (Write Once Read Many) function, and a terminal 5 to a network 1.

The host system 2 and the terminal 5 can store and read file data on the storage device 3 and on the storage device 4 having the WORM function via the network 1.
The host system 2 includes a search program 6, a hypertext management program 7, a data replication program 8, an index database 9, a hypertext database 10, and a condition database 11.
The terminal 5 includes a browser 12.

When HTML documents are used as the hypertext, the host system 2 corresponds to a Web server, the hypertext management program 7 and the hypertext database 10 correspond to Wiki software, and the browser 12 corresponds to a Web browser.
Through the host system 2, the browser 12 can use the search program 6, the hypertext management program 7, and the data replication program 8.

As shown in FIG. 2, the index information for file data stored in the index database 9 consists of four fields: storage destination 21, hash value 22, keywords 23, and importance 24.
The storage destination 21 indicates on which file server and at which location a particular item of file data is stored, and serves as the identifier of the file data.
The hash value 22 is the value obtained by passing the file data through a hash function, and is used to check whether the content of the file data has changed.
The keywords 23 are a list of the words obtained by breaking the content of the file data apart at spaces and symbols and into morphemes.
The importance 24 holds the value, determined by the hypertext management program 7, that represents the importance of the file data.
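The index record of FIG. 2 could be modeled roughly as follows. This is a minimal sketch, not the patented implementation: the names `IndexRecord`, `extract_keywords`, and `make_record` are hypothetical, SHA-256 is an assumed choice (the description only says "a hash function"), and keywords are split on whitespace and symbols only, with morphological analysis left out.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class IndexRecord:
    """One entry of the index database 9 (FIG. 2); field names are illustrative."""
    location: str                                     # storage destination 21 (identifier)
    hash_value: str                                   # hash value 22
    keywords: set[str] = field(default_factory=set)   # keywords 23
    importance: int = 0                               # importance 24


def extract_keywords(text: str) -> set[str]:
    # Split on whitespace and symbols only. Morphological analysis, which the
    # description also mentions, is outside the scope of this sketch.
    return {w.lower() for w in re.split(r"\W+", text) if w}


def make_record(location: str, content: bytes) -> IndexRecord:
    # SHA-256 is an assumption; the source does not name a hash function.
    digest = hashlib.sha256(content).hexdigest()
    text = content.decode("utf-8", errors="ignore")
    return IndexRecord(location, digest, extract_keywords(text))
```

For example, `make_record("//fs1/contracts/a.txt", b"Contract A: payment terms")` yields a record whose keywords are `contract`, `a`, `payment`, and `terms`, with importance 0.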

As shown in FIG. 3, a document stored in the hypertext database 10 that contains hyperlinks to file data consists of a title 31 and a body 32, which can contain multiple hyperlinks.
The hyperlinks in the body 32 refer to file data stored in the storage device 3 and in the storage device 4 having the WORM function, and are distinguished from hyperlinks to Web pages on the Internet.
The title 31 serves as the identifier of a document stored in the hypertext database 10. Multiple hyperlinks, text, and so on can be inserted into the body 32. For example, the character string "Contract A" in the body 32 of FIG. 3 is a hyperlink to the file data of the same name stored in the storage device 3.
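One way to pull the file-data hyperlinks out of a body 32 is sketched below, assuming the body is HTML. The source does not say how file links are told apart from Web links, so the rule used here (keep only `file:` URLs) is purely an assumption, and `LinkCollector` and `file_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document body."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def file_links(body: str) -> set[str]:
    """Return the storage locations referenced by the body 32."""
    collector = LinkCollector()
    collector.feed(body)
    collector.close()
    # Assumption: file data is addressed with the file: scheme, while
    # ordinary Web pages use http: or https: and are ignored.
    return {h for h in collector.hrefs if urlparse(h).scheme == "file"}
```

Returning a set means that a file linked twice from the same document is counted once, which is a design choice of this sketch rather than something the description specifies.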

As shown in FIG. 4, an importance 41 and keywords 42 can be specified as the conditions for the file data to be moved from the storage device 3 to the storage device 4 having the WORM function. The importance 41 and the keywords 42 shown in FIG. 4 are stored in the condition database 11.
A single value for judging the importance of file data can be registered as the importance 41; it is compared with the importance 24 in the index database 9.
Multiple keywords for judging that a file is an important document can be registered as the keywords 42; the data replication program 8 uses them to check whether each registered keyword is contained in the keywords 23 of the index database 9.

FIG. 5 is a flowchart outlining the process by which the search program 6 initializes the index information registered in the index database 9.
First, the file data from which the index information is derived is acquired from the storage device 3 and from the storage device 4 having the WORM function (step 501).
Next, a hash function is applied to the acquired file data to obtain its hash value (step 502).

Next, it is determined whether information on the acquired file data is registered in the index database 9 (step 503).
If the acquired file data is not registered in the index database 9, "0" is registered as the initial value of the importance 24 (step 505).

If, on the other hand, information on the acquired file data is already registered in the index database 9, the hash value of the acquired file data is compared with the hash value 22 registered in the index database 9 to determine whether the content of the file data has been updated since the index information was last initialized (step 504).

If the hash value of the acquired file data does not match the hash value registered in the index database 9, the file data has been updated, so the keywords contained in the file data are acquired and the new information (keywords and hash value) is registered in the index database 9 (steps 506 and 507). The importance 24 is also initialized.
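The flow of steps 502 to 507 for a single file can be sketched as follows. This is an illustration under stated assumptions: the index database 9 is stood in for by a plain dictionary, `update_index` is a hypothetical name, SHA-256 is an assumed hash function, and keyword extraction splits on whitespace and symbols only.

```python
import hashlib
import re


def update_index(index: dict, location: str, content: bytes) -> str:
    """Apply steps 502-507 to one file and report what was done.

    `index` maps a storage location to a dict with the keys "hash",
    "keywords" and "importance" (a stand-in for the index database 9).
    """
    digest = hashlib.sha256(content).hexdigest()        # step 502 (hash choice assumed)
    entry = index.get(location)                          # step 503: already registered?
    if entry is not None and entry["hash"] == digest:    # step 504: content unchanged
        return "unchanged"
    text = content.decode("utf-8", errors="ignore")
    index[location] = {
        "hash": digest,
        "keywords": {w.lower() for w in re.split(r"\W+", text) if w},  # step 506
        "importance": 0,    # step 505 for new files; also reset when a file is updated
    }                       # step 507: register in the index
    return "added" if entry is None else "updated"
```

Note that, as the description states, an updated file has its importance reset to 0, so any reference count accumulated before the update is discarded in this sketch as well.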

The search program 6 periodically repeats the above process for all file data stored in the storage device 3 and in the storage device 4 having the WORM function.
Documents containing hyperlinks are the source of the information used to select the file data to be replicated; they contain hyperlinks to those items, among the file data the user has found by searching, that were judged to be particularly important.

FIG. 6 is a flowchart outlining the process by which the hypertext management program 7 saves a document containing hyperlinks to file data specified by the user.
When the user asks the host system 2 to search for file data using keywords entered in the browser 12, the host system 2 sends the browser data containing a list of hyperlinks to the file data that contains the entered keywords.
The user selects the required hyperlinks from the list received from the host system 2 and sends a title and the list of selected hyperlinks to the host system 2. The data containing those hyperlinks is then saved in the hypertext database 10.

At this point, the hypertext management program 7 receives the document containing the hyperlinks from the host system 2 and obtains the list of storage destinations of the file data to which the hyperlinks point (step 601).

Next, the title of the document is looked up among the titles 31 in the hypertext database 10 to determine whether the document containing the hyperlinks has already been registered, that is, whether the title already exists (step 602).
If it has not been registered, data for a new document containing the hyperlinks is created, the index database 9 is consulted for each entry in the list of storage destinations obtained at the start, "1" is added to the importance 24 of each item of file data, and the title and body of the received document are then registered in the hypertext database 10 (steps 603 and 604).

If the title of the document is already registered in the hypertext database 10, that is, if the title already exists, the difference between the registered document and the received document is obtained, and it is determined whether the received document has deleted hyperlinks or added hyperlinks (steps 605, 606, and 607).

If the received document has deleted hyperlinks, the index database 9 is consulted for the storage destination of the file data to which each deleted hyperlink points, and the importance 24 of that file data is decremented (step 608).
Conversely, if there are added hyperlinks, the index database 9 is consulted for the storage destination of the file data to which each added hyperlink points, and the importance 24 of that file data is incremented (step 603).

The process of saving a document containing hyperlinks ends when the received document overwrites the document with the same title in the hypertext database 10.
To delete a document containing hyperlinks, data in which the body 32 of the target title 31 has been emptied is saved.
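The reference-count bookkeeping of steps 602 to 608 amounts to a set difference between the old and new link lists of a document. The sketch below illustrates that idea with dictionaries standing in for the hypertext database 10 and the index database 9. The name `save_document` is hypothetical, the decrement is assumed to be 1 (the description gives "1" only for the addition), and each document counts a given file at most once.

```python
def save_document(documents: dict, importance: dict, title: str, links: set) -> None:
    """Store a document and adjust per-file reference counts (steps 602-608).

    `documents` maps a title 31 to the set of file locations its body links to
    (a stand-in for the hypertext database 10); `importance` maps a file
    location to its importance 24 (a stand-in for the index database 9).
    """
    old_links = documents.get(title, set())    # empty set for a new title (step 602)
    for location in old_links - links:         # deleted hyperlinks (step 608)
        importance[location] = importance.get(location, 0) - 1
    for location in links - old_links:         # added hyperlinks (step 603)
        importance[location] = importance.get(location, 0) + 1
    documents[title] = set(links)              # register or overwrite (step 604)
```

Saving a title with an empty link set corresponds to the deletion case described above: every file the document used to reference loses one reference.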

FIG. 7 is a flowchart outlining the process by which the data replication program 8 moves file data from the storage device 3 to the storage device 4 having the WORM function.
First, the importance 41 and the keywords 42, which are the conditions for the file data to be moved, are acquired from the condition database 11 (step 701).

Next, the storage destination 21, hash value 22, keywords 23, and importance 24 of the file data to be processed are acquired from the index database 9 (step 702).
It is then determined from the storage destination 21 whether the file data to be processed is stored in the storage device 3 (step 703).

It is further determined whether the same file data already exists in the storage device 4 having the WORM function (step 704). If the file data is already stored in the storage device 4 having the WORM function, there is no need to move it, so processing proceeds to the next item of file data.

Next, the importance 41 set in the condition database 11 is compared with the importance 24 stored in the index database 9, and if the importance 24 is larger, it is checked whether the keywords 42 in the condition database 11 include any word from the keywords 23 of the index database 9 (steps 705 and 706).

If they do, the file data at the storage destination 21 is read and saved to the storage device 4 having the WORM function (step 707).
The data replication program 8 periodically executes the above process for all file data registered under the storage destination 21 in the index database 9.
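The selection logic of steps 702 to 706 can be sketched as a filter over the index. Several details here are assumptions rather than statements from the source: the name `select_for_worm`, the `//worm/` location prefix used to recognize files on the storage device 4, the use of matching hash values to decide that "the same file data" already exists there, and the rule that one matching keyword is enough. The actual copy of step 707 is left out.

```python
def select_for_worm(index: dict, threshold: int, condition_keywords: set,
                    worm_prefix: str = "//worm/") -> list:
    """Return the locations of files that qualify for copying to WORM storage.

    `index` maps a storage location to a dict with the keys "hash",
    "keywords" and "importance"; `threshold` plays the role of importance 41
    and `condition_keywords` the role of keywords 42.
    """
    # Hashes of files already held on the WORM device (assumed identity test).
    worm_hashes = {e["hash"] for loc, e in index.items() if loc.startswith(worm_prefix)}
    selected = []
    for location, entry in index.items():            # step 702
        if location.startswith(worm_prefix):          # step 703: not on storage device 3
            continue
        if entry["hash"] in worm_hashes:              # step 704: copy already exists
            continue
        if entry["importance"] <= threshold:          # step 705: importance 24 not larger
            continue
        if condition_keywords & entry["keywords"]:    # step 706: any keyword in common
            selected.append(location)                 # candidate for step 707
    return selected
```

With a threshold of 2 and the condition keywords `contract` and `invoice`, a file with importance 3 whose keywords include `contract` is selected, while a file with importance 5 and no matching keyword is not.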

FIG. 8 is a flowchart outlining the process by which the data replication program 8 initializes the condition database 11.
The data replication program 8 displays, in the browser 12, a screen asking for the importance value and the keywords contained in the file data to be replicated (step 801).
The data replication program 8 receives the user's input sent from the browser 12 and registers the importance 41 and the keywords 42 in the condition database 11 (steps 802, 803, and 804). This completes the initialization of the condition database 11.

DESCRIPTION OF SYMBOLS
1 Network
2 Host system
3 Storage device
4 Storage device with WORM function
5 Terminal
6 Search program
7 Hypertext management program
8 Data replication program
9 Index database
10 Hypertext database
11 Condition database
12 Browser
21 Storage destination
22 Hash value
23 Keywords
24 Importance
31 Title
32 Body
41 Importance
42 Keywords

Claims

A method for determining the importance of data stored in a storage device, performed by a computer that determines the importance of the data, the method comprising:
a step of determining whether the keywords of file data to be searched that is stored in the storage device have already been registered; if they are unregistered, acquiring the keywords contained in the file data and its hash value, initializing a value indicating the importance of the file, and registering them in an index database; and if they are already registered and the file data has been updated, acquiring the keywords contained in the file data and its hash value, initializing the value indicating the importance of the file, and registering them in the index database;
a step of registering in advance, in a condition database, a value for judging the importance of the file data stored in the storage device and keywords for judging that file data is an important document;
a step of storing, in a hypertext database, hypertext documents containing reference information to the storage destinations of the file data, updating, for each referenced item of file data, the number of references to it contained in the stored hypertext documents, and storing that number in the index database as the value indicating the importance of the file data; and
a step of comparing the value indicating the importance of each item of file data stored in the index database with the value for judging importance registered in the condition database, and, when the value indicating the importance of the file data is larger and the keywords of the file data acquired from the index database include a keyword registered in the condition database, treating the file data as highly important document data and storing a copy of the file data stored in the storage device in a storage device in which data cannot be updated or deleted.

2. The method for determining the importance of data stored in a storage device according to claim 1, further comprising a step of, when updating, for each referenced file data, the count of references to the file data contained in the hypertext document stored in the hypertext database, subtracting from the reference count of the file data if a deleted hyperlink exists and adding to the reference count of the file data if an added hyperlink exists.

3. A storage system comprising a computer that stores file data in a storage device, wherein the computer comprises:
means for determining, by a hash value, whether the keyword of the file data stored in the storage device is already registered and, if the file data is unregistered, acquiring the keyword contained in the file data, acquiring its hash value, initializing a value indicating the importance of the file, and registering them in an index database, and, if the file data is registered and has been updated, acquiring the keyword and hash value of the file data, initializing the value indicating the importance of the file, and registering them in the index database;
means for registering in advance, in a condition database, a value for determining the importance of the file data stored in the storage device and a keyword for determining that the file data is an important document;
means for storing, in a hypertext database, a hypertext document that includes reference information pointing to the storage destination of the file data, updating, for each referenced file data, the count of references to that file data contained in the stored hypertext document, and storing the count in the index database as the value indicating the importance of the file data; and
means for comparing the value indicating the importance of each file data stored in the index database with the value for determining importance registered in the condition database and, when the value indicating the importance of the file data is larger, acquiring the keyword contained in the file data from the index database and, when the keyword registered in the condition database is included, treating the file data as important document data of high importance and storing a copy of the file data stored in the storage device in a storage device in which data cannot be updated or deleted.

4. The storage system according to claim 3, further comprising means for, when updating, for each referenced file data, the count of references to the file data contained in the hypertext document stored in the hypertext database, subtracting from the reference count of the file data if a deleted hyperlink exists and adding to the reference count of the file data if an added hyperlink exists.

5. A program that causes a computer that determines the importance of data stored in a storage device to function as:
means for determining, by a hash value, whether the keyword of search-target file data stored in the storage device is already registered and, if the file data is unregistered, acquiring the keyword contained in the file data, acquiring its hash value, initializing a value indicating the importance of the file, and registering them in an index database, and, if the file data is registered and has been updated, acquiring the keyword and hash value of the file data, initializing the value indicating the importance of the file, and registering them in the index database;
means for registering in advance, in a condition database, a value for determining the importance of the file data stored in the storage device and a keyword for determining that the file data is an important document;
means for storing, in a hypertext database, a hypertext document that includes reference information pointing to the storage destination of the file data, updating, for each referenced file data, the count of references to that file data contained in the stored hypertext document, and storing the count in the index database as the value indicating the importance of the file data; and
means for comparing the value indicating the importance of each file data stored in the index database with the value for determining importance registered in the condition database and, when the value indicating the importance of the file data is larger, acquiring the keyword contained in the file data from the index database and, when the keyword registered in the condition database is included, treating the file data as important document data of high importance and storing a copy of the file data stored in the storage device in a storage device in which data cannot be updated or deleted.

6. The program according to claim 5, further causing the computer to function as means for, when updating, for each referenced file data, the count of references to the file data contained in the hypertext document stored in the hypertext database, subtracting from the reference count of the file data if a deleted hyperlink exists and adding to the reference count of the file data if an added hyperlink exists.