JP6284395B2 - Data storage control device, data storage control method, and program

Info

Publication number: JP6284395B2
Application number: JP2014045431A
Authority: JP
Inventors: 泰弘小原
Original assignee: NTT Communications Corp
Current assignee: NTT Communications Corp
Priority date: 2014-03-07
Filing date: 2014-03-07
Publication date: 2018-02-28
Anticipated expiration: 2034-03-07
Also published as: JP2015170201A

Description

The present invention relates to a data storage system, such as a distributed file system, that stores data in a plurality of data storage devices.

With the spread of cloud services and similar developments, there is demand for storage that is more reliable and faster. To meet this demand, distributed file systems have been proposed in which file system functions are distributed across a plurality of storage nodes.

In a distributed file system, the storage node on which data is placed is determined using a hash function or the like. This improves reliability by distributing data across multiple nodes and improves performance by parallelizing input/output across those nodes. The technique described in Patent Document 1 is prior art related to distributed file systems.

Patent Document 1: JP 2009-259007 A

In a distributed file system, the storage nodes on which data is placed include a primary node, to which a client can write data, and replica nodes, which store copies of that data and from which a client can only read.

In the prior art, however, a predetermined algorithm fixes which storage node becomes the primary node and which storage nodes become replica nodes, so it is not possible to exercise control such as designating a specific storage node as the primary.

Consequently, if, for example, a client moves far away from the primary node after that node has been determined, the client must still write data to the distant primary node even when a storage node (a replica node) is nearby, and processing can take a long time.

The present invention has been made in view of the above, and its object is to provide a technique that makes it possible to designate the data storage device to be used as the primary in a data storage system that stores data in a plurality of data storage devices.

According to an embodiment of the present invention, there is provided a data storage control device in a data storage system that stores data in a plurality of data storage devices, the data storage control device comprising:
storage means for storing an ordered set of the plurality of data storage devices; and
order calculation means for changing the order of elements in the ordered set using designation information for designating a specific data storage device among the plurality of data storage devices,
wherein, in the data storage system, the data storage device corresponding to the element at a predetermined position in the ordered set whose order has been changed by the order calculation means is used as the specific data storage device.

According to an embodiment of the present invention, there is also provided a data storage control method executed by a data storage control device in a data storage system that stores data in a plurality of data storage devices,
the data storage control device including storage means for storing an ordered set of the plurality of data storage devices, the data storage control method comprising:
an order calculation step of changing the order of elements in the ordered set using designation information for designating a specific data storage device among the plurality of data storage devices,
wherein, in the data storage system, the data storage device corresponding to the element at a predetermined position in the ordered set whose order has been changed by the order calculation step is used as the specific data storage device.

According to an embodiment of the present invention, a technique is provided that makes it possible to designate the data storage device to be used as the primary in a data storage system that stores data in a plurality of data storage devices.

FIG. 1 is an overall configuration diagram of a system according to an embodiment of the present invention.

FIG. 2 is a functional configuration diagram of the data storage device 30.

FIG. 3 is a functional configuration diagram of the data storage control device 10.

FIG. 4 is a diagram showing examples of the tables held by the data storage control device 10.

FIG. 5 is a diagram for explaining the process of determining the storage location of primary data.

FIG. 6 is a diagram for explaining the concept of a placement group.

FIG. 7 is a diagram for explaining the replication path determination process.

FIG. 8 is a diagram for explaining the replication path determination process.

FIG. 9 is a sequence diagram showing an example of the flow of processing in the present embodiment.

Embodiments of the present invention are described below with reference to the drawings. The embodiment described below is only an example, and the embodiments to which the present invention can be applied are not limited to it. For example, the present embodiment targets a distributed file system, but the present invention is applicable to systems other than distributed file systems.

(System configuration)
FIG. 1 shows the overall configuration of a system according to an embodiment of the present invention. The system according to the present embodiment is a distributed file system in which file system functions are distributed across a plurality of storage nodes (the data storage devices 30 of the present embodiment). Various distributed file systems currently exist. The distributed file system of the present embodiment is based on Ceph, but this is only an example, and the technique of the present embodiment is applicable regardless of the type of distributed file system. The present embodiment mainly describes the functions and processing that relate to the present invention. For example, mapping data to a plurality of storage nodes using a hash function or the like is ordinary processing in a distributed file system and is described only briefly here.

As shown in FIG. 1, the system according to the present embodiment has a configuration in which a data storage control device 10, a client device 20, and data storage devices 30 are connected to a network 40. FIG. 1 shows one data storage control device 10 and one client device 20, but this is only an example, and there may be more than one of each. The functions of the data storage control device 10 may also be provided in the client device 20 or in any of the data storage devices 30, in which case a separate data storage control device 10 need not be provided. When the functions of the data storage control device 10 are provided in the client device 20 or a data storage device 30, that client device 20 or data storage device 30 may be referred to as a data storage control device.

A data storage device 30 is a storage node in the distributed file system. The data of a file is stored in a plurality of data storage devices 30. In the distributed file system of the present embodiment, for each piece of data stored in the group of data storage devices 30, there is one primary data storage device 30, and the remaining data storage devices 30 are replicas. Hereinafter, the primary data storage device 30 is called the primary storage device and a replica data storage device 30 is called a replica storage device. Data stored in the primary storage device is called primary data, and data stored in a replica storage device is called replica data (or a replica, a copy, and so on).

The only data storage device 30 to which the client device 20 can write a file is the primary storage device. Data can be read from either the primary storage device or a replica storage device. A copy of the data stored in the primary storage device is stored in each replica storage device.

FIG. 2 shows the functional configuration of the data storage device 30. As shown in FIG. 2, the data storage device 30 has a data storage unit 31, a distributed file system processing unit 32, and a replication control unit 33. The data storage unit 31 is a device that stores data, such as a hard disk or memory. The distributed file system processing unit 32 is a processing unit that makes the data storage device 30 function as a storage node in the distributed file system, and the processing unit itself can be realized with existing technology. The replication control unit 33 is a functional unit that, in order to store copies of stored data in other data storage devices 30, transfers the data along the path instructed by the data storage control device 10.

The client device 20 is, for example, a PC on which the client-side software of the distributed file system is installed.

FIG. 3 shows the functional configuration of the data storage control device 10 according to the present embodiment. As shown in FIG. 3, the data storage control device 10 has a data storage device order calculation unit 11, a primary specifier determination unit 12, a table information storage unit 13, a replication path determination unit 14, a path determination information storage unit 15, and a communication processing unit 16. An outline of each functional unit follows.

Based on the information stored in the table information storage unit 13, the data storage device order calculation unit 11 determines, for example for each file, which of the data storage devices 30 that store the data of that file is to be the primary storage device and which are to be replica storage devices. The primary specifier determination unit 12 determines the primary specifier described later, for example on a per-file basis, and stores the determined primary specifier in the table information storage unit 13. The replication path determination unit 14 is a functional unit that determines the path used when copies of data are stored from the primary storage device to the replica storage devices. The path determination information storage unit 15 stores the geographical information, network information, and other information that the replication path determination unit 14 uses when determining a path. The communication processing unit 16 communicates with other devices.

FIG. 4 shows examples of the tables stored in the table information storage unit 13. The table shown in FIG. 4(a) stores a primary specifier for each file (identified by a file number or similar identification information). This table is called the primary specifier table. The table shown in FIG. 4(b) stores, for each file, the ordered set of data storage devices 30 that store the data of that file. This table is called the ordered set table.

The data storage control device 10 according to the present embodiment can be realized by having a computer execute a program that describes the processing explained in the present embodiment. That is, the functions of the data storage control device 10 can be realized by executing a program corresponding to the processing performed by the data storage control device 10, using hardware resources built into the computer such as a CPU, memory, and a hard disk. The program can be recorded on a computer-readable recording medium (such as portable memory) and stored or distributed. The program can also be provided over a network, for example through the Internet or by e-mail.

The process of determining the primary storage device, which is the storage location of primary data, and the process of determining the replication path are described in detail below.

(Process of determining the primary storage device)
As an existing function, the distributed file system of the present embodiment determines the group of data storage devices that store a piece of data by using a hash algorithm or the like, that is, it maps data to data storage devices. This mapping may be performed by the client device 20 or by the data storage control device 10. The "data" to be stored here may be data in units of files or, as described later, data in units obtained by dividing the data of a file.

The mapping process yields an ordered set of data storage devices 30. In that ordered set, the first data storage device 30 is used as the primary storage device and the others are used as replica storage devices. With this scheme, however, the primary and a replica cannot be swapped unless the primary storage device fails.

As described above, the client device 20 can write data only to the primary storage device. Therefore, if, for example, the client device 20 moved to a location geographically far from the primary storage device, it could not write to a nearby data storage device 30 (a replica storage device) by treating that replica storage device as the primary storage device. In such a case, writing a large amount of data, for example, could take a long time.

In the present embodiment, by contrast, using a primary specifier makes it possible to change a replica storage device into the primary storage device without transferring data or the like. The first part of the description below uses mapping in units of files as an example. A file number i is used as the identification information of a file, and the file to be stored is called the target file.

Suppose that applying a hash function to the file number i or the like yields O = {o1, o2, o3} as the ordered set of data storage devices 30 that store the data of the target file. In the present embodiment, the ordered set is stored in the table information storage unit 13 of the data storage control device 10 regardless of whether it was calculated by the client device 20 or by the data storage control device 10.

As shown in FIG. 5(a), each element of {o1, o2, o3} represents a data storage device 30. Information on which data storage device 30 corresponds to an identifier such as o1, together with the IP address and other information needed to access each data storage device 30, is assumed to be stored in the data storage control device 10 in advance.

In this example, where the ordered set is {o1, o2, o3}, the first element is o1, so the data storage device 30 corresponding to o1 is the primary storage device. In this state, the client device 20 obtains the ordered set and related information from the data storage control device 10 when it accesses the file, and when it writes data of the target file it writes only to the data storage device 30 of o1 (the primary storage device). The processing in which the client device 20 identifies the primary storage device from the ordered set {o1, o2, o3} and writes only to the primary storage device is itself an existing function of the client-side software of the distributed file system in the present embodiment.

Suppose that a (non-zero) primary specifier r is subsequently given for the target file and stored in the primary specifier table of the table information storage unit 13 in association with the file number i.

The data storage device order calculation unit 11 of the data storage control device 10 reads the primary specifier r of the target file (the integer 1 in this example) from the primary specifier table and reads the ordered set {o1, o2, o3} of the target file from the ordered set table. The data storage device order calculation unit 11 then rotates (shifts) the ordered set {o1, o2, o3} r times (once in this example), obtains the shifted ordered set O' = {o2, o3, o1}, and stores it in the ordered set table in association with the identification information of the target file.

The first element of the shifted ordered set {o2, o3, o1} is o2, so the data storage device of o2 becomes the primary storage device, as shown in FIG. 5(b). In other words, the data storage device (o1) at site A was initially the primary, as shown in FIG. 5(a), and the calculation using the primary specifier r has made the data storage device (o2) at site B the primary.
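The rotation described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `rotate_ordered_set` is hypothetical, the left-rotation direction is taken from the {o1, o2, o3} to {o2, o3, o1} example, and reducing r modulo the set size is an assumption made here so that large values of r remain valid.

```python
from typing import List


def rotate_ordered_set(ordered_set: List[str], r: int) -> List[str]:
    """Return a copy of ordered_set rotated left by r positions.

    Assumption: r is reduced modulo the set size; the source only
    shows the case r = 1 with three elements.
    """
    if not ordered_set:
        return []
    k = r % len(ordered_set)
    return ordered_set[k:] + ordered_set[:k]


# Ordered set O obtained from the hash-based mapping in the example.
original = ["o1", "o2", "o3"]

# Primary specifier r = 1, as in the example.
shifted = rotate_ordered_set(original, 1)

# The first element is the primary; the rest are replicas.
primary, replicas = shifted[0], shifted[1:]
```

Because only the order of identifiers changes, no stored data has to move, which matches the statement that a replica can become the primary without a data transfer.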

The example above showed that, when a particular primary specifier r is given, the primary storage device changes to a particular data storage device. The method of determining the primary specifier r is not limited to any particular method. For example, r can be determined as follows so that the data storage device closest to the client device 20 becomes the primary storage device.

In the example of FIG. 5, suppose that at some point the client device 20 (meaning the client device that accesses the target file) is located near site A and uses the data storage device (o1) at site A as the primary storage device.

The client device 20 then moves to a location near site B. When it accesses the target file, an access request containing the identification information of the target file is sent to the data storage control device 10. The access request also contains identification information, such as the identification information of the client device 20, from which the distance between the client device 20 and each data storage device can be determined.

In the data storage control device 10 that has received the access request, the primary specifier determination unit 12 determines that, among the data storage devices that store the data of the target file, the data storage device (o2) at site B is closest to the client device 20, and determines the primary specifier r so that the data storage device (o2) becomes the primary storage device. For example, for the data storage device (o2) to become the primary storage device in the ordered set {o1, o2, o3}, o2 only has to move to the front, so r is set to 1 and stored in the primary specifier table in association with the identification information of the target file. After that, O' = {o2, o3, o1} is calculated as described above, and the primary storage device of the target file is determined according to O' = {o2, o3, o1}.
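One way to picture the choice of r is as the index of the nearest data storage device in the current ordered set. The sketch below assumes this interpretation; the function name `choose_primary_specifier` and the distance values are hypothetical, since the source does not say how distance is measured or represented.

```python
from typing import Dict, List


def choose_primary_specifier(ordered_set: List[str],
                             distance_to_client: Dict[str, float]) -> int:
    """Return r such that rotating ordered_set left by r puts the
    data storage device nearest to the client at the front."""
    nearest = min(ordered_set, key=lambda node: distance_to_client[node])
    return ordered_set.index(nearest)


ordered_set = ["o1", "o2", "o3"]

# Hypothetical distances after the client has moved close to site B (o2).
distance_to_client = {"o1": 500.0, "o2": 5.0, "o3": 300.0}

r = choose_primary_specifier(ordered_set, distance_to_client)

# Applying the rotation with this r moves the nearest device to the front.
rotated = ordered_set[r:] + ordered_set[:r]
```

With these assumed distances the result agrees with the example in the text, where r is 1 and o2 becomes the primary.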

<Placement groups>
As described above, the unit of data mapped to data storage devices is not limited to a file. For example, the concept of Ceph placement groups (PGs) may be applied so that data is mapped to data storage devices in units obtained by dividing the data of a file.

The concept of a PG is described with reference to FIG. 6. As shown in FIG. 6, the data d of a file is divided into a plurality of file blocks (d1, d2, and d3 in the example of FIG. 6) according to its size and other factors. Each file block is then mapped to a virtual container called a placement group (PG). In the example of FIG. 6, d1, d2, and d3 are mapped to PG1, PG2, and PG3, but this is only an example, and in practice the mapping to containers is determined by a hash algorithm or the like. Finally, each PG is mapped to data storage devices by a hash function or the like. This mapping method may be the same as the method of mapping to data storage devices in units of files described above. In other words, the per-file mapping method described above is equivalent to the placement group approach with the number of divisions set to 1.
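The two-stage mapping (file block to PG, then PG to an ordered set of data storage devices) can be sketched as follows. This is an illustration under stated assumptions and not Ceph's actual algorithm: Ceph uses CRUSH, whereas this sketch substitutes a simple rendezvous-style ranking by SHA-256 hash. All function names, the block size, the PG count, and the node list are hypothetical.

```python
import hashlib
from typing import List


def _hash(key: str) -> int:
    """Deterministic 64-bit hash of a string key."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Divide file data into fixed-size file blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_to_pg(file_id: str, block_index: int, pg_count: int) -> int:
    """Map a file block to a placement group by hashing its identity."""
    return _hash(f"{file_id}:{block_index}") % pg_count


def pg_to_ordered_set(pg: int, nodes: List[str], replicas: int) -> List[str]:
    """Map a PG to an ordered set of nodes (stand-in for CRUSH):
    rank every node by hash(pg, node) and keep the top `replicas`."""
    ranked = sorted(nodes, key=lambda n: _hash(f"{pg}:{n}"), reverse=True)
    return ranked[:replicas]


nodes = ["o1", "o2", "o3", "o4", "o5"]
blocks = split_into_blocks(b"x" * 10, block_size=4)  # blocks of 4, 4, and 2 bytes

placement = {}
for index in range(len(blocks)):
    pg = block_to_pg("file-i", index, pg_count=8)
    placement[index] = (pg, pg_to_ordered_set(pg, nodes, replicas=3))
```

The first element of each resulting ordered set plays the role of the primary for that PG, so the rotation by a primary specifier applies to it in the same way as in the per-file case.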

When the concept of placement groups is introduced, the primary specifier table shown in FIG. 4(a) becomes a table that associates a primary specifier with each file block, and the ordered set table shown in FIG. 4(b) becomes a table that associates an ordered set with each PG. In addition, because the client device 20 needs to know the relationship among PGs, file blocks, and files, "file - file block - PG" mapping information is required. This information may be held by the client device 20 or by the data storage control device.

The method of changing the primary storage device using a primary specifier is as described above. In the per-file method described above, "file" is simply replaced with "file block".

For example, when the primary storage device and the replica storage devices are located at geographically separate sites, a problem can arise in which the first half of a file's data can be accessed quickly while access to the second half is delayed. This can be resolved by setting a primary specifier r for each file block, as described above, so that the same site serves as the primary storage device for every block.

(Process of determining the replication path)
Next, the replication path determination process performed by the replication path determination unit 14 of the data storage control device 10 is described.

Suppose, for example, that the ordered set O = {o1, o2, o3} has been calculated, so that o1 is the primary storage device and the others are replica storage devices. As described above, the client device 20 can write data only to the primary storage device. Copies (replica data) of the data stored in the primary storage device (primary data) are transferred to each replica storage device, starting from the primary storage device, and stored there. The path taken when the data is stored can be, for example, "o1 -> o2 -> o3", "o1 -> o2, o1 -> o3", or "o1 -> o3 -> o2". Such a path is called a replication path.

Basically, a primary storage device that has received a data write request from the client device 20 notifies the client device 20 that the write has finished only after it has stored the data itself and has confirmed that storage on all replica storage devices is complete. On receiving this notification, the client device 20 determines that the write processing has finished and can move on to the next processing.

Replica data therefore needs to be stored on the replica storage devices quickly. Depending on the network conditions between devices (bandwidth and so on), however, data transfer between devices may take a long time. This problem is especially pronounced when the amount of data to be transferred is large.

As described later, the present embodiment adopts a different procedure, in which completion of storage is reported to the client device 20 before completion of data storage on the replica storage devices is confirmed. Even in this case, replica data needs to be stored on the replica storage devices quickly, for reasons such as maintaining data consistency.

The present embodiment therefore determines the replication path in consideration of network conditions and other factors. An example of this determination is described with reference to FIG. 7. As shown in FIG. 7(a), the primary storage device is at point A (a geographical location), and replica storage devices are at points B and C. Suppose that the network conditions between the devices (points), which are capacities in the example of FIG. 7, are as shown in the figure. Suppose also that the replica data to be transferred has size 10, which is assumed to be transferable quickly over a network with a bandwidth of 10.

In this case, data can be transferred quickly from the primary storage device at point A to the replica storage device at point B, but transferring data from the primary storage device at point A to the replica storage device at point C takes a very long time. On the other hand, data can be transferred quickly from the replica storage device at point B to the replica storage device at point C. Therefore, as shown in FIG. 7(b), the replication path determination unit 14 determines the path A->B->C as the replication path.
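The FIG. 7 decision can be illustrated with a minimal sketch. The text does not specify the algorithm used by the replication path determination unit 14, so the greedy rule below (always extend the path over the widest link from a site that already holds the data) is an assumption, as are the function name `replication_tree` and the bandwidth values, which are hypothetical because the figure's numbers are not reproduced in the text.

```python
def replication_tree(primary, bandwidth):
    """Return replication hops as (sender, receiver) pairs.

    bandwidth maps frozenset({site_a, site_b}) to the link bandwidth.
    Assumed rule (not from the source): repeatedly pick the widest link
    from a site that already holds the data to a site that does not.
    """
    sites = set()
    for link in bandwidth:
        sites |= link
    have = [primary]          # sites that already hold the data
    hops = []
    while len(have) < len(sites):
        best = None
        for src in have:
            for dst in sorted(sites - set(have)):
                bw = bandwidth.get(frozenset((src, dst)), 0)
                if best is None or bw > best[0]:
                    best = (bw, src, dst)
        _, src, dst = best
        hops.append((src, dst))
        have.append(dst)
    return hops


# Hypothetical bandwidths: A-B and B-C are fast, A-C is slow.
links = {frozenset("AB"): 10, frozenset("BC"): 10, frozenset("AC"): 1}
print(replication_tree("A", links))  # [('A', 'B'), ('B', 'C')]
```

With these assumed values the sketch yields the chain A->B->C rather than sending directly from A to both replicas.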

FIG. 8 shows another example. In the example shown in FIG. 8, the network bandwidth between every pair of points is 10, but A->B and A->C physically share only one line (bandwidth 10). In such a case, if replica data of size 10 were sent over both A->B and A->C, data of size 20 would end up being sent over a single line, and the transfer could not be completed quickly. Therefore, in this case as well, as shown in FIG. 8(b), the replication path determination unit 14 determines the path A->B->C as the replication path.
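The arithmetic behind the FIG. 8 case can be checked with a short sketch that sums the data volume carried by each physical line. The line names `line1` and `line2`, the function `line_load`, and the assumption that B->C runs over a separate physical line are illustrative and not taken from the source.

```python
from collections import Counter


def line_load(hops, size, physical_line):
    """Total data volume carried by each physical line for one replication."""
    load = Counter()
    for hop in hops:
        load[physical_line[hop]] += size
    return dict(load)


# Assumed mapping: A->B and A->C share one physical line, B->C uses another.
physical_line = {("A", "B"): "line1", ("A", "C"): "line1", ("B", "C"): "line2"}

direct = line_load([("A", "B"), ("A", "C")], 10, physical_line)
chain = line_load([("A", "B"), ("B", "C")], 10, physical_line)
print(direct)  # {'line1': 20}
print(chain)   # {'line1': 10, 'line2': 10}
```

Sending directly from A to both replicas puts size 20 on the shared line, whereas the chain A->B->C keeps every line at size 10.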

In addition, since it is desirable to receive the data storage completion notification from a replica storage device quickly, the path can also be determined taking into account criteria such as transferring the replica data first to a replica storage device that is close to the primary storage device, or first to a replica storage device capable of storing data at high speed.

The route determination information storage unit 15 stores network information such as the free bandwidth between devices, the distances between points, physical line information, and the capacity, data storage speed, reliability, and load state of each data storage device. The replication path determination unit 14 calculates the optimum path based on the information read from the route determination information storage unit 15.

(System operation sequence example)
Next, an example of the operation sequence of the system in the present embodiment will be described with reference to FIG. 9. In the following example, it is assumed that an ordered set for the file to be accessed (the target file) already exists before step 101.

When the client device 20 accesses (for example, writes to) the target file, the client device 20 transmits a file access request including the identification information of the target file to the data storage control device 10 (step 101). The primary designator determination unit 12 in the data storage control device 10 determines the distance between the client device 20 and each data storage device included in the ordered set, decides that the closest data storage device is to be the primary, and determines the primary designator so that this data storage device comes to the head of the ordered set (step 102).

Next, the data storage device order calculation unit 11 obtains a new ordered set by shifting the existing ordered set for the target file according to the primary designator determined in step 102 (step 103). The primary storage device is thereby determined. In this example, the data storage control device 10 notifies the data storage device that has become the primary storage device for the target file of that fact (step 104).
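Steps 102 and 103 can be sketched as follows. This is a minimal illustration that assumes the shift is a cyclic left shift and that the primary designator equals the index of the nearest device in the existing ordered set; the function names, device names, and distance values are hypothetical.

```python
def choose_primary_designator(ordered_set, distance_to_client):
    """Step 102 (sketch): number of shifts that brings the nearest device to the head."""
    nearest = min(ordered_set, key=lambda dev: distance_to_client[dev])
    return ordered_set.index(nearest)


def shift(ordered_set, designator):
    """Step 103 (sketch): cyclic left shift by `designator` positions (assumed direction)."""
    k = designator % len(ordered_set)
    return ordered_set[k:] + ordered_set[:k]


# Hypothetical ordered set and client distances.
ordered = ["node1", "node2", "node3"]
distance = {"node1": 300, "node2": 20, "node3": 150}

designator = choose_primary_designator(ordered, distance)
new_order = shift(ordered, designator)
print(designator, new_order)  # 1 ['node2', 'node3', 'node1']
```

The head of the shifted list plays the role of the primary storage device, and the remaining entries correspond to the replica storage devices.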

The data storage control device 10 also returns a response to the client device 20 (step 105). The response includes, for example, the identification information of the primary storage device and the identification information of the replica storage devices.

Thereafter, the client device 20 writes data to the file (step 106). Here, the data is written to the primary storage device by the function of the distributed file system.

Here, the primary storage device of the present embodiment transmits a storage completion notification (ACK) to the client device 20 at the point when storage of the data in the primary storage device itself is finished, either before transferring the copy of the data to the replica storage devices, or after transferring the copy but before receiving a storage completion notification (ACK) from the replica storage devices. In the example of FIG. 9, the storage completion notification (ACK) is transmitted to the client device 20 before the copy of the data is transferred from the primary storage device to the replica storage devices (step 107).

If the replica storage device is far from the primary storage device and the primary storage device waited for the storage completion notification (ACK) from the replica storage device before transmitting the storage completion notification (ACK) to the client device 20, the ACK to the client device 20 would be delayed, and the client device 20 might be delayed in moving on to its next processing. The present embodiment eliminates this problem.

As an alternative to the above processing, the storage completion notification (ACK) may be transmitted to the client device 20 after a storage completion notification (ACK) has been received only from the replica storage devices that are close to the primary storage device.
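The three acknowledgment timings discussed above (wait for all replicas, wait only for nearby replicas, or acknowledge right after the local store) can be compared with a small sketch that lists the order of events at the primary storage device. The policy names `all`, `near`, and `none` and the function `primary_write_events` are hypothetical labels introduced for illustration; the sketch models only event order, not actual network transfer.

```python
def primary_write_events(policy, replicas, near_replicas=()):
    """Return the order of events at the primary for one write request.

    policy: "all"  - ACK the client after every replica has ACKed
            "near" - ACK the client after the replicas in near_replicas have ACKed
            "none" - ACK the client right after the local store (as in FIG. 9)
    """
    if policy == "all":
        wait_for = set(replicas)
    elif policy == "near":
        wait_for = set(near_replicas)
    elif policy == "none":
        wait_for = set()
    else:
        raise ValueError(f"unknown policy: {policy}")

    events = ["store data locally"]
    after_ack = []  # replication that happens after the client has been ACKed
    for replica in replicas:
        if replica in wait_for:
            events.append(f"replicate to {replica}, wait for its ACK")
        else:
            after_ack.append(f"replicate to {replica}")
    events.append("send ACK to client")
    return events + after_ack


for policy in ("all", "near", "none"):
    print(policy, primary_write_events(policy, ["B", "C"], near_replicas=["B"]))
```

Under the `none` policy the client ACK appears immediately after the local store, which is the ordering the FIG. 9 example uses to avoid delaying the client.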

In step 108 of FIG. 9, the primary storage device requests, from the data storage control device 10, path information for transferring the replicated data. On receiving the request, the replication path determination unit 14 of the data storage control device 10 determines the path by the method described above (step 109) and notifies the primary storage device of the path information (step 110). The replicated data is then transferred and stored along this path (steps 111 to 114).

In the above example, the data storage control device 10 determines the path. Alternatively, the primary storage device itself may acquire, from the data storage control device 10, the information needed for path determination, such as network information, and perform the path determination processing.

(Summary of the embodiment, effects, etc.)
As described above, according to the present embodiment, there is provided a data storage control device in a data storage system that stores data in a plurality of data storage devices, the data storage control device comprising: storage means for storing an ordered set of the plurality of data storage devices; and order calculation means for changing the order of elements in the ordered set using designation information for designating a specific data storage device among the plurality of data storage devices, wherein, in the data storage system, the data storage device corresponding to the element at a predetermined position in the ordered set whose order has been changed by the order calculation means is used as the specific data storage device.

The element at the predetermined position in the ordered set is, for example, the first element of the ordered set. The designation information is an integer, and the order calculation means changes the order by shifting the ordered set stored in the storage means a number of times equal to that integer.

In the data storage system, for example, the data is written to the specific data storage device among the plurality of data storage devices, and replicated data of the data is stored in the replicated data storage devices, which are the data storage devices other than the specific data storage device.

The data storage control device may include replication path determination means for determining, based on information on the network connecting the points where the data storage devices are installed, a replication path that starts from the specific data storage device and is used to store the replicated data in each replicated data storage device.

The data storage control device may also include designation information determining means for determining the designation information so that, among the plurality of data storage devices, the data storage device located closest to the client device that accesses the data becomes the specific data storage device.

The data storage system is, for example, a distributed file system, and the data is file data or data obtained by dividing file data.

As already described, the conventional technology did not allow detailed specification such as which data storage device the primary data is placed in. Nor was it possible to swap the roles of primary data and replica data.

In contrast, the present embodiment makes it possible to designate a specific data storage device as the primary. Thus, for example, when a replica already exists at a location near the client, that replica can be used as the primary data. As a result, large-scale data on the order of several terabytes can, in effect, be moved at high speed.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

DESCRIPTION OF SYMBOLS
10 Data storage control device
20 Client device
30 Data storage device
40 Network
11 Data storage device order calculation unit
12 Primary designator determination unit
13 Table information storage unit
14 Replication path determination unit
15 Route determination information storage unit
16 Communication processing unit
31 Data storage unit
32 Distributed file system processing unit
33 Replication control unit

Claims

A data storage control device in a data storage system for storing data in a plurality of data storage devices, comprising:
storage means for storing an ordered set of the plurality of data storage devices; and
order calculation means for changing the order of elements in the ordered set using designation information for designating a specific data storage device among the plurality of data storage devices,
wherein, in the data storage system, the data storage device corresponding to the element at a predetermined position in the ordered set whose order has been changed by the order calculation means is used as the specific data storage device.

The data storage control device according to claim 1, wherein the element at the predetermined position in the ordered set is the first element of the ordered set.

The data storage control device according to claim 1 or 2, wherein the designation information is an integer, and the order calculation means changes the order by shifting the ordered set stored in the storage means a number of times equal to the integer.

The data storage control device according to any one of claims 1 to 3, wherein, in the data storage system, the data is written to the specific data storage device among the plurality of data storage devices, and replicated data of the data is stored in replicated data storage devices, which are the data storage devices other than the specific data storage device.

The data storage control device according to claim 4, further comprising replication path determination means for determining, based on information on the network connecting the points where the data storage devices are installed, a replication path that starts from the specific data storage device and is used to store the replicated data in each replicated data storage device.

The data storage control device according to claim 1, further comprising designation information determining means for determining the designation information so that, among the plurality of data storage devices, the data storage device located closest to a client device that accesses the data becomes the specific data storage device.

The data storage control device according to any one of claims 1 to 6, wherein the data storage system is a distributed file system, and the data is file data or data obtained by dividing file data.

A data storage control method executed by a data storage control device in a data storage system for storing data in a plurality of data storage devices,
the data storage control device including storage means for storing an ordered set of the plurality of data storage devices, the data storage control method comprising:
an order calculation step of changing the order of elements in the ordered set using designation information for designating a specific data storage device among the plurality of data storage devices,
wherein, in the data storage system, the data storage device corresponding to the element at a predetermined position in the ordered set whose order has been changed in the order calculation step is used as the specific data storage device.

A program for causing a computer to function as each means of the data storage control device according to any one of claims 1 to 7.