
CN107046575B - A high-density storage method for a cloud storage system - Google Patents

A high-density storage method for a cloud storage system

Info

Publication number
CN107046575B
CN107046575B (application CN201710250990.8A; published as CN107046575A)
Authority
CN
China
Prior art keywords
node
disk
memory node
memory
osd
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710250990.8A
Other languages
Chinese (zh)
Other versions
CN107046575A (en)
Inventor
金友兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Zhuo Shengyun Mdt Infotech Ltd
Original Assignee
Nanjing Zhuo Shengyun Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Zhuo Shengyun Mdt Infotech Ltd filed Critical Nanjing Zhuo Shengyun Mdt Infotech Ltd
Priority to CN201710250990.8A priority Critical patent/CN107046575B/en
Publication of CN107046575A publication Critical patent/CN107046575A/en
Application granted granted Critical
Publication of CN107046575B publication Critical patent/CN107046575B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653Monitoring storage devices or systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cloud storage system comprising several clients connected by cable to a switch, characterized in that the switch is connected by cable to a metadata node, a monitoring node, and several OSD storage nodes, and every two OSD storage nodes are connected by SAS cables to a JBOD disk enclosure. The high-density storage method of the system is as follows: each OSD storage node manages a portion of the disks in the enclosure; when one storage node crashes, the other storage node takes over all disks in the enclosure, ensuring that all data remain readable and writable. When the crashed node recovers, it takes back its associated disks, preserving overall storage performance. The invention enables OSD storage nodes to provide high-density storage capacity, substantially reducing the cost of cloud storage while improving storage performance.

Description

A high-density storage method for a cloud storage system
Technical field
The invention belongs to the field of computer applications, and in particular relates to a cloud storage system and its high-density storage method.
Background art
With the rapid development of computer technology, cloud storage services are increasingly adopted by enterprises. A cloud storage system usually contains multiple server roles, typically including metadata nodes, monitoring nodes, and OSD storage nodes. An OSD storage node, also called an object storage node, is where the bulk of user data is stored. In a large-scale storage cluster, the data or file content saved by a user is divided into multiple objects. Each object is placed on some OSD storage node according to a certain algorithm or a decision by the metadata service. To prevent data loss, an object can be replicated into multiple copies stored on different OSD storage nodes. If a storage node crashes or is damaged, the object data can still be read and written through the other copies. This storage scheme supports big-data scalability and data reliability very well.
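The patent leaves the placement algorithm unspecified ("a certain algorithm or a decision by the metadata service"). As an illustrative sketch only, hash-based placement with replication might look like the following; the function and node names are hypothetical, not part of the patented method:

```python
import hashlib

def place_object(object_id: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick `replicas` distinct OSD storage nodes for an object.

    Hash-based placement is an assumption here; the patent only says
    objects are mapped to OSD nodes by "a certain algorithm".
    """
    start = int(hashlib.md5(object_id.encode()).hexdigest(), 16) % len(nodes)
    # Walk the node list from the hashed start position, wrapping around.
    return [nodes[(start + i) % len(nodes)] for i in range(min(replicas, len(nodes)))]

nodes = ["osd-1", "osd-2", "osd-3", "osd-4"]
print(place_object("user-file-0001/part-0", nodes))
```

Because each copy lands on a different node, the loss of one OSD storage node leaves the other replicas readable, which is the reliability property the background section describes.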
However, this storage mode is not suited to putting too much storage capacity in a single OSD storage node. Suppose an OSD storage node holds a large number of disks and therefore a large amount of data. If that node crashes or is damaged, all of its disks stop serving. When the node later recovers, even if none of its disk data has been lost, every disk lags behind the replica data on the other nodes, and the system must perform a large amount of data recovery and replication, which severely degrades the cloud storage system's ability to serve external requests. If all the disks on the node are lost, a network data storm is even more likely, making it almost impossible to provide service at all. Throttling the recovery speed can preserve the cluster's ability to serve clients, but slower recovery prolongs the recovery window, so the probability of a service interruption or data loss if another device fails in the meantime is considerable.
Common distributed storage can therefore tolerate the simultaneous failure of a single disk or a small number of disks, but the simultaneous failure of a large number of disks causes serious problems, so a single OSD storage node generally cannot host too many disks. Given the rapid advances in CPUs and memory, such OSD storage nodes cannot provide high-density storage, which makes cloud storage unduly expensive.
Summary of the invention
Purpose of the invention: in view of the problems in the prior art, the present invention provides a cloud storage system and a high-density storage method that enable OSD storage nodes to deliver high-density storage capacity, substantially reducing the cost of cloud storage while improving storage performance.
Technical solution: to solve the above technical problems, the cloud storage system provided by the invention includes several clients connected by cable to a switch; the switch is connected by cable to a metadata node, a monitoring node, and several OSD storage nodes, and every two OSD storage nodes are connected by SAS cables to a JBOD disk enclosure.
Further, the JBOD disk enclosure consists of a backplane and several hard disk drives, the hard disk drives being mounted on the backplane.
A high-density storage method using the cloud storage system described above comprises the following specific steps:
Step 1: initialize; both storage nodes discover the disks of the JBOD device and then assign each disk a unique number;
Step 2: each of the two storage nodes takes over its proportion of the disks and then registers with the monitoring node;
Step 3: the storage nodes periodically communicate with the monitoring node over the network;
Step 4: the monitoring node periodically collects storage node state and may thereby find that a storage node has failed;
Step 5: the failed node is evicted from the cluster, and the peer storage node is notified to take over all disks and register them with the monitoring node, completing the takeover;
Step 6: node recovery: once the failed storage node returns to normal, it registers with the monitoring node;
Step 7: the monitoring node notifies the peer storage node to relinquish its takeover of the relevant disks and checks whether the hand-back succeeded; if so, go to step 8; if not, the hand-back is retried until it succeeds;
Step 8: the original storage node takes back the relevant disks and registers with the monitoring node; the system is back to normal.
Further, the ratio of disks taken over by the two storage nodes in step 2 is calculated as follows: a weight is computed from the CPU core count and memory size of each of the two OSD storage nodes, and disks are automatically apportioned to the OSD storage nodes for management accordingly, where the weight is calculated as: CPU cores * 50% + memory size * 50%.
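The weight formula can be sketched directly. The memory unit and the helper names below are assumptions; the patent gives only the formula CPU cores * 50% + memory size * 50% and notes elsewhere that the ratio is adjustable:

```python
def node_weight(cpu_cores: int, mem_gib: int) -> float:
    # Weight formula from the patent: CPU cores * 50% + memory size * 50%.
    # The memory unit is not specified in the text; GiB is assumed here.
    return cpu_cores * 0.5 + mem_gib * 0.5

def split_disks(disks: list[str], w_a: float, w_b: float) -> tuple[list[str], list[str]]:
    """Apportion the shared JBOD disks between two storage nodes by weight ratio."""
    n_a = round(len(disks) * w_a / (w_a + w_b))
    return disks[:n_a], disks[n_a:]

disks = [f"/dev/sd{c}" for c in "abcdefghij"]     # 10 shared JBOD disks (hypothetical names)
w1 = node_weight(16, 64)                          # node A: 16 cores, 64 GiB -> weight 40
w2 = node_weight(8, 32)                           # node B:  8 cores, 32 GiB -> weight 20
owned_a, owned_b = split_disks(disks, w1, w2)
print(len(owned_a), len(owned_b))                 # → 7 3
```

With identical configurations the weights are equal and the split degenerates to the even division the description mentions.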
Compared with the prior art, the advantages of the present invention are as follows:
The method of the invention is simple to implement. Traditional dual-controller storage servers can also achieve disk takeover, but because such servers usually stand alone rather than joining a cloud storage cluster, their high-availability implementation is extremely complex. For example, a dual-controller storage server typically needs three channels to judge whether its peer is healthy: a primary heartbeat, a secondary heartbeat, and an isolation card, with the guarantee that the three channels never all fail at once. It also needs an elaborate suicide mechanism to prevent the two nodes from writing the same disk simultaneously. Because the devices in this patent live inside a cloud storage cluster, the failure judgment is simple and effective: introducing the cluster monitoring node is equivalent to having a third-party arbiter, but the arbitration is simpler and more efficient. In addition, when the monitoring agent on an OSD storage node cannot communicate with the cluster monitoring node, the node refuses all read/write requests, guaranteeing that two OSD storage nodes never write the same disk at the same time.
The invention therefore provides an OSD storage node high-availability scheme for a cloud storage system, improving the reliability and availability of the whole cloud service so that the cloud storage service is more stable and dependable, while better supporting high-density disk enclosures and reducing storage cost.
Detailed description of the invention
Fig. 1 is a structural diagram of the invention;
Fig. 2 is the overall flow chart of the invention.
Specific embodiment
The invention is further elucidated below with reference to the accompanying drawings and specific embodiments.
The invention provides a high-density storage method for OSD storage nodes in a storage cluster. The system structure is shown in Fig. 1. In terms of hardware connections, each OSD storage node carries an HBA card, and every two OSD storage nodes jointly connect to one JBOD expansion enclosure; through the HBA cards they identify all disks in the enclosure, so both storage nodes can access every disk in it. The main idea is that each OSD storage node manages part of the disks in the enclosure; when one storage node crashes, the other takes over all disks in the enclosure so that all data remain readable and writable, and when the crashed node recovers, it takes back its associated disks, preserving overall storage performance. To achieve this, the process steps of the invention are as shown in Fig. 2:
1. Initialization: a) each pair of OSD storage services discovers all disks in the expansion enclosure, and every OSD storage node sees each disk as a device; b) each disk is registered under a device number unique within the cluster; c) the ownership of each disk is assigned by a software policy: in general, if the two OSD storage nodes have identical configurations, disk ownership can be split evenly; if the configurations differ, ownership can be apportioned by performance ratio; d) each OSD storage node mounts the disks it owns and registers them with the monitoring node, after which it can serve reads and writes; e) during operation, each OSD storage node periodically communicates with the monitoring node, reporting the state of each disk.
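A minimal sketch of this initialization, under assumed data structures (the patent does not prescribe how the disk registry is stored or how unique numbers are derived): each shared disk gets a cluster-unique number and an owner chosen by the configured ratio:

```python
import hashlib

def register_disks(jbod_disks: list[str], node_a: str, node_b: str,
                   ratio_a: float = 0.5) -> dict:
    """Assign each shared JBOD disk a unique number and an owning node.

    The sha1-derived IDs and the registry layout are illustrative
    assumptions, not the patent's concrete scheme.
    """
    registry = {}
    n_a = round(len(jbod_disks) * ratio_a)          # even split when configs match
    for i, dev in enumerate(sorted(jbod_disks)):
        disk_id = hashlib.sha1(dev.encode()).hexdigest()[:8]   # cluster-unique number
        owner = node_a if i < n_a else node_b
        registry[disk_id] = {"dev": dev, "owner": owner, "mounted_on": owner}
    return registry

reg = register_disks([f"/dev/sd{c}" for c in "abcd"], "osd-a", "osd-b")
print(sum(1 for d in reg.values() if d["owner"] == "osd-a"))  # → 2
```

Recording the device numbers up front is what later allows the surviving peer to mount the disks directly during takeover.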
2. Crash takeover: a) when one node crashes, the cluster's monitoring node detects the node failure and evicts the node from the cluster, so no read/write traffic goes to the failed node, and the peer node is notified to execute the takeover; b) once the peer OSD storage node sees that its partner has left the cluster, it takes over all disks in the enclosure; because the device numbers were all recorded earlier, it can mount them directly; c) after the takeover completes, it registers with the monitoring node so that the taken-over disks can be read and written normally. This process fully avoids the situation where two nodes operate the same disk at the same time, achieving the goal of OSD storage node high availability.
3. After the failed OSD storage node recovers, it rejoins the cluster. When the monitoring node detects that the faulty node has recovered, it notifies the peer node to unmount the disks it had taken over. After the unmount completes, the recovered node takes back its original disks and registers with the monitoring node. Because the data are still consistent at this point, no data recovery is needed after the hand-back, so the externally provided service capacity is not reduced.
A typical cloud storage cluster contains metadata nodes, monitoring nodes, and OSD storage nodes, with OSD storage nodes being by far the most numerous. This system replaces the original OSD storage node with an OSD storage node group; the hardware connection of the cluster is as shown above, and all hardware here is commodity. Specifically:
1. JBOD expansion enclosure (Just a Bunch Of Disks): a storage device with multiple hard disk drives mounted on a single backplane. A JBOD usually has many disk slots and can connect a large number of disks; JBODs with 60 or 90 slots, for example, are becoming common on the market. Because a JBOD has no controller and no intelligent functions, it is simpler to manufacture and can therefore be extremely reliable.
2. Server nodes: each needs an appropriate CPU and memory configuration. Since the invention supports high-density storage, the OSD storage nodes can be configured with larger CPU core counts and memory; the two OSD storage nodes must be fitted with HBA cards and connect to the JBOD through SAS ports.
3. Network switch: the whole cluster is connected with an ordinary switch and provides the cloud storage service externally.
In this structure, within an OSD storage node group, storage nodes A and B can both access all disks in the attached JBOD enclosure; that is, in a Linux system each node directly sees the dev device of every associated disk. If an OSD storage node wants to manage a disk, it simply performs a mount operation under Linux and can then read and write the disk; to give up reading and writing a disk, it simply performs umount under Linux.
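These mount/umount mechanics can be wrapped as in the sketch below. The mountpoint layout and the dry_run guard are illustrative; actually claiming a disk requires root privileges and a filesystem on the device:

```python
import subprocess

def take_disk(dev: str, mountpoint: str, dry_run: bool = True) -> list[str]:
    """Claim a shared JBOD disk by mounting it, as the description outlines.

    `dev` and `mountpoint` here are hypothetical paths; with dry_run=True
    the command is only constructed, never executed.
    """
    cmd = ["mount", dev, mountpoint]
    if not dry_run:
        subprocess.run(cmd, check=True)   # real use needs root and a valid filesystem
    return cmd

def release_disk(mountpoint: str, dry_run: bool = True) -> list[str]:
    """Relinquish a disk (e.g. before handing it back to the recovered peer)."""
    cmd = ["umount", mountpoint]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd

print(take_disk("/dev/sdb", "/srv/osd/disk01"))   # → ['mount', '/dev/sdb', '/srv/osd/disk01']
```

Because both nodes see the same dev devices, ownership is purely a matter of which node currently holds the mount, which is why the cluster-level arbitration described below is needed.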
The invention comprises a cloud storage system and its high-density storage method. The system is deployed on each of the two OSD storage nodes; the main flow and method for achieving high availability are as follows:
1. A weight is calculated from the CPU core count and memory size of the two OSD storage nodes, and disks are automatically apportioned to each OSD storage node for management. The weight is calculated as CPU cores * 50% + memory size * 50%, though this ratio can be adjusted as needed;
2. After an OSD storage node starts, it registers its state and the serial numbers of the disks it manages with the cluster monitoring node. Once registration succeeds, the OSD storage node uses these disks and accepts client read/write requests to store and retrieve user data;
3. Every few seconds (configurable), each OSD storage node reports its status and the time of its last report to the cluster. If the cluster monitoring node receives no report from an OSD storage node for a certain period, it judges that the node has failed and evicts it from the cluster, so that whatever the failed node's actual state, no read/write requests reach it;
4. In addition, every few seconds each OSD storage node obtains the cluster membership from the monitoring node to discover whether its peer has failed. If the peer is judged to have failed, the disk takeover procedure starts: all disks the peer managed become managed by this node and are registered with the cluster, so the cluster can still serve all disks while routing no requests to the failed node;
5. If a node's own monitoring agent determines that the node has lost contact with the cluster, the node stops accepting any read/write requests, which guarantees that two nodes never write the same disk at the same time.
Takeover is decided not for all of the peer's disks as a whole but disk by disk. If a disk itself is corrupted, the takeover procedure still starts, but since this node cannot identify or read that disk either, the takeover fails, and the ordinary disk-failure alarm is raised instead.
To prevent transient network or service glitches from causing repeated, oscillating disk takeovers, the inverse of the takeover operation is generally performed manually after the failed node recovers, restoring full cluster service. This recovery operation mainly consists of the following steps:
1. After the crashed node returns to normal, it registers with the cluster monitoring node.
2. The monitoring node notifies the peer storage node to relinquish its takeover of the relevant disks; the peer typically stops reading and writing those disks and then executes an umount operation.
3. After the unmount succeeds, the recovered storage node re-executes the takeover of these disks, generally a direct mount, and registers with the monitoring node.
The above is only an embodiment of the present invention and is not intended to limit it. All equivalent substitutions made within the principle of the invention shall fall within its scope of protection. Matters not elaborated herein belong to the prior art well known to those skilled in the art.

Claims (2)

1. A high-density storage method for a cloud storage system, characterized by the following specific steps:
Step 1: initialize; both storage nodes discover the disks of the JBOD device and then assign each disk a unique number;
Step 2: each of the two storage nodes takes over its proportion of the disks and then registers with the monitoring node;
Step 3: the storage nodes periodically communicate with the monitoring node over the network;
Step 4: the monitoring node periodically collects storage node state; if it finds that a storage node has failed, go to step 5;
Step 5: the failed node is evicted from the cluster, and the peer storage node is notified to take over all disks and register them with the monitoring node, completing the takeover;
Step 6: node recovery: once the failed storage node returns to normal, it registers with the monitoring node;
Step 7: the monitoring node notifies the peer storage node to relinquish its takeover of the relevant disks and checks whether the hand-back succeeded; if so, go to step 8; if not, the hand-back is retried until it succeeds;
Step 8: the original storage node takes back the relevant disks and registers with the monitoring node; the system is back to normal;
wherein the cloud storage system includes several clients connected by cable to a switch; the switch is connected by cable to a metadata node, a monitoring node, and several OSD storage nodes; every two OSD storage nodes are connected by SAS cables to a JBOD disk enclosure; and the JBOD disk enclosure consists of a backplane and several disk drives, the disk drives being mounted on the backplane.
2. The high-density storage method for a cloud storage system according to claim 1, wherein the ratio of disks taken over by the two storage nodes in step 2 is calculated as follows: a weight is computed from the CPU core count and memory size of the two OSD storage nodes, and disks are automatically apportioned to the OSD storage nodes for management accordingly, where the weight is calculated as: CPU cores * 50% + memory size * 50%.
CN201710250990.8A 2017-04-18 2017-04-18 A kind of high density storage method for cloud storage system Active CN107046575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710250990.8A CN107046575B (en) 2017-04-18 2017-04-18 A kind of high density storage method for cloud storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710250990.8A CN107046575B (en) 2017-04-18 2017-04-18 A kind of high density storage method for cloud storage system

Publications (2)

Publication Number Publication Date
CN107046575A CN107046575A (en) 2017-08-15
CN107046575B true CN107046575B (en) 2019-07-12

Family

ID=59544315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710250990.8A Active CN107046575B (en) 2017-04-18 2017-04-18 A kind of high density storage method for cloud storage system

Country Status (1)

Country Link
CN (1) CN107046575B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111381766B (en) * 2018-12-28 2022-08-02 杭州海康威视系统技术有限公司 Method for dynamically loading disk and cloud storage system
CN111444157B (en) * 2019-01-16 2023-06-20 阿里巴巴集团控股有限公司 Distributed file system and data access method
CN109976946A (en) * 2019-02-27 2019-07-05 深圳点猫科技有限公司 It is a kind of for educating the scheduling system history data restoration methods and device of cloud platform
CN112579384B (en) * 2019-09-27 2023-07-04 杭州海康威视数字技术股份有限公司 Method, device and system for monitoring nodes of SAS domain and nodes
CN111901415B (en) * 2020-07-27 2023-07-14 北京星辰天合科技股份有限公司 Data processing method and system, computer readable storage medium and processor
CN115695425A (en) * 2022-10-28 2023-02-03 济南浪潮数据技术有限公司 BeeGFS file system cluster deployment method, device, equipment and storage medium
CN115988008A (en) * 2022-12-29 2023-04-18 江苏倍鼎网络科技有限公司 High-density storage method and system for cloud storage system


Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN103049225A (en) * 2013-01-05 2013-04-17 浪潮电子信息产业股份有限公司 Double-controller active-active storage system
WO2016070375A1 (en) * 2014-11-06 2016-05-12 华为技术有限公司 Distributed storage replication system and method
CN106062717A (en) * 2014-11-06 2016-10-26 华为技术有限公司 Distributed storage replication system and method

Non-Patent Citations (1)

Title
SurFS Product Description;金友兵;《https://github.com/surcloudorg/SurFS/commits/master/SurFS%20Product%20Description.pdf》;20160528;第1-3页

Also Published As

Publication number Publication date
CN107046575A (en) 2017-08-15

Similar Documents

Publication Publication Date Title
CN107046575B (en) A kind of high density storage method for cloud storage system
US6678788B1 (en) Data type and topological data categorization and ordering for a mass storage system
US7627779B2 (en) Multiple hierarichal/peer domain file server with domain based, cross domain cooperative fault handling mechanisms
US6691209B1 (en) Topological data categorization and formatting for a mass storage system
US6594775B1 (en) Fault handling monitor transparently using multiple technologies for fault handling in a multiple hierarchal/peer domain file server with domain centered, cross domain cooperative fault handling mechanisms
US8527561B1 (en) System and method for implementing a networked file system utilizing a media library
US6865157B1 (en) Fault tolerant shared system resource with communications passthrough providing high availability communications
CN103763383B (en) Integrated cloud storage system and its storage method
US7447933B2 (en) Fail-over storage system
EP1370945B1 (en) Failover processing in a storage system
US6578160B1 (en) Fault tolerant, low latency system resource with high level logging of system resource transactions and cross-server mirrored high level logging of system resource transactions
US7219260B1 (en) Fault tolerant system shared system resource with state machine logging
JP5523468B2 (en) Active-active failover for direct attached storage systems
US20180260123A1 (en) SEPARATION OF DATA STORAGE MANAGEMENT ON STORAGE devices FROM LOCAL CONNECTIONS OF STORAGE DEVICES
US20190220379A1 (en) Troubleshooting Method, Apparatus, and Device
US20100125857A1 (en) Cluster control protocol
US20120144006A1 (en) Computer system, control method of computer system, and storage medium on which program is stored
US20060129759A1 (en) Method and system for error strategy in a storage system
WO2010123510A1 (en) Active-active support of virtual storage management in a storage area network ("san")
CN108205573B (en) Data distributed storage method and system
US11544162B2 (en) Computer cluster using expiring recovery rules
US9952951B2 (en) Preserving coredump data during switchover operation
US12038817B2 (en) Methods for cache rewarming in a failover domain and devices thereof
US11809268B1 (en) Discovering host-switch link and ISL issues from the storage array
JP2018085634A (en) Information processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant