
CN107046575B - A high-density storage method for a cloud storage system - Google Patents

A high-density storage method for a cloud storage system

Info

Publication number
CN107046575B
CN107046575B (application CN201710250990.8A; published as CN107046575A)
Authority
CN
China
Prior art keywords
node
disk
memory node
memory
osd
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710250990.8A
Other languages
Chinese (zh)
Other versions
CN107046575A (en)
Inventor
金友兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Zhuo Shengyun Mdt Infotech Ltd
Original Assignee
Nanjing Zhuo Shengyun Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Zhuo Shengyun Mdt Infotech Ltd filed Critical Nanjing Zhuo Shengyun Mdt Infotech Ltd
Priority to CN201710250990.8A priority Critical patent/CN107046575B/en
Publication of CN107046575A publication Critical patent/CN107046575A/en
Application granted granted Critical
Publication of CN107046575B publication Critical patent/CN107046575B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653Monitoring storage devices or systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cloud storage system comprising several clients connected by cable to a switch, characterized in that the switch is connected by cable to a metadata node, a monitoring node, and several OSD storage nodes, and every two OSD storage nodes are connected by SAS cables to a JBOD disk enclosure. The high-density storage method of the system is as follows: each OSD storage node manages a portion of the disks in the enclosure; when one storage node crashes, the other storage node takes over all disks in the enclosure, ensuring that all data remain readable and writable. When the crashed node recovers, it takes back its associated disks, preserving overall storage performance. The invention enables OSD storage nodes to provide high-density storage capacity, substantially reducing the cost of cloud storage while improving storage performance.

Description

A high-density storage method for a cloud storage system
Technical field
The invention belongs to the field of computer applications, and in particular relates to a cloud storage system and its high-density storage method.
Background art
With the rapid development of computer technology, cloud storage services are increasingly adopted by enterprises. A cloud storage system usually contains multiple server roles, typically including metadata nodes, monitoring nodes, and OSD storage nodes. An OSD storage node, also called an object storage node, is where the bulk of user data is stored. In a large-scale storage cluster, the data or file content saved by a user is divided into multiple objects. Each object is placed on some OSD storage node according to a certain algorithm or a decision by the metadata service. To prevent data loss, an object can be replicated into multiple copies stored on different OSD storage nodes. If a storage node crashes or is damaged, the object data can still be read and written through the other copies. This storage scheme supports big-data scalability and data reliability very well.
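The patent leaves the placement algorithm unspecified ("a certain algorithm or a decision by the metadata service"). As an illustrative sketch only, hash-based placement with replication might look like the following; the function and node names are hypothetical, not part of the patented method:

```python
import hashlib

def place_object(object_id: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick `replicas` distinct OSD storage nodes for an object.

    Hash-based placement is an assumption here; the patent only says
    objects are mapped to OSD nodes by "a certain algorithm".
    """
    start = int(hashlib.md5(object_id.encode()).hexdigest(), 16) % len(nodes)
    # Walk the node list from the hashed start position, wrapping around.
    return [nodes[(start + i) % len(nodes)] for i in range(min(replicas, len(nodes)))]

nodes = ["osd-1", "osd-2", "osd-3", "osd-4"]
print(place_object("user-file-0001/part-0", nodes))
```

Because each copy lands on a different node, the loss of one OSD storage node leaves the other replicas readable, which is the reliability property the background section describes.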
However, this storage mode is not suited to putting too much storage capacity in a single OSD storage node. Suppose an OSD storage node holds a large number of disks and therefore a large amount of data. If that node crashes or is damaged, all of its disks stop serving. When the node later recovers, even if none of its disk data has been lost, every disk lags behind the replica data on the other nodes, and the system must perform a large amount of data recovery and replication, which severely degrades the cloud storage system's ability to serve external requests. If all the disks on the node are lost, a network data storm is even more likely, making it almost impossible to provide service at all. Throttling the recovery speed can preserve the cluster's ability to serve clients, but slower recovery prolongs the recovery window, so the probability of a service interruption or data loss if another device fails in the meantime is considerable.
Common distributed storage can therefore tolerate the simultaneous failure of a single disk or a small number of disks, but the simultaneous failure of a large number of disks causes serious problems, so a single OSD storage node generally cannot host too many disks. Given the rapid advances in CPUs and memory, such OSD storage nodes cannot provide high-density storage, which makes cloud storage unduly expensive.
Summary of the invention
Purpose of the invention: in view of the problems in the prior art, the present invention provides a cloud storage system and a high-density storage method that enable OSD storage nodes to deliver high-density storage capacity, substantially reducing the cost of cloud storage while improving storage performance.
Technical solution: to solve the above technical problems, the cloud storage system provided by the invention includes several clients connected by cable to a switch; the switch is connected by cable to a metadata node, a monitoring node, and several OSD storage nodes, and every two OSD storage nodes are connected by SAS cables to a JBOD disk enclosure.
Further, the JBOD disk enclosure consists of a backplane and several hard disk drives, the hard disk drives being mounted on the backplane.
A high-density storage method using the cloud storage system described above comprises the following specific steps:
Step 1: initialize; both storage nodes discover the disks of the JBOD device and then assign each disk a unique number;
Step 2: each of the two storage nodes takes over its proportion of the disks and then registers with the monitoring node;
Step 3: the storage nodes periodically communicate with the monitoring node over the network;
Step 4: the monitoring node periodically collects storage node state and may thereby find that a storage node has failed;
Step 5: the failed node is evicted from the cluster, and the peer storage node is notified to take over all disks and register them with the monitoring node, completing the takeover;
Step 6: node recovery: once the failed storage node returns to normal, it registers with the monitoring node;
Step 7: the monitoring node notifies the peer storage node to relinquish its takeover of the relevant disks and checks whether the hand-back succeeded; if so, go to step 8; if not, the hand-back is retried until it succeeds;
Step 8: the original storage node takes back the relevant disks and registers with the monitoring node; the system is back to normal.
Further, the ratio of disks taken over by the two storage nodes in step 2 is calculated as follows: a weight is computed from the CPU core count and memory size of each of the two OSD storage nodes, and disks are automatically apportioned to the OSD storage nodes for management accordingly, where the weight is calculated as: CPU cores * 50% + memory size * 50%.
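The weight formula can be sketched directly. The memory unit and the helper names below are assumptions; the patent gives only the formula CPU cores * 50% + memory size * 50% and notes elsewhere that the ratio is adjustable:

```python
def node_weight(cpu_cores: int, mem_gib: int) -> float:
    # Weight formula from the patent: CPU cores * 50% + memory size * 50%.
    # The memory unit is not specified in the text; GiB is assumed here.
    return cpu_cores * 0.5 + mem_gib * 0.5

def split_disks(disks: list[str], w_a: float, w_b: float) -> tuple[list[str], list[str]]:
    """Apportion the shared JBOD disks between two storage nodes by weight ratio."""
    n_a = round(len(disks) * w_a / (w_a + w_b))
    return disks[:n_a], disks[n_a:]

disks = [f"/dev/sd{c}" for c in "abcdefghij"]     # 10 shared JBOD disks (hypothetical names)
w1 = node_weight(16, 64)                          # node A: 16 cores, 64 GiB -> weight 40
w2 = node_weight(8, 32)                           # node B:  8 cores, 32 GiB -> weight 20
owned_a, owned_b = split_disks(disks, w1, w2)
print(len(owned_a), len(owned_b))                 # → 7 3
```

With identical configurations the weights are equal and the split degenerates to the even division the description mentions.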
Compared with the prior art, the advantages of the present invention are as follows:
The method of the invention is simple to implement. Traditional dual-controller storage servers can also achieve disk takeover, but because such servers usually stand alone rather than joining a cloud storage cluster, their high-availability implementation is extremely complex. For example, a dual-controller storage server typically needs three channels to judge whether its peer is healthy: a primary heartbeat, a secondary heartbeat, and an isolation card, with the guarantee that the three channels never all fail at once. It also needs an elaborate suicide mechanism to prevent the two nodes from writing the same disk simultaneously. Because the devices in this patent live inside a cloud storage cluster, the failure judgment is simple and effective: introducing the cluster monitoring node is equivalent to having a third-party arbiter, but the arbitration is simpler and more efficient. In addition, when the monitoring agent on an OSD storage node cannot communicate with the cluster monitoring node, the node refuses all read/write requests, guaranteeing that two OSD storage nodes never write the same disk at the same time.
The invention therefore provides an OSD storage node high-availability scheme for a cloud storage system, improving the reliability and availability of the whole cloud service so that the cloud storage service is more stable and dependable, while better supporting high-density disk enclosures and reducing storage cost.
Detailed description of the invention
Fig. 1 is a structural diagram of the invention;
Fig. 2 is the overall flow chart of the invention.
Specific embodiment
The invention is further elucidated below with reference to the accompanying drawings and specific embodiments.
The invention provides a high-density storage method for OSD storage nodes in a storage cluster. The system structure is shown in Fig. 1. In terms of hardware connections, each OSD storage node carries an HBA card, and every two OSD storage nodes jointly connect to one JBOD expansion enclosure; through the HBA cards they identify all disks in the enclosure, so both storage nodes can access every disk in it. The main idea is that each OSD storage node manages part of the disks in the enclosure; when one storage node crashes, the other takes over all disks in the enclosure so that all data remain readable and writable, and when the crashed node recovers, it takes back its associated disks, preserving overall storage performance. To achieve this, the process steps of the invention are as shown in Fig. 2:
1. Initialization: a) each pair of OSD storage services discovers all disks in the expansion enclosure, and every OSD storage node sees each disk as a device; b) each disk is registered under a device number unique within the cluster; c) the ownership of each disk is assigned by a software policy: in general, if the two OSD storage nodes have identical configurations, disk ownership can be split evenly; if the configurations differ, ownership can be apportioned by performance ratio; d) each OSD storage node mounts the disks it owns and registers them with the monitoring node, after which it can serve reads and writes; e) during operation, each OSD storage node periodically communicates with the monitoring node, reporting the state of each disk.
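A minimal sketch of this initialization, under assumed data structures (the patent does not prescribe how the disk registry is stored or how unique numbers are derived): each shared disk gets a cluster-unique number and an owner chosen by the configured ratio:

```python
import hashlib

def register_disks(jbod_disks: list[str], node_a: str, node_b: str,
                   ratio_a: float = 0.5) -> dict:
    """Assign each shared JBOD disk a unique number and an owning node.

    The sha1-derived IDs and the registry layout are illustrative
    assumptions, not the patent's concrete scheme.
    """
    registry = {}
    n_a = round(len(jbod_disks) * ratio_a)          # even split when configs match
    for i, dev in enumerate(sorted(jbod_disks)):
        disk_id = hashlib.sha1(dev.encode()).hexdigest()[:8]   # cluster-unique number
        owner = node_a if i < n_a else node_b
        registry[disk_id] = {"dev": dev, "owner": owner, "mounted_on": owner}
    return registry

reg = register_disks([f"/dev/sd{c}" for c in "abcd"], "osd-a", "osd-b")
print(sum(1 for d in reg.values() if d["owner"] == "osd-a"))  # → 2
```

Recording the device numbers up front is what later allows the surviving peer to mount the disks directly during takeover.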
2. Crash takeover: a) when one node crashes, the cluster's monitoring node detects the node failure and evicts the node from the cluster, so no read/write traffic goes to the failed node, and the peer node is notified to execute the takeover; b) once the peer OSD storage node sees that its partner has left the cluster, it takes over all disks in the enclosure; because the device numbers were all recorded earlier, it can mount them directly; c) after the takeover completes, it registers with the monitoring node so that the taken-over disks can be read and written normally. This process fully avoids the situation where two nodes operate the same disk at the same time, achieving the goal of OSD storage node high availability.
3. After the failed OSD storage node recovers, it rejoins the cluster. When the monitoring node detects that the faulty node has recovered, it notifies the peer node to unmount the disks it had taken over. After the unmount completes, the recovered node takes back its original disks and registers with the monitoring node. Because the data are still consistent at this point, no data recovery is needed after the hand-back, so the externally provided service capacity is not reduced.
A typical cloud storage cluster contains metadata nodes, monitoring nodes, and OSD storage nodes, with OSD storage nodes being by far the most numerous. This system replaces the original OSD storage node with an OSD storage node group; the hardware connection of the cluster is as shown above, and all hardware here is commodity. Specifically:
1. JBOD expansion enclosure (Just a Bunch Of Disks): a storage device with multiple hard disk drives mounted on a single backplane. A JBOD usually has many disk slots and can connect a large number of disks; JBODs with 60 or 90 slots, for example, are becoming common on the market. Because a JBOD has no controller and no intelligent functions, it is simpler to manufacture and can therefore be extremely reliable.
2. Server nodes: each needs an appropriate CPU and memory configuration. Since the invention supports high-density storage, the OSD storage nodes can be configured with larger CPU core counts and memory; the two OSD storage nodes must be fitted with HBA cards and connect to the JBOD through SAS ports.
3. Network switch: the whole cluster is connected with an ordinary switch and provides the cloud storage service externally.
In this structure, within an OSD storage node group, storage nodes A and B can both access all disks in the attached JBOD enclosure; that is, in a Linux system each node directly sees the dev device of every associated disk. If an OSD storage node wants to manage a disk, it simply performs a mount operation under Linux and can then read and write the disk; to give up reading and writing a disk, it simply performs umount under Linux.
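These mount/umount mechanics can be wrapped as in the sketch below. The mountpoint layout and the dry_run guard are illustrative; actually claiming a disk requires root privileges and a filesystem on the device:

```python
import subprocess

def take_disk(dev: str, mountpoint: str, dry_run: bool = True) -> list[str]:
    """Claim a shared JBOD disk by mounting it, as the description outlines.

    `dev` and `mountpoint` here are hypothetical paths; with dry_run=True
    the command is only constructed, never executed.
    """
    cmd = ["mount", dev, mountpoint]
    if not dry_run:
        subprocess.run(cmd, check=True)   # real use needs root and a valid filesystem
    return cmd

def release_disk(mountpoint: str, dry_run: bool = True) -> list[str]:
    """Relinquish a disk (e.g. before handing it back to the recovered peer)."""
    cmd = ["umount", mountpoint]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd

print(take_disk("/dev/sdb", "/srv/osd/disk01"))   # → ['mount', '/dev/sdb', '/srv/osd/disk01']
```

Because both nodes see the same dev devices, ownership is purely a matter of which node currently holds the mount, which is why the cluster-level arbitration described below is needed.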
The invention comprises a cloud storage system and its high-density storage method. The system is deployed on each of the two OSD storage nodes; the main flow and method for achieving high availability are as follows:
1. A weight is calculated from the CPU core count and memory size of the two OSD storage nodes, and disks are automatically apportioned to each OSD storage node for management. The weight is calculated as CPU cores * 50% + memory size * 50%, though this ratio can be adjusted as needed;
2. After an OSD storage node starts, it registers its state and the serial numbers of the disks it manages with the cluster monitoring node. Once registration succeeds, the OSD storage node uses these disks and accepts client read/write requests to store and retrieve user data;
3. Every few seconds (configurable), each OSD storage node reports its status and the time of its last report to the cluster. If the cluster monitoring node receives no report from an OSD storage node for a certain period, it judges that the node has failed and evicts it from the cluster, so that whatever the failed node's actual state, no read/write requests reach it;
4. In addition, every few seconds each OSD storage node obtains the cluster membership from the monitoring node to discover whether its peer has failed. If the peer is judged to have failed, the disk takeover procedure starts: all disks the peer managed become managed by this node and are registered with the cluster, so the cluster can still serve all disks while routing no requests to the failed node;
5. If a node's own monitoring agent determines that the node has lost contact with the cluster, the node stops accepting any read/write requests, which guarantees that two nodes never write the same disk at the same time.
Takeover is decided not for all of the peer's disks as a whole but disk by disk. If a disk itself is corrupted, the takeover procedure still starts, but since this node cannot identify or read that disk either, the takeover fails, and the ordinary disk-failure alarm is raised instead.
To prevent transient network or service glitches from causing repeated, oscillating disk takeovers, the inverse of the takeover operation is generally performed manually after the failed node recovers, restoring full cluster service. This recovery operation mainly consists of the following steps:
1. After the crashed node returns to normal, it registers with the cluster monitoring node.
2. The monitoring node notifies the peer storage node to relinquish its takeover of the relevant disks; the peer typically stops reading and writing those disks and then executes an umount operation.
3. After the unmount succeeds, the recovered storage node re-executes the takeover of these disks, generally a direct mount, and registers with the monitoring node.
The above is only an embodiment of the present invention and is not intended to limit it. All equivalent substitutions made within the principle of the invention shall fall within its scope of protection. Matters not elaborated herein belong to the prior art well known to those skilled in the art.

Claims (2)

1. A high-density storage method for a cloud storage system, characterized by the following specific steps:
Step 1: initialize; both storage nodes discover the disks of the JBOD device and then assign each disk a unique number;
Step 2: each of the two storage nodes takes over its proportion of the disks and then registers with the monitoring node;
Step 3: the storage nodes periodically communicate with the monitoring node over the network;
Step 4: the monitoring node periodically collects storage node state; if it finds that a storage node has failed, go to step 5;
Step 5: the failed node is evicted from the cluster, and the peer storage node is notified to take over all disks and register them with the monitoring node, completing the takeover;
Step 6: node recovery: once the failed storage node returns to normal, it registers with the monitoring node;
Step 7: the monitoring node notifies the peer storage node to relinquish its takeover of the relevant disks and checks whether the hand-back succeeded; if so, go to step 8; if not, the hand-back is retried until it succeeds;
Step 8: the original storage node takes back the relevant disks and registers with the monitoring node; the system is back to normal;
wherein the cloud storage system includes several clients connected by cable to a switch; the switch is connected by cable to a metadata node, a monitoring node, and several OSD storage nodes; every two OSD storage nodes are connected by SAS cables to a JBOD disk enclosure; and the JBOD disk enclosure consists of a backplane and several disk drives, the disk drives being mounted on the backplane.
2. The high-density storage method for a cloud storage system according to claim 1, wherein the ratio of disks taken over by the two storage nodes in step 2 is calculated as follows: a weight is computed from the CPU core count and memory size of the two OSD storage nodes, and disks are automatically apportioned to the OSD storage nodes for management accordingly, where the weight is calculated as: CPU cores * 50% + memory size * 50%.
CN201710250990.8A 2017-04-18 2017-04-18 A kind of high density storage method for cloud storage system Active CN107046575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710250990.8A CN107046575B (en) 2017-04-18 2017-04-18 A kind of high density storage method for cloud storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710250990.8A CN107046575B (en) 2017-04-18 2017-04-18 A kind of high density storage method for cloud storage system

Publications (2)

Publication Number Publication Date
CN107046575A CN107046575A (en) 2017-08-15
CN107046575B true CN107046575B (en) 2019-07-12

Family

ID=59544315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710250990.8A Active CN107046575B (en) 2017-04-18 2017-04-18 A kind of high density storage method for cloud storage system

Country Status (1)

Country Link
CN (1) CN107046575B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111381766B (en) * 2018-12-28 2022-08-02 杭州海康威视系统技术有限公司 Method for dynamically loading disk and cloud storage system
CN111444157B (en) * 2019-01-16 2023-06-20 阿里巴巴集团控股有限公司 Distributed file system and data access method
CN109976946A (en) * 2019-02-27 2019-07-05 深圳点猫科技有限公司 It is a kind of for educating the scheduling system history data restoration methods and device of cloud platform
CN112579384B (en) * 2019-09-27 2023-07-04 杭州海康威视数字技术股份有限公司 Method, device and system for monitoring nodes of SAS domain and nodes
CN111901415B (en) * 2020-07-27 2023-07-14 北京星辰天合科技股份有限公司 Data processing method and system, computer readable storage medium and processor
CN115695425A (en) * 2022-10-28 2023-02-03 济南浪潮数据技术有限公司 BeeGFS file system cluster deployment method, device, equipment and storage medium
CN115988008A (en) * 2022-12-29 2023-04-18 江苏倍鼎网络科技有限公司 High-density storage method and system for cloud storage system


Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN103049225A (en) * 2013-01-05 2013-04-17 浪潮电子信息产业股份有限公司 Double-controller active-active storage system
WO2016070375A1 (en) * 2014-11-06 2016-05-12 华为技术有限公司 Distributed storage replication system and method
CN106062717A (en) * 2014-11-06 2016-10-26 华为技术有限公司 Distributed storage replication system and method

Non-Patent Citations (1)

Title
SurFS Product Description;金友兵;《https://github.com/surcloudorg/SurFS/commits/master/SurFS%20Product%20Description.pdf》;20160528;第1-3页

Also Published As

Publication number Publication date
CN107046575A (en) 2017-08-15

Similar Documents

Publication Publication Date Title
CN107046575B (en) A kind of high density storage method for cloud storage system
US6678788B1 (en) Data type and topological data categorization and ordering for a mass storage system
US7627779B2 (en) Multiple hierarichal/peer domain file server with domain based, cross domain cooperative fault handling mechanisms
US6691209B1 (en) Topological data categorization and formatting for a mass storage system
US6594775B1 (en) Fault handling monitor transparently using multiple technologies for fault handling in a multiple hierarchal/peer domain file server with domain centered, cross domain cooperative fault handling mechanisms
US8527561B1 (en) System and method for implementing a networked file system utilizing a media library
US6865157B1 (en) Fault tolerant shared system resource with communications passthrough providing high availability communications
CN103763383B (en) Integrated cloud storage system and its storage method
US7447933B2 (en) Fail-over storage system
EP1370945B1 (en) Failover processing in a storage system
US6578160B1 (en) Fault tolerant, low latency system resource with high level logging of system resource transactions and cross-server mirrored high level logging of system resource transactions
US7219260B1 (en) Fault tolerant system shared system resource with state machine logging
JP5523468B2 (en) Active-active failover for direct attached storage systems
US20180260123A1 (en) SEPARATION OF DATA STORAGE MANAGEMENT ON STORAGE devices FROM LOCAL CONNECTIONS OF STORAGE DEVICES
US20190220379A1 (en) Troubleshooting Method, Apparatus, and Device
US20100125857A1 (en) Cluster control protocol
US20120144006A1 (en) Computer system, control method of computer system, and storage medium on which program is stored
US20060129759A1 (en) Method and system for error strategy in a storage system
WO2010123510A1 (en) Active-active support of virtual storage management in a storage area network ("san")
CN108205573B (en) Data distributed storage method and system
US11544162B2 (en) Computer cluster using expiring recovery rules
US9952951B2 (en) Preserving coredump data during switchover operation
US12038817B2 (en) Methods for cache rewarming in a failover domain and devices thereof
US11809268B1 (en) Discovering host-switch link and ISL issues from the storage array
JP2018085634A (en) Information processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant