CN107046575B - High-density storage method for a cloud storage system - Google Patents
High-density storage method for a cloud storage system
- Publication number: CN107046575B (application CN201710250990.8A)
- Authority: CN (China)
- Prior art keywords
- node
- disk
- storage node
- memory
- OSD
- Prior art date
- Legal status: Active (the status is an assumption, not a legal conclusion)
Classifications
- H04L67/1097 — Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
- G06F3/0653 — Monitoring storage devices or systems
- G06F3/067 — Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Abstract
The invention discloses a cloud storage system comprising several clients connected by cable to a switch, characterized in that the switch is connected by cable to a metadata node, a monitoring node, and several OSD storage nodes, and every two OSD storage nodes are connected by SAS cables to a JBOD disk enclosure. The high-density storage method of this system is as follows: each OSD storage node manages part of the disks in the expansion enclosure; when one storage node crashes, the other storage node takes over all disks in the enclosure, guaranteeing that all data can still be read and written normally; after the crashed node recovers, it takes the relevant disks back, preserving overall storage performance. The invention enables OSD storage nodes to provide high-density storage capacity, substantially reduces the cost of cloud storage, and improves storage performance.
Description
Technical field
The invention belongs to the field of computer applications, and in particular relates to a cloud storage system and a high-density storage method therefor.
Background art
With the rapid development of computer technology, cloud storage services are increasingly adopted by enterprises. A cloud storage system usually contains several server roles, generally including metadata nodes, monitoring nodes, and OSD storage nodes. An OSD storage node, also called an object storage node, is where the main content of user data is stored. In a large-scale storage cluster, the data or files a user saves are divided into multiple objects, and each object is stored on some OSD storage node according to a certain algorithm or a decision of the metadata service. To prevent data loss, an object can be replicated into multiple copies stored on different OSD storage nodes, so that after a storage node crashes or is damaged, the object data can still be read and written through the other copies. This storage scheme supports the scaling of big data and the reliability of data very well.
However, this storage mode is not suited to providing too much storage space in a single OSD storage node. Suppose an OSD storage node carries a large number of hard disks and therefore holds a great deal of data. When that node crashes or is damaged, all the disks on it stop serving. When the node later recovers, even if the data on its disks is intact, every disk lags behind the replicas on the other nodes, so the system must perform a large amount of data recovery and replication, which severely degrades the cloud storage system's ability to serve external requests. If the disk data on the node is lost entirely, a network data storm is even more likely, and the system can then barely provide any service at all. The recovery speed can be throttled to preserve the cluster's external service capacity, but slowing recovery makes it take far too long, so that if other equipment fails in the meantime, the probability of service interruption or data loss is considerable.
Ordinary distributed storage can therefore tolerate the simultaneous failure of a single disk or a small number of disks, but the simultaneous failure of a large number of disks causes serious problems, so a single OSD storage node generally cannot host too many disks. Given the rapid development of today's CPUs and memory, such OSD storage nodes cannot provide high-density storage, which makes cloud storage excessively expensive.
Summary of the invention
Object of the invention: in view of the problems in the prior art, the present invention provides a cloud storage system, and a high-density storage method therefor, that enables OSD storage nodes to provide high-density storage capacity, substantially reduces the cost of cloud storage, and improves storage performance.
Technical solution: to solve the above technical problems, the cloud storage system provided by the invention includes several clients connected by cable to a switch; the switch is connected by cable to a metadata node, a monitoring node, and several OSD storage nodes; and every two OSD storage nodes are connected by SAS cables to a JBOD disk enclosure.
Further, the JBOD disk enclosure consists of a backplane and several hard disk drives, the hard disk drives being mounted on the backplane.
A high-density storage method using the cloud storage system described above comprises the following specific steps:
Step 1: initialize; the two storage nodes discover the disks of the JBOD device, then assign each disk a unique number;
Step 2: each of the two storage nodes takes over its proportion of the disks, then registers with the monitoring node;
Step 3: each storage node communicates with the monitoring node over the network at regular intervals;
Step 4: the monitoring node collects storage node state at regular intervals and thereby discovers that a storage node has failed;
Step 5: the failed node is evicted from the cluster, and the peer storage node is notified to take over all disks and register with the monitoring node, completing the takeover process;
Step 6: node recovery: after the storage node returns to normal, it registers with the monitoring node;
Step 7: the monitoring node notifies the peer storage node to relinquish its takeover of the relevant disks and checks whether the release succeeded; if it succeeded, go to step 8; if not, keep retrying the release until it succeeds;
Step 8: the original storage node takes the relevant disks back over and registers with the monitoring node; the system is finally back to normal.
Further, the method by which the two storage nodes compute their disk-takeover ratio in step 2 is as follows: a weight is calculated from the CPU core count and memory size of each of the two OSD storage nodes, and disks are automatically allocated in proportion for the OSD storage nodes to manage, where the weight is calculated as: CPU core count * 50% + memory size * 50%.
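As an illustration, the following is a minimal Python sketch of this allocation; the unit of memory (GB) and the disk identifiers are assumptions, since the specification does not fix them:

```python
def node_weight(cpu_cores: int, memory_gb: int) -> float:
    # Weight formula from the specification: CPU core count * 50% + memory size * 50%.
    return cpu_cores * 0.5 + memory_gb * 0.5

def split_disks(disks: list, weight_a: float, weight_b: float):
    # Allocate disks to storage nodes A and B in proportion to their weights.
    share_a = round(len(disks) * weight_a / (weight_a + weight_b))
    return disks[:share_a], disks[share_a:]

# Node A: 16 cores, 64 GB; node B: 8 cores, 32 GB -> A manages 2/3 of the disks.
disks = [f"disk-{i:02d}" for i in range(60)]
a_disks, b_disks = split_disks(disks, node_weight(16, 64), node_weight(8, 32))
print(len(a_disks), len(b_disks))  # 40 20
```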
Compared with the prior art, the advantages of the present invention are as follows:
The method of the invention is simple to implement. A traditional dual-controller storage server can also achieve disk takeover, but such servers generally stand alone rather than joining a cloud storage cluster, and their high-availability implementation is extremely complex. For example, a dual-controller storage server generally needs three channels to judge whether its peer is normal (a primary heartbeat, a secondary heartbeat, and an isolation card) and must guarantee that the three channels are never disabled simultaneously; to prevent the two nodes from writing one disk at the same time, it also needs a complex self-fencing ("suicide") mechanism. Because the equipment in this patent sits inside a cloud storage cluster, the decision mechanism is simple and effective: it amounts to introducing the cluster monitoring node as a third-party arbiter, but the arbitration is simpler and more efficient. In addition, when the monitoring agent on an OSD storage node cannot communicate with the cluster monitoring node, the node refuses all read and write requests, guaranteeing that two OSD storage nodes never write one disk at the same time.
The present invention therefore provides a high-availability scheme for the OSD storage nodes of a cloud storage system, which improves the reliability and availability of the whole cloud service, makes the whole cloud storage service more stable and reliable, and better supports high-density disk enclosures, reducing storage cost.
Brief description of the drawings
Fig. 1 is a structural diagram of the present invention;
Fig. 2 is an overall flow chart of the invention.
Specific embodiments
The present invention is further elucidated below with reference to the accompanying drawings and specific embodiments.
The present invention provides a high-density storage method for OSD storage nodes in a storage cluster. The system structure of the invention is shown in Fig. 1. In the hardware connection, each OSD storage node carries an HBA card, and every two OSD storage nodes jointly connect to one JBOD expansion enclosure; each node identifies all disks in the enclosure through its HBA card, so both storage nodes can access every hard disk in the enclosure. The main purpose of the invention is for each OSD storage node to manage its own part of the disks in the enclosure; when one storage node crashes, the other storage node takes over all disks in the enclosure, guaranteeing that all data can still be read and written normally; after the crashed node recovers, it takes the relevant disks back, preserving overall storage performance. To achieve this purpose, the process steps of the invention are as shown in Fig. 2:
1. Initialization: a) each pair of OSD storage servers independently discovers all disks in the expansion enclosure, each OSD storage node seeing each disk as a device; b) each disk is registered with a device number unique within the cluster; c) the ownership of each disk is decided by a software policy: in general, if the two OSD storage nodes have identical configurations, the disks can be divided evenly between them, and if the configurations differ, disk ownership can be allocated in proportion to performance; d) each OSD storage node mounts the disks it owns and registers them with the monitoring node, after which reads and writes can be served; e) during operation, each OSD storage node communicates with the monitoring node at regular intervals and reports the state of each disk. A sketch of the discovery and numbering step is given below.
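The specification does not prescribe how the unique numbering is derived; the following Python sketch assumes the drive's WWN, which both storage nodes see identically through their HBAs, serves as the cluster-unique ID:

```python
import subprocess

def discover_jbod_disks() -> list[str]:
    # Each JBOD disk appears through the HBA as an ordinary block device.
    out = subprocess.run(["lsblk", "-dn", "-o", "NAME,TYPE"],
                         capture_output=True, text=True, check=True).stdout
    return [f"/dev/{name}" for name, kind in
            (line.split() for line in out.splitlines()) if kind == "disk"]

def disk_unique_id(dev: str) -> str:
    # The WWN is burned into the drive, so both nodes register the same number.
    out = subprocess.run(["lsblk", "-dn", "-o", "WWN", dev],
                         capture_output=True, text=True, check=True).stdout
    return out.strip()

for dev in discover_jbod_disks():
    print(dev, disk_unique_id(dev))
```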
2. Crash takeover: a) when one node crashes, the cluster's monitoring node discovers the failure and cuts the node out of the cluster, so that no read or write traffic goes to the failed node, and the peer node is notified to perform the takeover; b) when the surviving OSD storage node finds that its peer has left the cluster, it takes over all disks in the expansion enclosure; because all device numbers were recorded beforehand, the disks can be mounted directly; c) after the takeover completes, the node registers with the monitoring node, and the taken-over disks can then be read and written normally. This procedure completely avoids the situation of two nodes operating on one disk at the same time and achieves the goal of high availability for OSD storage nodes; a sketch follows.
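A minimal Python sketch of the takeover step, under the assumptions that the device-to-mount-point map was recorded at initialization and that `register_with_monitor` is a hypothetical callback to the cluster monitoring node:

```python
import subprocess

def take_over_peer_disks(peer_disks: dict, register_with_monitor) -> None:
    # peer_disks maps each device path to the mount point recorded at init.
    for dev, mountpoint in peer_disks.items():
        try:
            # Device numbers were recorded earlier, so the disk is mounted directly.
            subprocess.run(["mount", dev, mountpoint], check=True)
        except subprocess.CalledProcessError:
            # A damaged disk cannot be taken over; the ordinary disk-failure
            # alarm described later in the text is raised instead.
            print(f"takeover of {dev} failed; raising disk-failure alarm")
            continue
        register_with_monitor(dev)  # taken-over disk now serves reads/writes
```

Note that the loop decides disk by disk, matching the per-disk takeover granularity described later in the text.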
3. After the failed OSD storage node recovers, it rejoins the cluster system. Once the monitoring node discovers that the faulty node has recovered, it notifies the peer node to unmount the disks it originally took over. After the unmount completes, the recovered node takes its original disks back and registers with the monitoring node. Because the data is still consistent at this point, no data recovery operation is needed after the takeover, and the externally provided service capacity is not reduced.
A typical cloud storage cluster includes metadata nodes, monitoring nodes, and OSD storage nodes, of which the OSD storage nodes are by far the most numerous. This system replaces each original OSD storage node with an OSD storage node group; the hardware connections of the cluster are as shown above, and all the hardware here is general-purpose. Specifically:
1. JBOD expansion enclosure (Just a Bunch Of Disks): a storage device with multiple hard disk drives mounted on one backplane. A JBOD usually offers a great many drive bays and can attach a large number of disks; for example, JBODs with 60 or 90 bays are beginning to spread on the market. Because a JBOD has no controller and no intelligent functions of its own, it is simpler to manufacture and can therefore be extremely reliable.
2. Server nodes: each needs an appropriate CPU count and memory size. Because the present invention supports high-density storage, the OSD storage nodes can be configured with larger CPU core counts and memory sizes; the two OSD storage nodes must each be fitted with an HBA card and connect to the JBOD through SAS ports.
3. Network switch: the whole cluster is connected with an ordinary switch and externally provides the cloud storage service.
In this structure, within an OSD storage node group, storage nodes A and B can both access all the hard disks in the attached JBOD disk enclosure; that is, under Linux each node directly sees the dev device of every disk. If an OSD storage node wants to manage a certain disk, it simply performs a mount operation under Linux and can then read and write that disk; if it wants to give up reads and writes on a disk, it simply performs a umount under Linux, as sketched below.
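A minimal sketch of these two primitive operations, assuming the mount points are chosen by the node's own management software:

```python
import subprocess

def attach_disk(dev: str, mountpoint: str) -> None:
    # Begin managing a disk: a plain Linux mount is all that is required.
    subprocess.run(["mount", dev, mountpoint], check=True)

def release_disk(mountpoint: str) -> None:
    # Give up reads and writes on a disk so the peer node can take it over.
    subprocess.run(["umount", mountpoint], check=True)
```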
The present invention is a cloud storage system and a high-density storage method therefor. The system is deployed on both OSD storage nodes; the main flow and method for achieving high availability are as follows:
1. A weight is calculated from the CPU core count and memory size of each of the two OSD storage nodes, and disks are automatically allocated in proportion for the OSD storage nodes to manage. The weight is calculated as: CPU core count * 50% + memory size * 50%, though this ratio may be adjusted as needed;
2. After an OSD storage node starts, it registers its own state and the serial numbers of the disks it manages with the cluster monitoring node. After successful registration, the OSD storage node uses these disks to receive client read/write requests and to save and read user data;
3. Every few seconds (configurable) the OSD storage node reports its status and the time of its last report to the cluster; if the cluster monitoring node finds that an OSD storage node has produced no report for a certain period, it judges that the OSD storage node has failed and cuts it out of the cluster, so that whatever state the failed node is actually in, no read or write requests can reach it (a monitor-side sketch follows);
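A Python sketch of this monitor-side detection; the interval and timeout values are illustrative, since the specification only says they are configurable:

```python
import time

REPORT_INTERVAL = 5    # seconds between status reports (configurable per the text)
FAILURE_TIMEOUT = 30   # silence after which a node is judged failed (assumed value)

class ClusterMonitor:
    def __init__(self):
        self.last_seen = {}  # node id -> time of the node's last status report

    def handle_report(self, node_id: str, disk_states: dict) -> None:
        # Storage nodes report their own state and the state of each managed disk.
        self.last_seen[node_id] = time.time()

    def failed_nodes(self) -> list[str]:
        # Any node silent for longer than the timeout is cut out of the cluster
        # map, so no read or write request can reach it.
        now = time.time()
        return [n for n, t in self.last_seen.items() if now - t > FAILURE_TIMEOUT]
```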
4. In addition, every few seconds each OSD storage node obtains the overall cluster view through the monitoring node to discover whether its peer has failed. If the peer is judged to have failed, the disk takeover process starts: all disks the peer managed come under this node's management and are registered in the cluster system, so the cluster can still handle all disks normally while no read or write requests go to the failed node;
5. If a node's monitoring agent judges that the node has lost contact with the cluster, the node accepts no read or write requests whatsoever; this guarantees that the situation of two nodes writing one disk at the same time cannot arise, as the following sketch illustrates.
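A sketch of this self-fencing check; `monitor_is_reachable` and `process` are hypothetical stand-ins for the node's monitoring-agent liveness test and its normal I/O path:

```python
def serve_request(request, monitor_is_reachable, process):
    # Self-fencing: if this node has lost contact with the cluster monitoring
    # node, the peer may already have taken over our disks, so every read or
    # write request is refused to keep two nodes from writing one disk.
    if not monitor_is_reachable():
        raise ConnectionError("fenced: lost contact with cluster monitor")
    return process(request)
```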
The takeover process does not take over all of the peer node's disks as a single unit; rather, it decides disk by disk whether to take each one over. If a disk itself is corrupted, the takeover process still starts, but since this node can neither identify nor read that disk, its takeover fails, and the ordinary disk-failure alarm is raised instead.
To guard against transient network or service glitches on a node, the inverse of the takeover operation is generally performed manually after the failed node recovers, restoring full cluster service; this avoids repeated, oscillating disk takeovers in certain situations. The recovery operation mainly comprises the following steps:
1. After the crashed node returns to normal, it registers with the cluster monitoring node.
2. The monitoring node notifies the peer storage node to relinquish its takeover of the relevant disks; the peer usually stops reads and writes to those disks and, once they complete, performs a umount.
3. After the unmount succeeds, the recovered storage node re-executes the takeover of these disks, generally a direct mount, and registers with the monitoring node. A sketch of this fail-back flow follows.
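A minimal sketch of this manual fail-back, with every object a hypothetical stand-in, since the specification names only the steps, not an API:

```python
def fail_back(recovered_node, peer_node, monitor, disks: dict) -> None:
    monitor.register_node(recovered_node)           # 1. rejoin the cluster
    for dev, mountpoint in disks.items():
        peer_node.stop_io(dev)                      # 2. peer stops reads/writes
        peer_node.umount(mountpoint)                #    and unmounts the disk
        recovered_node.mount(dev, mountpoint)       # 3. original owner remounts
        monitor.register_disk(recovered_node, dev)  #    and registers the disk
    # Data is still consistent, so no recovery/replication has to follow.
```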
The above is only an embodiment of the present invention and is not intended to limit the invention. Any equivalent substitution made within the principles of the present invention shall fall within the protection scope of the invention. Matters not elaborated in the present invention belong to the prior art well known to those skilled in the art.
Claims (2)
1. A high-density storage method for a cloud storage system, characterized in that its specific steps are as follows:
Step 1: initialize; the two storage nodes discover the disks of the JBOD device, then assign each disk a unique number;
Step 2: each of the two storage nodes takes over its proportion of the disks, then registers with the monitoring node;
Step 3: each storage node communicates with the monitoring node over the network at regular intervals;
Step 4: the monitoring node collects storage node state at regular intervals; if it finds that a storage node has failed, step 5 is executed;
Step 5: the failed node is evicted from the cluster, and the peer storage node is notified to take over all disks and register with the monitoring node, completing the takeover process;
Step 6: node recovery: after the storage node returns to normal, it registers with the monitoring node;
Step 7: the monitoring node notifies the peer storage node to relinquish its takeover of the relevant disks and checks whether the release succeeded; if it succeeded, go to step 8; if not, keep retrying the release until it succeeds;
Step 8: the original storage node takes the relevant disks back over and registers with the monitoring node; the system is finally back to normal;
wherein the cloud storage system includes several clients connected by cable to a switch; the switch is connected by cable to a metadata node, a monitoring node, and several OSD storage nodes; every two OSD storage nodes are connected by SAS cables to a JBOD disk enclosure; and the JBOD disk enclosure consists of a backplane and several disk drives, the disk drives being mounted on the backplane.
2. The high-density storage method for a cloud storage system according to claim 1, characterized in that the method by which the two storage nodes compute their disk-takeover ratio in step 2 is as follows: a weight is calculated from the CPU core count and memory size of each of the two OSD storage nodes, and disks are automatically allocated in proportion for the OSD storage nodes to manage, where the weight is calculated as: CPU core count * 50% + memory size * 50%.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201710250990.8A | 2017-04-18 | 2017-04-18 | High-density storage method for a cloud storage system (granted as CN107046575B)
Publications (2)
Publication Number | Publication Date
---|---
CN107046575A | 2017-08-15
CN107046575B | 2019-07-12
Family
ID=59544315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710250990.8A | High-density storage method for a cloud storage system | 2017-04-18 | 2017-04-18
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107046575B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111381766B (en) * | 2018-12-28 | 2022-08-02 | 杭州海康威视系统技术有限公司 | Method for dynamically loading disk and cloud storage system |
CN111444157B (en) * | 2019-01-16 | 2023-06-20 | 阿里巴巴集团控股有限公司 | Distributed file system and data access method |
CN109976946A (en) * | 2019-02-27 | 2019-07-05 | 深圳点猫科技有限公司 | Method and device for restoring historical data of a scheduling system for an education cloud platform |
CN112579384B (en) * | 2019-09-27 | 2023-07-04 | 杭州海康威视数字技术股份有限公司 | Method, device and system for monitoring nodes of SAS domain and nodes |
CN111901415B (en) * | 2020-07-27 | 2023-07-14 | 北京星辰天合科技股份有限公司 | Data processing method and system, computer readable storage medium and processor |
CN115695425A (en) * | 2022-10-28 | 2023-02-03 | 济南浪潮数据技术有限公司 | BeeGFS file system cluster deployment method, device, equipment and storage medium |
CN115988008A (en) * | 2022-12-29 | 2023-04-18 | 江苏倍鼎网络科技有限公司 | High-density storage method and system for cloud storage system |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103049225A (en) * | 2013-01-05 | 2013-04-17 | 浪潮电子信息产业股份有限公司 | Double-controller active-active storage system |
WO2016070375A1 (en) * | 2014-11-06 | 2016-05-12 | 华为技术有限公司 | Distributed storage replication system and method |
CN106062717A (en) * | 2014-11-06 | 2016-10-26 | 华为技术有限公司 | Distributed storage replication system and method |
Non-Patent Citations (1)
Title |
---|
SurFS Product Description; 金友兵; https://github.com/surcloudorg/SurFS/commits/master/SurFS%20Product%20Description.pdf; 2016-05-28; pp. 1-3 |
Also Published As
Publication number | Publication date |
---|---|
CN107046575A (en) | 2017-08-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | GR01 | Patent grant | 