CN101901275A - A distributed storage system and method thereof - Google Patents

Info

Publication number
CN101901275A
Authority
CN
China
Prior art keywords
datanode
block
file
data
namenode
Prior art date
2010-08-23
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010259405
Other languages
Chinese (zh)
Inventor
王芙蓉
朱好好
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2010-08-23
Filing date
2010-08-23
Publication date
2010-12-01
Application filed by Huazhong University of Science and Technology
Priority to CN 201010259405
Publication of CN101901275A
Current legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed storage system that supports massive data storage. The system comprises: an Application Client, which accesses the NameNode and the DataNodes through a pre-built, encapsulated ClientAPI; the ClientAPI encapsulates the protocol implementation, and files can be read, written, and queried through it; a NameNode, the central server of the system, which stores the metadata of the whole system and is responsible for scheduling across the whole system; and DataNodes, where the data are actually stored, many of which are clustered together to build the storage system. The invention also discloses a distributed storage method that ensures the storage system's data integrity, data consistency, ease of expansion, and robustness.

Description

A distributed storage system and method thereof
Technical field
The present invention relates to a distributed storage system and a method thereof.
Background art
As traditional database technology has matured and computer networking has developed rapidly and spread to ever more applications, database applications are now generally built on computer networks. In this setting the centralized database system shows its shortcomings: the data are distributed across the network according to actual needs but processed centrally, which inevitably causes heavy communication overhead; and applications run on a single computer, so if that computer fails the whole system is affected and reliability is low. Under these circumstances, the concept of "centralized computing" has evolved toward "distributed computing".
A distributed database system (DDBS) consists of a distributed database management system (DDBMS) and a distributed database (DDB). In a distributed database system an application can operate on the database transparently, while the data in the database are stored in different local databases, managed by different DBMSs, run on different machines, supported by different operating systems, and connected by different communication networks.
A distributed database is logically a unified whole but is physically stored on different physical nodes. An application can access databases located in different geographic locations over network connections. Being distributed means that the data in the database are not stored in one place; more precisely, they are not stored on the storage devices of a single computer. This is what distinguishes it from a centralized database. From the user's point of view, a distributed database system is logically the same as a centralized one: the user can run global applications from any site, as if all the data were stored on one computer and managed by a single database management system, and notices no difference.
Distributed database systems grew out of centralized database systems and are the product of combining computer technology with network technology. A distributed database system suits organizations whose units are dispersed: each department can store the data it commonly uses locally, so that data are stored and used on site, which improves response speed and reduces communication cost. Compared with a centralized database system, a distributed database system is extensible, and adding suitable data redundancy improves the reliability of the system. In a centralized database, minimizing redundancy is one of the system's goals, because redundant data waste storage space and easily cause inconsistency between copies, and keeping the data consistent costs the system maintenance overhead; the goal of reducing redundancy is achieved through data sharing. In a distributed database, by contrast, redundant data are deliberately added, storing multiple copies of the same data at different sites, for two reasons: (1) to improve reliability and availability: when one site fails, the system can operate on an identical copy at another site, so a failure at one site does not paralyze the whole system; (2) to improve performance: the system can choose to operate on the copy nearest to the user, reducing communication cost and improving the performance of the whole system.
The goals of a distributed database system, that is, the purposes and motivations for developing one, are both technical and organizational. The first is to improve the reliability and availability of the system. This is the main goal of a distributed database: distributing data across multiple sites and adding suitable redundancy provides better reliability. This matters especially for systems with high reliability requirements, because a failure at one site does not bring down the whole system: users at the failed site can enter the system through other sites, and the system can automatically route users at other sites around the failed site and carry out their operations on other copies of the data, so normal operation is not affected. The second is to make full use of database resources. Once a large enterprise or department has built several databases, it will develop a distributed database system so that they can use each other's resources and support global applications. This can be described as building a distributed system bottom-up. Although this approach still requires some changes to and restructuring of each existing local database system, it is a better choice, both economically and organizationally, than gathering these databases together and rebuilding a centralized database. The third is to expand processing capacity and system scale step by step. When an organization grows and adds new departments (for example, a banking system adding new branches, or a factory adding new sections or workshops), the structure of a distributed database system offers a good way to expand processing capacity: add a new node to the distributed database system.
This is far more convenient, flexible, and economical than enlarging a centralized system. There are two common ways to expand a centralized system. One is to leave generous headroom in the initial design; this easily wastes resources, and because demand is hard to predict, the design may still fail to fit changing circumstances. The other is a system upgrade, which disrupts the normal operation of existing applications; when the upgrade involves incompatible hardware, or when the system software has changed substantially, the applications already developed must be modified accordingly, so the cost of upgrading becomes very high and often makes the upgrade infeasible. A distributed database system can easily incorporate a new node without affecting the structure of the existing system or its normal operation, providing a better, sometimes even the only, way to expand system capacity gradually.
Summary of the invention
To solve the problems of centralized computing, the invention provides a distributed storage system and a method thereof.
A distributed storage system, characterized in that it supports massive data storage and large data files, balances load, is reliable, and tolerates partial system failures; the system comprises:
an Application Client, which accesses the directory server NameNode and the data nodes (DataNode) through the pre-built, encapsulated client application interface ClientAPI; the ClientAPI encapsulates the protocol, and files can be read, written, and queried through it;
the directory server NameNode, the central server of the system, which stores the metadata of the whole system and is responsible for scheduling across the whole system;
data nodes (DataNode), where the data are actually stored; many DataNodes are clustered together to build the storage system;
each DataNode runs a DataNode service program that communicates with the NameNode automatically, and the NameNode records the DataNode's IP address and basic capacity information in its IP list, which makes the system easy to expand.
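The text says only that each DataNode's service program reports its IP address and basic capacity to the NameNode, which keeps them in an IP list. A minimal sketch of the NameNode side of that registration, assuming an in-memory dictionary keyed by IP; the class names, fields, and the `register` method are hypothetical, and the network transport between DataNode and NameNode is left out.

```python
from dataclasses import dataclass, field


@dataclass
class DataNodeInfo:
    """What a DataNode reports on registration (hypothetical fields): IP and capacity."""
    ip: str
    capacity_bytes: int
    used_bytes: int = 0

    @property
    def free_bytes(self) -> int:
        return self.capacity_bytes - self.used_bytes


@dataclass
class NameNode:
    """Hypothetical NameNode keeping the IP list of registered DataNodes."""
    datanodes: dict[str, DataNodeInfo] = field(default_factory=dict)

    def register(self, ip: str, capacity_bytes: int, used_bytes: int = 0) -> None:
        # Called when a DataNode service program starts (or re-reports).
        # Adding a node is one more entry; existing nodes are untouched,
        # which is what makes the cluster easy to expand.
        self.datanodes[ip] = DataNodeInfo(ip, capacity_bytes, used_bytes)

    def ip_list(self) -> list[str]:
        # Plain string order; good enough for a demo listing.
        return sorted(self.datanodes)

    def total_free_bytes(self) -> int:
        return sum(dn.free_bytes for dn in self.datanodes.values())


namenode = NameNode()
namenode.register("10.0.0.1", capacity_bytes=500 * 2**30)
namenode.register("10.0.0.2", capacity_bytes=1000 * 2**30, used_bytes=200 * 2**30)
print(namenode.ip_list(), namenode.total_free_bytes() // 2**30, "GiB free")
```

The capacity figures are what would let the NameNode balance load when choosing where to put new Blocks.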
A storage method for the distributed storage system according to claim 1, characterized in that the method comprises the following steps:
A. When the Application Client uploads data, the data file is divided into a number of Blocks, each carrying a digest; each digest is the reference for verifying the correctness of the data in the Block it belongs to;
B. When data are written, a file is created in the system. The file ID is allocated by the NameNode, and the file's metadata are recorded under that file ID. While the write operation is unfinished, the file carries a "not finished" flag and is invisible to all other Clients, i.e., it is not yet exposed to users; only after all Blocks of the file have been written to DataNodes is the flag changed to "finished". If the write operation fails, the Client sends the NameNode a request to delete the file, and the system reclaims the file's Blocks on the DataNodes as garbage, which guarantees data consistency.
C. When a file is read, its metadata, including the digest of each Block, are first obtained from the NameNode; the Block data are then fetched from the DataNodes, their digests are computed, and the computed digests are compared with the recorded ones to check data integrity;
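Steps A to C can be illustrated end to end in one small model. This is a sketch under stated assumptions, not the patent's implementation: SHA-256 stands in for the unspecified digest, `BLOCK_SIZE` is a tiny hypothetical value so the demo yields several Blocks, each Block is stored once (replication is out of scope here), and `Cluster`, `FileMeta`, `write`, `read`, and `delete` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes; hypothetical and tiny so the demo produces several Blocks


def digest(data: bytes) -> str:
    # Assumption: SHA-256 serves as the per-Block digest (the patent names no algorithm).
    return hashlib.sha256(data).hexdigest()


@dataclass
class FileMeta:
    file_id: int
    # One entry per Block: (block ID, DataNode holding it, digest recorded at write time)
    blocks: list[tuple[str, str, str]] = field(default_factory=list)
    complete: bool = False  # False = "not finished": hidden from other Clients


class Cluster:
    """Hypothetical in-memory stand-in for NameNode metadata plus DataNode stores."""

    def __init__(self, datanode_names: list[str]) -> None:
        self.next_id = 1
        self.files: dict[int, FileMeta] = {}  # NameNode metadata keyed by file ID
        self.datanodes: dict[str, dict[str, bytes]] = {n: {} for n in datanode_names}

    def visible_files(self) -> list[int]:
        # Only files whose write has finished are exposed to Clients.
        return [fid for fid, meta in self.files.items() if meta.complete]

    def write(self, data: bytes) -> int:
        # Step B: the NameNode allocates the file ID; the file starts "not finished".
        meta = FileMeta(self.next_id)
        self.next_id += 1
        self.files[meta.file_id] = meta
        names = sorted(self.datanodes)
        try:
            # Step A: cut the file into Blocks, recording a digest for each one.
            for index, start in enumerate(range(0, len(data), BLOCK_SIZE)):
                chunk = data[start:start + BLOCK_SIZE]
                block_id = f"{meta.file_id}-{index}"
                node = names[index % len(names)]  # round-robin placement, one copy
                self.datanodes[node][block_id] = chunk
                meta.blocks.append((block_id, node, digest(chunk)))
        except Exception:
            # A failed write deletes the file so no half-written data becomes visible.
            self.delete(meta.file_id)
            raise
        meta.complete = True  # flag set to "finished": the file is now visible
        return meta.file_id

    def delete(self, file_id: int) -> None:
        # Drop the metadata and reclaim the Blocks (eagerly here, rather than
        # by a later garbage-collection pass on the DataNodes).
        meta = self.files.pop(file_id)
        for block_id, node, _ in meta.blocks:
            self.datanodes[node].pop(block_id, None)

    def read(self, file_id: int) -> bytes:
        # Step C: get metadata (with digests) first, then fetch and verify each Block.
        meta = self.files.get(file_id)
        if meta is None or not meta.complete:
            raise FileNotFoundError(file_id)
        out = bytearray()
        for block_id, node, expected in meta.blocks:
            chunk = self.datanodes[node][block_id]
            if digest(chunk) != expected:
                raise OSError(f"integrity check failed for Block {block_id}")
            out += chunk
        return bytes(out)


cluster = Cluster(["DataNode A", "DataNode B", "DataNode C"])
file_id = cluster.write(b"hello distributed world")
print(cluster.read(file_id))
```

Because the digest travels with the metadata on the NameNode rather than with the data, a corrupted Block on any DataNode is caught at read time instead of being returned silently.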
Preferably, to improve the robustness of the system, when the Application Client uploads data, each Block is stored as multiple copies, with each copy on a different DataNode.
Preferably, the DataNodes storing the copies of a Block are located in different network domains; for example, they are not attached to the same rack, the same switch, or even the same router.
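The placement rule above can be sketched as a selection function. This is a minimal sketch, assuming each DataNode is tagged with a hypothetical `domain` label (its rack, switch, or router) and that among candidates the node with the most free space is preferred; that load-balancing rule and the names `Node` and `place_replicas` are assumptions, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    domain: str       # hypothetical label: the rack / switch / router the node sits behind
    free_bytes: int


def place_replicas(nodes: list[Node], copies: int = 3) -> list[Node]:
    """Pick `copies` distinct DataNodes, each in a different network domain.

    Assumption: within a domain, and then across domains, nodes with more
    free space are preferred (a simple load-balancing rule).
    """
    best_per_domain: dict[str, Node] = {}
    for node in nodes:
        current = best_per_domain.get(node.domain)
        if current is None or node.free_bytes > current.free_bytes:
            best_per_domain[node.domain] = node
    if len(best_per_domain) < copies:
        # Strict reading of the rule: never put two copies in one domain.
        raise ValueError(f"need {copies} domains, only {len(best_per_domain)} available")
    ranked = sorted(best_per_domain.values(), key=lambda n: n.free_bytes, reverse=True)
    return ranked[:copies]


nodes = [
    Node("A", "rack1", 100), Node("B", "rack1", 300),
    Node("C", "rack2", 200), Node("D", "rack3", 50), Node("E", "rack2", 10),
]
print([n.name for n in place_replicas(nodes)])
```

Keeping at most one copy per domain means a single rack, switch, or router failure can take out at most one copy of any Block.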
More preferably, the system periodically checks the number of copies of each Block. When a Block is found to have fewer copies than the configured number, the system automatically replicates it to other DataNodes until its copy count again reaches the configured number. Thus, if a DataNode leaves service, the system compensates and quickly returns to normal.
Advantages of the invention: the system model is simple; it supports massive data storage and large data files; it balances load; it is reliable and tolerates partial failures, including hardware and system software errors, without interrupting service or losing data; and it is easy to expand.
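One pass of the periodic check described above might look like the following. A minimal sketch, assuming the NameNode keeps a hypothetical `block_map` from Block ID to the set of DataNodes holding a copy and knows which DataNodes are still live; the scheduler that calls it once per check interval, and the actual byte copy from a surviving holder, are omitted.

```python
def check_replicas(
    block_map: dict[str, set[str]], live_nodes: set[str], target: int = 3
) -> dict[str, list[str]]:
    """Run one pass of the periodic replica check (hypothetical helper).

    block_map maps a Block ID to the DataNodes recorded as holding a copy;
    it is updated in place. Returns, for each Block that was topped up,
    the DataNodes chosen to receive a new copy.
    """
    new_copies: dict[str, list[str]] = {}
    for block_id, holders in block_map.items():
        holders &= live_nodes  # forget copies on DataNodes that left service
        if not holders:
            continue  # no surviving copy to replicate from: the Block is lost
        missing = target - len(holders)
        if missing <= 0:
            continue
        # Deterministic choice for the sketch; a real system would also weigh
        # free space and network domain when picking targets.
        chosen = sorted(live_nodes - holders)[:missing]
        holders.update(chosen)  # stands in for copying bytes from a surviving holder
        if chosen:
            new_copies[block_id] = chosen
    return new_copies


block_map = {"b1": {"A", "B", "C"}, "b2": {"B", "C", "D"}}
live = {"A", "B", "D", "E"}  # DataNode C has withdrawn from service
print(check_replicas(block_map, live))
```

Running the same pass again finds nothing to do, so the check is safe to repeat on a timer.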
Description of drawings
Fig. 1 is the overall architecture diagram of the invention.
Embodiment
The technical content of the invention is described in detail below.
The invention provides a low-cost system that supports massive data storage and large data files, balances load, is reliable, tolerates partial system failures (hardware or system software errors) without interrupting service or losing data, and is easy to expand. The invention also discloses a distributed storage method that ensures the storage system's data integrity, data consistency, ease of expansion, and robustness.
As shown in Fig. 1, the system comprises: an Application Client, which accesses the directory server NameNode and the data nodes (DataNode) through the pre-built, encapsulated client application interface ClientAPI; the ClientAPI encapsulates the protocol, and files can be read, written, and queried through it;
the directory server NameNode, the central server of the system, which stores the metadata of the whole system and is responsible for scheduling across the whole system;
data nodes (DataNode), where the data are actually stored; many DataNodes, such as DataNode A, DataNode B, DataNode C, and DataNode X shown in Fig. 1, are clustered together to build the storage system;
each DataNode runs a DataNode service program that communicates with the NameNode automatically, and the NameNode records the DataNode's IP address and basic capacity information in its IP list, which makes the system easy to expand.
The storage method of the distributed storage system comprises the following steps:
When the Application Client uploads data, the data file is divided into a number of Blocks, each carrying a digest; each digest is the reference for verifying the correctness of the data in the Block it belongs to;
When data are written, a file is created in the system. The file ID is allocated by the NameNode, and the file's metadata are recorded under that file ID. While the write operation is unfinished, the file carries a "not finished" flag and is invisible to all other Clients, i.e., it is not yet exposed to users; only after all Blocks of the file have been written to DataNodes is the flag changed to "finished". If the write operation fails, the Client sends the NameNode a request to delete the file, and the system reclaims the file's Blocks on the DataNodes as garbage, which guarantees data consistency.
When a file is read, its metadata, including the digest of each Block, are first obtained from the NameNode; the Block data are then fetched from the DataNodes, their digests are computed, and the computed digests are compared with the recorded ones to check data integrity;
To improve the robustness of the system, when the Application Client uploads data, each Block is stored as three or more copies, each on a different DataNode: DataNode A, DataNode B, and DataNode C.
DataNode A, DataNode B, and DataNode C are located in different network domains, for example attached to different racks, switches, or routers.
The system periodically checks the number of copies of each Block. When a Block is found to have fewer copies than the configured number, the system automatically replicates it to other DataNodes until its copy count again reaches the configured number. Thus, if a DataNode leaves service, the system compensates and quickly returns to normal.
Advantages of the invention: the system model is simple; it supports massive data storage and large data files; it balances load; it is reliable and tolerates partial failures, including hardware and system software errors, without interrupting service or losing data; and it is easy to expand.

Claims (5)

1. A distributed storage system, characterized in that the system comprises:
an Application Client, which accesses the directory server NameNode and the data nodes (DataNode) through the pre-built, encapsulated client application interface ClientAPI; the ClientAPI encapsulates the protocol, and files can be read, written, and queried through it;
the directory server NameNode, the central server of the system, which stores the metadata of the whole system and is responsible for scheduling across the whole system;
data nodes (DataNode), where the data are actually stored; many DataNodes are clustered together to build the storage system;
each DataNode runs a DataNode service program that communicates with the NameNode automatically, and the NameNode records the DataNode's IP address and basic capacity information in its IP list.
2. A storage method for the distributed storage system according to claim 1, characterized in that:
A. when the Application Client uploads data, the data file is divided into several Blocks, each carrying a digest; each digest is the reference for verifying the correctness of the data in the Block it belongs to;
B. when data are written, a file is created in the system; the file ID is allocated by the NameNode, and the file's metadata are recorded under that file ID; while the write operation is unfinished, the file carries a "not finished" flag and is invisible to all other Clients, i.e., it is not yet exposed to users; only after all Blocks of the file have been written to DataNodes is the flag changed to "finished"; if the write operation fails, the Client sends the NameNode a request to delete the file, and the system reclaims the file's Blocks on the DataNodes as garbage;
C. when a file is read, its metadata, including the digest of each Block, are first obtained from the NameNode; the Block data are then fetched from the DataNodes, their digests are computed, and the computed digests are compared with the recorded ones to verify data integrity.
3. The storage method for the distributed storage system according to claim 2, characterized in that when the Application Client uploads data, each Block is stored as multiple copies, with each copy on a different DataNode.
4. The storage method for the distributed storage system according to claim 3, characterized in that the DataNodes storing the copies of a Block are located in different network domains.
5. The storage method for the distributed storage system according to claim 3 or 4, characterized in that the system periodically checks the number of copies of each Block; when a Block is found to have fewer copies than the configured number, the system automatically replicates it to other DataNodes until its copy count again reaches the configured number.
CN 201010259405 2010-08-23 2010-08-23 A distributed storage system and method thereof Pending CN101901275A (en)

Priority Applications (1)

CN 201010259405 (published as CN101901275A); priority date 2010-08-23; filing date 2010-08-23; title: A distributed storage system and method thereof

Publications (1)

CN101901275A; publication date 2010-12-01

Family

ID=43226809

Family Applications (1)

CN 201010259405 (published as CN101901275A, pending); priority date 2010-08-23; filing date 2010-08-23; title: A distributed storage system and method thereof

Country Status (1)

Country Link
CN (1) CN101901275A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1506848A * 2002-12-10 2004-06-23 Method and system for allocating memory among competing services in a distributed computing environment
CN101184104A (en) * 2007-12-21 2008-05-21 腾讯科技(深圳)有限公司 Distributed memory system and method
CN101447910A (en) * 2007-11-26 2009-06-03 华为技术有限公司 Distributed network storage control method, device and distribution system

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073741B (en) * 2011-01-30 2013-08-28 宇龙计算机通信科技(深圳)有限公司 Method for realizing file reading and/or writing and data server
CN102073741A (en) * 2011-01-30 2011-05-25 宇龙计算机通信科技(深圳)有限公司 Method for realizing file reading and/or writing and data server
CN102307221A (en) * 2011-03-25 2012-01-04 国云科技股份有限公司 Cloud storage system and implementation method thereof
CN102946323A (en) * 2012-10-24 2013-02-27 曙光信息产业(北京)有限公司 Realizing method for location awareness of compute node cabinet in HDFS (Hadoop Distributed File System) and realizing system thereof
CN103294413A (en) * 2013-05-08 2013-09-11 山东地纬计算机软件有限公司 Mass data acquisition terminal supported distributed-memory real-time storage device and storage method
CN104883381B (en) * 2014-05-27 2018-09-04 陈杰 The data access method and system of distributed storage
CN104883381A (en) * 2014-05-27 2015-09-02 陈杰 Data access method and system for distributed storage
WO2015180070A1 (en) * 2014-05-28 2015-12-03 北京大学深圳研究生院 Data caching method and device for distributed storage system
CN103997540A (en) * 2014-06-10 2014-08-20 深圳市友华通信技术有限公司 Method for achieving distributed storage of network, storage system and customer premise equipment
US10298709B1 (en) * 2014-12-31 2019-05-21 EMC IP Holding Company LLC Performance of Hadoop distributed file system operations in a non-native operating system
CN106936622B (en) * 2015-12-31 2020-01-31 阿里巴巴集团控股有限公司 distributed storage system upgrading method and device
US10884623B2 (en) 2015-12-31 2021-01-05 Alibaba Group Holding Limited Method and apparatus for upgrading a distributed storage system
CN106936622A * 2015-12-31 2017-07-07 阿里巴巴集团控股有限公司 A distributed storage system upgrade method and device
WO2017114213A1 (en) * 2015-12-31 2017-07-06 阿里巴巴集团控股有限公司 Method and apparatus for upgrading distributed storage system
CN106372256A (en) * 2016-09-30 2017-02-01 浙江大学 Distributed storage method for massive Argo data
CN106682227A (en) * 2017-01-06 2017-05-17 郑州云海信息技术有限公司 Log data storage system based on distributed file system and reading-writing method
CN110870275B (en) * 2017-07-13 2022-06-03 国际商业机器公司 Method and apparatus for shared memory file transfer
CN110870275A (en) * 2017-07-13 2020-03-06 国际商业机器公司 Shared memory file transfer
CN107948130A * 2017-10-17 2018-04-20 联动优势科技有限公司 A document handling method, server and system
CN107948130B (en) * 2017-10-17 2021-02-23 联动优势科技有限公司 File processing method, server and system
CN108363719A (en) * 2018-01-02 2018-08-03 中科边缘智慧信息科技(苏州)有限公司 The transparent compressing method that can configure in distributed file system
CN108363719B (en) * 2018-01-02 2022-10-21 中科边缘智慧信息科技(苏州)有限公司 Configurable transparent compression method in distributed file system
CN108345510A (en) * 2018-01-11 2018-07-31 中国人民解放军国防科技大学 A method for automatic inspection and detection of reliability of large-scale offline archiving system
CN110491478A * 2019-08-22 2019-11-22 中电健康云科技有限公司 An image file distributed storage system based on ceph and its implementation method

Similar Documents

Publication Publication Date Title
CN101901275A (en) A distributed storage system and method thereof
US10891267B2 (en) Versioning of database partition maps
US8918392B1 (en) Data storage mapping and management
US8386540B1 (en) Scalable relational database service
US8930364B1 (en) Intelligent data integration
US9531809B1 (en) Distributed data storage controller
US11995336B2 (en) Bucket views
CN101964820B (en) Method and system for keeping data consistency
CN112424762B (en) Transferring connections in a multi-deployment database
CN103842969B (en) Information processing system
US7546486B2 (en) Scalable distributed object management in a distributed fixed content storage system
CN105324770B (en) Effectively read copy
US8935203B1 (en) Environment-sensitive distributed data management
CN103312791B (en) Internet of Things isomeric data storage means and system
US20180004777A1 (en) Data distribution across nodes of a distributed database base system
CN109716279A (en) It is persistent adaptive concurrent for being written
CN104378423A (en) Metadata cluster distribution storage system and storage data reading and writing method
CN102033912A (en) Distributed-type database access method and system
CN103905537A (en) System for managing industry real-time data storage in distributed environment
CN101763347A (en) GIS (Geographical Information System) interface platform as well as network GIS management system and management method
US11442645B2 (en) Distributed storage system expansion mechanism
CN104050248A (en) File storage system and storage method
CN110209653A (en) HBase data migration method and moving apparatus
CN105468296A (en) No-sharing storage management method based on virtualization platform
CN103365740B (en) A kind of data cold standby method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20101201