CN101901275A

CN101901275A - A distributed storage system and method thereof

Info

Publication number: CN101901275A
Application number: CN 201010259405
Authority: CN
Inventors: 王芙蓉; 朱好好
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2010-08-23
Filing date: 2010-08-23
Publication date: 2010-12-01

Abstract

The invention discloses a distributed storage system that supports massive data storage. The system comprises an Application Client, which accesses the NameNode and the DataNodes through a ClientAPI that encapsulates the communication protocol; files can be read, written, and queried through this ClientAPI. The NameNode stores the metadata of the entire system and is responsible for scheduling the entire system. A DataNode is where data is actually stored, and many DataNodes are clustered to build the storage system. The invention also discloses a distributed storage method that ensures the data integrity, data consistency, easy expansion, and robustness of the storage system.

Description

A distributed storage system and method thereof

Technical field

The present invention relates to a distributed storage system and a method thereof.

Background art

As traditional database technology has matured, computer network technology has developed rapidly, and the range of applications has expanded, database applications are now commonly built on computer networks. Under these conditions the centralized database system shows its shortcomings. Data is distributed across the network according to actual needs, yet processing remains centralized, which inevitably causes high communication overhead. Application programs run on a single computer, so once that computer fails the entire system is affected, and reliability is low. Under these circumstances, the concept of "centralized computing" has been evolving toward "distributed computing".

A distributed database system (DDBS) comprises a distributed database management system (DDBMS) and a distributed database (DDB). In a distributed database system, an application program can operate on the database transparently, while the data in the database is stored in different local databases, managed by different DBMSs, run on different machines, supported by different operating systems, and linked together by different communication networks.

A distributed database is logically a unified whole but is physically stored on different physical nodes. An application program can access databases distributed across different geographic locations through network connections. Its distributed nature lies in the fact that the data in the database is not stored in one place or, more precisely, not on the storage devices of a single computer. This is what distinguishes it from a centralized database. From the user's point of view, a distributed database system is logically the same as a centralized database system, and the user can run global applications from any site. It is as if the data were stored on a single computer and managed by a single database management system (DBMS); the user perceives no difference.

The distributed database system grew out of the centralized database system and is the product of combining computer technology with network technology. It suits organizations whose departments are dispersed: each department can store the data it commonly uses locally and use it on site, which improves response speed and reduces communication cost. Compared with a centralized database system, a distributed database system is extensible and, by adding suitable data redundancy, improves system reliability. In a centralized database, reducing redundancy as far as possible is one of the system's goals, because redundant data wastes storage space and easily leads to inconsistency between copies, and the system must pay a maintenance cost to keep the data consistent; the goal of reducing redundancy is achieved through data sharing. In a distributed database, by contrast, redundant data is desirable, and multiple copies of the same data are stored at different sites, for two reasons: (1) to improve the reliability and availability of the system: when one site fails, the system can operate on an identical copy at another site, so a single-site failure does not paralyze the entire system; (2) to improve system performance: the system can choose to operate on the data copy nearest to the user, which reduces communication cost and improves the performance of the entire system.

The goals of a distributed database system, that is, the purposes and motivations for developing it, are mainly technical and organizational.

The first goal is to improve the reliability and availability of the system, which is the main goal of a distributed database. Distributing data across multiple sites and adding suitable redundancy provides better reliability. This is especially important for systems with high reliability requirements, because a failure at one site does not bring down the entire system: users at the failed site can enter the system through other sites, and for users at other sites the system automatically selects an access path that avoids the failed site and performs the operation on another data copy, so normal business operation is not affected.

The second goal is to make full use of database resources. Once a large enterprise or department has built several databases, a distributed database system is developed so that they can use one another's resources and support global applications. This can be described as building a distributed system bottom-up. Although this approach requires some changes to and restructuring of each existing local database system, compared with merging these databases and rebuilding a centralized database, the distributed database is the better choice both economically and organizationally.

The third goal is to expand processing capacity and system scale step by step. When an organization grows and adds new departments (for example, a banking system adds a new branch, or a factory adds new sections or workshops), the structure of a distributed database system offers a good way to expand processing capacity: add a new node to the distributed database system. This is far more convenient, flexible, and economical than enlarging a centralized system. Two methods are commonly used to scale a centralized system. One is to leave generous headroom in the initial design, which easily causes waste and, because prediction is difficult, may still fail to accommodate changing circumstances. The other is to upgrade the system, which disrupts the normal operation of existing applications; when the upgrade involves incompatible hardware, or the system software changes substantially and already-developed application software must be modified accordingly, the upgrade becomes so expensive that it is often infeasible. A distributed database system can readily incorporate a new node without affecting the structure or normal operation of the existing system, and so provides a better, and sometimes the only, way to expand system capacity gradually.

Summary of the invention

To solve the problems of centralized computing, the invention provides a distributed storage system and a method thereof.

A distributed storage system, characterized in that it supports massive data storage and large data files, provides load balancing, is reliable, and can tolerate partial system failure, the system comprising:

an Application Client, which accesses the directory server NameNode and the data nodes DataNode through an encapsulated client application interface, ClientAPI; the ClientAPI encapsulates the protocol, and files can be read, written, and queried through it;

a directory server NameNode, which is the central server of the system, stores the metadata of the entire system, and is responsible for scheduling the entire system;

data nodes DataNode, which are where data is actually stored; many DataNodes are clustered to build the storage system;

wherein each DataNode runs a DataNode service program that communicates with the NameNode automatically, and the NameNode records the IP address and basic capacity information of the DataNode in its IP list, making the system easy to expand.
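The registration behaviour described above can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the class names, the `register` and `start` methods, and the use of an in-memory dictionary as the IP list are all assumptions made for illustration, and network communication is replaced by a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class DataNodeInfo:
    ip: str
    capacity_bytes: int  # the "basic capacity information" reported by the node


@dataclass
class NameNode:
    # IP list: maps a DataNode IP address to the information it reported.
    ip_list: dict = field(default_factory=dict)

    def register(self, ip: str, capacity_bytes: int) -> None:
        """Record (or refresh) a DataNode that has contacted the NameNode."""
        self.ip_list[ip] = DataNodeInfo(ip, capacity_bytes)

    def total_capacity(self) -> int:
        return sum(info.capacity_bytes for info in self.ip_list.values())


class DataNodeService:
    """Service program running on each DataNode (hypothetical interface)."""

    def __init__(self, ip: str, capacity_bytes: int):
        self.ip = ip
        self.capacity_bytes = capacity_bytes

    def start(self, name_node: NameNode) -> None:
        # On start-up the node announces itself, so adding a node needs
        # no manual configuration on the NameNode side.
        name_node.register(self.ip, self.capacity_bytes)


name_node = NameNode()
for ip, capacity in [("10.0.0.1", 500), ("10.0.0.2", 750)]:
    DataNodeService(ip, capacity).start(name_node)

print(sorted(name_node.ip_list), name_node.total_capacity())
```

In this sketch, expanding the system amounts to starting one more `DataNodeService`; the NameNode learns about the new node from the registration call alone.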

A storage method of the distributed storage system according to claim 1, characterized in that the method comprises the following steps:

A. when the Application Client uploads data, the data file is divided into several Blocks; each Block carries a digest, and each digest is the authoritative basis for verifying the correctness of the data in its Block;

B. when data is written, a file is created in the system and its ID is assigned by the NameNode; the metadata of the file is recorded under this file ID. While the write operation is unfinished, the file is marked "incomplete" and is invisible to all other Clients, that is, it is not listed for users. Only after all Blocks of the file have been written to DataNodes is the mark changed to "complete". If the write operation fails, the Client sends the NameNode a request to delete the file, and the DataNodes reclaim the corresponding Blocks as garbage, which guarantees data consistency;

C. when a file is read, the metadata of the file, including the digest of each Block, is first obtained from the NameNode; the Block data is then obtained from the DataNodes, the digest of each Block is computed, and the computed digest is compared with the recorded digest to verify data integrity.
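The Block division and the integrity check on read can be illustrated with the following Python sketch. The text does not name a digest algorithm or a Block size; SHA-256 and the 4-byte Block size used here are assumptions chosen only to keep the example small, and the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; a tiny illustrative value, not taken from the text


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Divide data into Blocks and return (block_bytes, digest_hex) pairs."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append((chunk, hashlib.sha256(chunk).hexdigest()))
    return blocks


def read_and_verify(stored_blocks, expected_digests) -> bytes:
    """Reassemble a file, checking each Block against its recorded digest."""
    if len(stored_blocks) != len(expected_digests):
        raise ValueError("Block count does not match the file metadata")
    out = bytearray()
    for index, (chunk, expected) in enumerate(zip(stored_blocks, expected_digests)):
        if hashlib.sha256(chunk).hexdigest() != expected:
            raise ValueError(f"Block {index} failed the integrity check")
        out += chunk
    return bytes(out)


blocks = split_into_blocks(b"hello world")
payload = [chunk for chunk, _ in blocks]   # what the DataNodes would hold
digests = [digest for _, digest in blocks]  # what the NameNode would record
restored = read_and_verify(payload, digests)

# Simulate a corrupted second Block as returned by a DataNode.
corrupted = [payload[0], b"XXXX"] + payload[2:]
```

Calling `read_and_verify(corrupted, digests)` raises `ValueError`, because the recomputed digest of the second Block no longer matches the digest recorded at upload time.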

Preferably, to improve the robustness of the system, when the Application Client uploads data, each Block is saved as multiple copies, and the copies of each Block are stored on different DataNodes.

Preferably, the DataNodes that store the copies of a Block are distributed in different network domains. For example, the DataNodes are not connected to the same rack, the same switch, or even the same router.
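One possible way to select DataNodes in different network domains is sketched below in Python. The text does not specify a selection algorithm; the greedy "first node from each unused domain" rule, the function name, and the representation of a network domain as a string label (such as a rack or switch name) are assumptions for illustration only.

```python
def choose_replica_nodes(nodes, replica_count):
    """Pick replica_count DataNodes, each from a different network domain.

    nodes maps a DataNode name to a network domain label (for example a
    rack or switch identifier).
    """
    chosen, used_domains = [], set()
    for name in sorted(nodes):  # sorted only to make the example deterministic
        domain = nodes[name]
        if domain not in used_domains:
            chosen.append(name)
            used_domains.add(domain)
        if len(chosen) == replica_count:
            return chosen
    raise ValueError("not enough distinct network domains for the requested copies")


nodes = {
    "DataNode A": "rack-1",
    "DataNode B": "rack-1",
    "DataNode C": "rack-2",
    "DataNode X": "rack-3",
}
placement = choose_replica_nodes(nodes, 3)
print(placement)
```

Here DataNode B is skipped because it shares a domain with DataNode A, so the failure of a single rack or switch cannot remove more than one copy of the Block.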

More preferably, the system periodically checks the number of copies of each Block. When the number of copies of a Block is found to be below the set number, the system automatically replicates the Block to other, different DataNodes so that the number of copies is kept at the set number. If a DataNode withdraws from service, the system can compensate and quickly return to normal.
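A single pass of such a periodic check might look like the following Python sketch. The data structures, the function name, and the choice of source and destination nodes are assumptions; the text only states that under-replicated Blocks are copied to other DataNodes until the set number is reached. Actual data transfer is represented by a list of planned copy actions.

```python
def repair_replicas(block_locations, live_nodes, target_copies):
    """Run one pass of the periodic copy-count check.

    block_locations maps a Block ID to the set of DataNode names holding a copy.
    live_nodes is the set of DataNode names currently in service.
    Returns (block_id, source_node, destination_node) copy actions and
    updates block_locations to reflect them.
    """
    actions = []
    for block_id in sorted(block_locations):
        holders = block_locations[block_id] & live_nodes  # ignore departed nodes
        block_locations[block_id] = holders
        if not holders:
            continue  # no surviving copy to replicate from
        source = min(holders)
        for destination in sorted(live_nodes - holders):
            if len(holders) >= target_copies:
                break
            actions.append((block_id, source, destination))
            holders.add(destination)
    return actions


locations = {
    "blk-1": {"A", "B", "C"},
    "blk-2": {"A", "C", "X"},
}
live = {"A", "C", "X", "Y"}  # DataNode B has withdrawn from service
actions = repair_replicas(locations, live, target_copies=3)
print(actions)
```

In this example only `blk-1` lost a copy when DataNode B left, so the pass plans one copy from a surviving holder to a node that does not yet hold the Block, and `blk-2` is left untouched.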

The advantages of the invention are as follows: the system model is simple; the system supports massive data storage and large data files, provides load balancing, and is reliable; it can tolerate partial system failure, including hardware or system software errors, without interrupting system service or losing data; and it is easy to expand.

Description of drawings

Fig. 1 is an overall architecture diagram of the present invention.

Embodiment

The technical content and detailed description of the present invention are as follows:

The invention provides a low-cost system that supports massive data storage and large data files, provides load balancing, is reliable, and can tolerate partial system failure (hardware or system software errors) without interrupting system service or losing data, and that is easy to expand. The invention also discloses a distributed storage method that ensures the data integrity, data consistency, easy expansion, and robustness of the storage system.

As shown in Fig. 1, the system comprises: an Application Client, which accesses the directory server NameNode and the data nodes DataNode through an encapsulated client application interface, ClientAPI; the ClientAPI encapsulates the protocol, and files can be read, written, and queried through it;

data nodes DataNode, which are where data is actually stored; many DataNodes are clustered to build the storage system, such as DataNode A, DataNode B, DataNode C, and DataNode X shown in Fig. 1.

The storage method of the distributed storage system comprises the following steps:

When the Application Client uploads data, the data file is divided into several Blocks; each Block carries a digest, and each digest is the authoritative basis for verifying the correctness of the data in its Block.

When data is written, a file is created in the system and its ID is assigned by the NameNode; the metadata of the file is recorded under this file ID. While the write operation is unfinished, the file is marked "incomplete" and is invisible to all other Clients, that is, it is not listed for users. Only after all Blocks of the file have been written to DataNodes is the mark changed to "complete". If the write operation fails, the Client sends the NameNode a request to delete the file, and the DataNodes reclaim the corresponding Blocks as garbage, which guarantees data consistency.
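The write procedure above can be sketched in Python as follows. This is an illustrative model only: the class and function names, the dictionary standing in for DataNode storage, the `fail_at` parameter used to simulate a failed write, and the Block ID format are all assumptions. In particular, the reclamation of orphaned Blocks is performed inline here, whereas the text describes it as garbage collection carried out by the DataNodes.

```python
import itertools


class NameNode:
    """Toy metadata store; its structure is assumed for illustration."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.files = {}  # file ID -> metadata dictionary

    def create_file(self, name):
        file_id = next(self._next_id)  # the NameNode assigns the file ID
        self.files[file_id] = {"name": name, "state": "incomplete", "blocks": []}
        return file_id

    def mark_complete(self, file_id, block_ids):
        self.files[file_id].update(state="complete", blocks=list(block_ids))

    def delete_file(self, file_id):
        self.files.pop(file_id, None)

    def list_visible(self):
        # Other Clients only ever see files marked "complete".
        return sorted(m["name"] for m in self.files.values() if m["state"] == "complete")


def write_file(name_node, data_node_store, name, blocks, fail_at=None):
    """Write Blocks; fail_at simulates an error before the given Block index."""
    file_id = name_node.create_file(name)
    written = []
    try:
        for index, block in enumerate(blocks):
            if index == fail_at:
                raise IOError("simulated DataNode write failure")
            block_id = f"{file_id}-{index}"
            data_node_store[block_id] = block
            written.append(block_id)
    except IOError:
        name_node.delete_file(file_id)   # the Client asks the NameNode to delete
        for block_id in written:         # orphaned Blocks are reclaimed
            data_node_store.pop(block_id, None)
        return None
    name_node.mark_complete(file_id, written)
    return file_id


nn = NameNode()
store = {}
ok_id = write_file(nn, store, "a.bin", [b"1", b"2"])
bad_id = write_file(nn, store, "b.bin", [b"3", b"4"], fail_at=1)
print(nn.list_visible(), sorted(store))
```

After both calls only `a.bin` is visible and only its Blocks remain in the store; the partially written `b.bin` leaves neither metadata nor Blocks behind, which is the consistency property the text describes.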

To improve the robustness of the system, when the Application Client uploads data, each Block is saved as three or more copies, and the copies of each Block are stored on different DataNodes, for example DataNode A, DataNode B, and DataNode C.

DataNode A, DataNode B, and DataNode C are distributed in different network domains, for example connected to different racks, switches, or routers.

The system periodically checks the number of copies of each Block. When the number of copies of a Block is found to be below the set number, the system automatically replicates the Block to other, different DataNodes so that the number of copies is kept at the set number. If a DataNode withdraws from service, the system can compensate and quickly return to normal.

The advantages of the invention are as follows: the system model is simple; the system supports massive data storage and large data files, provides load balancing, and is reliable; it can tolerate partial system failure, including hardware or system software errors, without interrupting system service or losing data; and it is easy to expand.

Claims

1. A distributed storage system, characterized in that the system comprises:

each DataNode runs a DataNode service program that communicates with the NameNode automatically, and the NameNode records the IP address and basic capacity information of the DataNode in its IP list.

2. A storage method of the distributed storage system according to claim 1, characterized in that:

A. when the Application Client uploads data, the data file is divided into several Blocks; each Block carries a digest, and each digest is the authoritative basis for verifying the correctness of the data in its Block;

B. when data is written, a file is created in the system and its ID is assigned by the NameNode; the metadata of the file is recorded under this file ID; while the write operation is unfinished, the file is marked "incomplete" and is invisible to all other Clients, that is, it is not listed for users; only after all Blocks of the file have been written to DataNodes is the mark changed to "complete"; if the write operation fails, the Client sends the NameNode a request to delete the file, and the DataNodes reclaim the corresponding Blocks as garbage;

C. when a file is read, the metadata of the file, including the digest of each Block, is first obtained from the NameNode; the Block data is then obtained from the DataNodes, the digest of each Block is computed, and the computed digest is compared with the recorded digest to verify data integrity.

3. The storage method of the distributed storage system according to claim 2, characterized in that, when the Application Client uploads data, each Block is saved as multiple copies, and the copies of each Block are stored on different DataNodes.

4. The storage method of the distributed storage system according to claim 3, characterized in that the DataNodes that store the copies of a Block are distributed in different network domains.

5. The storage method of the distributed storage system according to claim 3 or 4, characterized in that the system periodically checks the number of copies of each Block, and when the number of copies of a Block is found to be below the set number, the system automatically replicates the Block to other, different DataNodes so that the number of copies is kept at the set number.