CN105930357A

CN105930357A - Distributed file system, and data node data storage processing method and device

Info

Publication number: CN105930357A
Application number: CN201610218862.0A
Authority: CN
Inventors: 潘春球
Original assignee: Shenzhen Wisdom Spark Tech Co Ltd
Current assignee: Shenzhen Wisdom Spark Tech Co Ltd
Priority date: 2016-04-07
Filing date: 2016-04-07
Publication date: 2016-09-07
Anticipated expiration: 2036-04-07
Also published as: CN105930357B

Abstract

The invention discloses a distributed file system, and a method and device for data node data storage processing. The method for data node data storage processing in the distributed file system comprises the following steps: receiving, from a client, storage data to be stored together with the data block ID and namespace information corresponding to the storage data; and storing the storage data, the data block ID and the namespace information. Because the data block ID and namespace information are stored on the data nodes, the namespace of the central node is backed up in a distributed manner, so that when the central node's metadata is lost the system can be recovered from the data nodes and the stable operation of the distributed file system can be ensured. Likewise, the device for data node data storage processing in the distributed file system, the device for central node data storage processing in the distributed file system, and the distributed file system itself are able to cope with central node downtime and loss of system metadata.

Description

Distributed file system, and method and device for data node data storage processing

Technical field

The present invention relates to the field of computer technology, and in particular to a distributed file system and a method and device for data node data storage processing.

Background

HDFS is the Hadoop Distributed File System. It consists of two main parts: a central node (NameNode) and data nodes (DataNode). The NameNode centrally manages the Namespace but does not store data; the DataNode stores data blocks and their BlockIDs but does not store Namespace information. The Namespace contains the hierarchical structure of the files and directories in HDFS.

In HDFS, data is split into blocks stored across multiple DataNodes, and each data block has a BlockID (data block ID). The HDFS Namespace records which data blocks each file has and the BlockIDs of those blocks. Through the BlockIDs, the NameNode establishes the mapping between the Namespace and the data blocks on the DataNodes, i.e., between the hierarchical structure of the Namespace and the DataNodes on which the data blocks of each file node reside, thereby forming a complete distributed file system. In addition, the NameNode periodically persists the Namespace information to disk as the metadata of the NameNode system. When the NameNode restarts, it loads the Namespace information from disk into memory to rebuild the file system's namespace.
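To make the metadata relationships above concrete, here is a toy Python model, assuming the namespace is a mapping from full file path to an ordered list of BlockIDs and that block placement is a separate BlockID-to-DataNode table held in memory. The file name `namespace.json`, the block and node names, and the helper functions are hypothetical; this is not HDFS source code, and real NameNode metadata formats differ.

```python
import json
import os
import tempfile

# Toy model of the metadata described above (not actual HDFS code).
# Namespace: full file path -> ordered BlockIDs of that file's data blocks.
namespace = {
    "/logs/app.log": ["blk_1001", "blk_1002"],
    "/data/users.csv": ["blk_1003"],
}

# Block placement: BlockID -> DataNode holding the block (hypothetical names).
block_locations = {
    "blk_1001": "datanode-1",
    "blk_1002": "datanode-2",
    "blk_1003": "datanode-1",
}


def persist_namespace(ns: dict, path: str) -> None:
    """Write the namespace to disk as the central node's metadata."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ns, f)


def load_namespace(path: str) -> dict:
    """Reload the namespace into memory, as happens when the NameNode restarts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def resolve_file(ns: dict, locations: dict, file_path: str) -> list:
    """Use the BlockIDs to map a file in the namespace to blocks on DataNodes."""
    return [(block_id, locations[block_id]) for block_id in ns[file_path]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        metadata_file = os.path.join(tmp, "namespace.json")
        persist_namespace(namespace, metadata_file)
        restored = load_namespace(metadata_file)
    print(resolve_file(restored, block_locations, "/logs/app.log"))
    # prints [('blk_1001', 'datanode-1'), ('blk_1002', 'datanode-2')]
```

Only the namespace is persisted in this sketch; if `namespace.json` were lost with no other copy, `restored` could not be produced, which is the failure mode described in the following paragraph.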

However, when the central node (NameNode) fails and its metadata is lost, HDFS faces the risk of unrecoverable damage, which affects the reading of data throughout the whole system.

Summary of the invention

In view of this, it is necessary to address the above technical problem by providing a distributed file system, a method and device for data node data storage processing in a distributed file system, and a method and device for central node data storage processing in a distributed file system, which can effectively prevent HDFS from becoming unusable because of the loss of Namespace information on the central node.

To achieve the object of the invention, a method for data node data storage processing in a distributed file system is provided, comprising the following steps:

receiving storage data to be stored sent by a client, together with the data block ID and namespace information corresponding to the storage data;

storing the storage data, the data block ID and the namespace information.

In an embodiment of the method for data node data storage processing in a distributed file system, the method further comprises the following step:

sending the data block ID and the namespace information to the central node.

In an embodiment of the method for data node data storage processing in a distributed file system, each time the data node starts, it sends all stored data block IDs and all namespace information to the central node.

Based on the same inventive concept, a device for data node data storage processing in a distributed file system is provided, comprising:

a first data receiving module, configured to receive storage data to be stored sent by a client, together with the data block ID and namespace information corresponding to the storage data;

a first storage module, configured to store the storage data, the data block ID and the namespace information.

In an embodiment of the device for data node data storage processing in a distributed file system, the device further comprises a first sending module, configured to send the data block ID and the namespace information to the central node.

Based on the same inventive concept, a method for central node data storage processing in a distributed file system is provided, comprising the following steps:

receiving a data storage request from a client;

returning, according to the data storage request, the corresponding data block ID and namespace information to the client;

receiving the data block ID and corresponding namespace information sent by a data node, and storing them.

In an embodiment of the method for central node data storage processing in a distributed file system, the method further comprises the following steps:

building a partial namespace according to the namespace information, and associating the data block ID with the namespace;

building a complete namespace according to the namespace information sent by all data nodes.

In an embodiment of the method for central node data storage processing in a distributed file system, an identical data block ID and corresponding namespace information received multiple times are associated, and used to build the partial namespace, only once.

Based on the same inventive concept, a device for central node data storage processing in a distributed file system is provided, comprising:

a second receiving module, configured to receive a data storage request from a client;

a second sending module, configured to return, according to the data storage request, the corresponding data block ID and namespace information to the client;

a third receiving module, configured to receive the data block ID and corresponding namespace information sent by a data node, and store them.

In an embodiment of the device for central node data storage processing in a distributed file system, the device further comprises:

a partial namespace building module, configured to build a partial namespace according to the namespace information and associate the data block ID with the namespace;

a complete namespace building module, configured to build a complete namespace according to the namespace information sent by all data nodes.

Based on the same inventive concept, a distributed file system is provided, comprising a central node and two or more data nodes, each data node being communicatively connected to the central node; each data node is configured with any one of the foregoing devices for data node data storage processing in a distributed file system, and the central node is configured with any one of the foregoing devices for central node data storage processing in a distributed file system;

when a client using the distributed file system needs to write data into the distributed file system, it sends a data storage request to the central node;

after receiving the client's data storage request, the central node returns, according to the request, the corresponding data block ID and namespace information to the client;

the client sends to the data node the storage data to be stored, together with the data block ID and namespace information corresponding to the storage data;

the data node stores the storage data, the data block ID and the namespace information;

the data node sends the data block ID and the namespace information to the central node;

the central node builds a partial namespace according to the namespace information and associates the data block ID with the namespace;

the central node builds a complete namespace according to the namespace information sent by all data nodes.

The beneficial effects of the present invention are as follows. In the method for data node data storage processing and the method for central node data storage processing in a distributed file system provided by the invention, the data nodes store data block IDs and namespace information, which serves as a distributed backup of the central node's namespace. When the central node's metadata is lost, the system can therefore be recovered from the data nodes, ensuring the stable operation of the distributed file system. Likewise, the device for data node data storage processing, the device for central node data storage processing, and the distributed file system provided by the invention are able to cope with central node downtime and loss of system metadata.

Brief description of the drawings

Fig. 1 is a flowchart of the method for data node data storage processing in a distributed file system in an embodiment;

Fig. 2 is a schematic structural diagram of the device for data node data storage processing in a distributed file system in an embodiment;

Fig. 3 is a schematic structural diagram of the device for data node data storage processing in a distributed file system in another embodiment;

Fig. 4 is a flowchart of the method for central node data storage processing in a distributed file system in an embodiment;

Fig. 5 is a schematic structural diagram of the device for central node data storage processing in a distributed file system in an embodiment;

Fig. 6 is a schematic structural diagram of the device for central node data storage processing in a distributed file system in another embodiment;

Fig. 7 is a schematic diagram of the composition of the distributed file system of an embodiment;

Fig. 8 is a diagram of the communication flow among the parts of the distributed file system of an embodiment.

Detailed description of the invention

To make the objects, technical solutions and advantages of the present invention clearer, specific embodiments of the distributed file system, the method and device for data node data storage processing in a distributed file system, and the method and device for central node data storage processing in a distributed file system are described below with reference to the drawings. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.

The method for data node data storage processing in a distributed file system of one embodiment, as shown in Fig. 1, comprises the following steps:

S101: receiving the storage data to be stored sent by a client, together with the data block ID and namespace information corresponding to the storage data.

The namespace information refers to the full path of a file in the file system. In the method of this embodiment, the user side must send not only the storage data to be stored to the data node but also, at the same time, information such as the data block ID corresponding to that data. The user side may determine the data block ID of the data to be stored and the corresponding namespace information according to a data storage strategy preset in the distributed file system, for example a preset storage order in which data is stored in the data nodes in a round-robin fashion, together with the data block IDs of the data blocks prestored in a table.
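One way a client could apply such a preset strategy is sketched below in Python, assuming the storage order is a simple round-robin over a fixed list of data nodes and that block IDs are consumed in order from a prestored table. `PresetStrategy`, `Placement`, and `split_into_blocks` are hypothetical names introduced for illustration, not part of any real file system API.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import cycle


@dataclass
class Placement:
    block_id: str        # taken from the prestored block ID table
    namespace_info: str  # full path of the file in the file system
    data_node: str       # chosen by the round-robin storage order


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data to be stored into fixed-size blocks (the last one may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class PresetStrategy:
    """Hypothetical client-side storage strategy preset in the file system."""

    def __init__(self, data_nodes: list[str], block_id_table: list[str]):
        self._nodes = cycle(data_nodes)   # round-robin over the data nodes
        self._ids = iter(block_id_table)  # prestored block IDs, used in order

    def place(self, file_path: str, n_blocks: int) -> list[Placement]:
        placements = []
        for _ in range(n_blocks):
            try:
                block_id = next(self._ids)
            except StopIteration:
                raise RuntimeError("prestored block ID table exhausted") from None
            placements.append(Placement(block_id, file_path, next(self._nodes)))
        return placements


if __name__ == "__main__":
    strategy = PresetStrategy(["dn1", "dn2", "dn3"], ["blk_1", "blk_2", "blk_3", "blk_4"])
    blocks = split_into_blocks(b"abcdefghij", 4)  # b"abcd", b"efgh", b"ij"
    for block, p in zip(blocks, strategy.place("/docs/a.txt", len(blocks))):
        print(p.block_id, p.namespace_info, p.data_node, block)
```

The client then sends each block to `p.data_node` together with `p.block_id` and `p.namespace_info`, which is exactly the triple S101 expects the data node to receive.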

Alternatively, in another implementation, the user side may obtain the data block ID and corresponding namespace information for the data to be stored from a specific node in the distributed file system, such as the central node. The user side can then send the corresponding data block ID and namespace information together with the data when it sends the data to be stored to the data node.

S102: storing the storage data, the data block ID and the namespace information.

In step S102, in addition to storing the storage data sent by the client, as a data node in a conventional distributed file system does, the data node in this distributed file system also stores the data block ID and namespace information corresponding to that data.

As described above, in the method for data node data storage processing of this embodiment, the data node stores, besides the ordinary system data, the corresponding data block ID and namespace information sent by the client, and thereby acts as a backup of the data block IDs and the namespace. The data nodes in the system thus also take on part of the central node's function, so that when the central node of the distributed file system fails, they can provide partial metadata for repairing the HDFS namespace. The repair process consists of a new central node, or the original central node after it resumes work, building a new complete namespace from the data block IDs and namespace information stored in the data nodes.
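The data node side of S101 and S102 can be sketched as a small in-memory class. This is a minimal Python illustration under assumed names (`DataNode`, `StoredBlock`, `namespace_backup`); a real data node would write blocks and their metadata to disk rather than keep them in a dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StoredBlock:
    data: bytes
    block_id: str
    namespace_info: str  # full path of the file this block belongs to


class DataNode:
    """In-memory sketch of a data node following S101/S102 (hypothetical API)."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[str, StoredBlock] = {}

    def store(self, data: bytes, block_id: str, namespace_info: str) -> None:
        # S101: the client sends the data together with its block ID and namespace info.
        # S102: keep all three, not just the data as a conventional data node would.
        self.blocks[block_id] = StoredBlock(data, block_id, namespace_info)

    def read(self, block_id: str) -> bytes:
        return self.blocks[block_id].data

    def namespace_backup(self) -> list[tuple[str, str]]:
        """(block ID, namespace info) pairs: this node's share of the central node's namespace."""
        return [(b.block_id, b.namespace_info) for b in self.blocks.values()]


if __name__ == "__main__":
    dn = DataNode("dn1")
    dn.store(b"hello", "blk_1", "/home/user/a.txt")
    print(dn.namespace_backup())  # [('blk_1', '/home/user/a.txt')]
```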

Preferably, in one embodiment, the method further comprises the following step:

S103: sending the data block ID and the namespace information to the central node.

The data node may send the data block IDs and namespace information it stores to the central node of the distributed file system at any time, so that the central node can use the received information to complete and repair its own namespace. For example, the data node may send the data block ID and namespace information of each piece of data immediately after storing it, or it may send the data block IDs and namespace information of several stored data blocks together in one batch at regular intervals.

In a preferred embodiment, each time the data node starts, it sends all stored data block IDs and all namespace information to the central node. More preferably, each time the data node starts, it sends only the data block IDs and namespace information that have not yet been sent to the central node. In this case the data node can also mark the stored data block IDs and namespace information: it uses a flag to mark the data blocks that have been sent, for example marking data block IDs and namespace information already sent to the central node as 1 and those not yet sent as 0, and at the next startup it sends only the data block IDs and namespace information marked 0. This still gives the central node accurate information while reducing the amount of data transmitted between the data node and the central node, improving the overall performance of the distributed file system.
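The flag-based scheme above can be sketched as follows, assuming the data node keeps one entry per stored block with a `sent_flag` of 0 or 1 and that the transport to the central node is a callable supplied by the caller (for example an RPC stub). The class and method names are hypothetical, and the flags live in memory here; a node that must keep them across restarts would need to persist them alongside the blocks.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Entry:
    block_id: str
    namespace_info: str
    sent_flag: int = 0  # 0 = not yet sent to the central node, 1 = already sent


class ReportingDataNode:
    """Sketch of a data node that reports only unsent entries at startup."""

    def __init__(self, send: Callable[[list[tuple[str, str]]], None]):
        self._send = send  # transport to the central node (assumed, e.g. an RPC stub)
        self.entries: list[Entry] = []

    def record(self, block_id: str, namespace_info: str) -> None:
        """Called after a block is stored; new entries start with flag 0."""
        self.entries.append(Entry(block_id, namespace_info))

    def on_startup(self) -> int:
        """Send every entry flagged 0, then flag it 1. Returns how many were sent."""
        pending = [e for e in self.entries if e.sent_flag == 0]
        if pending:
            # If send() raises, no flags change, so the entries are retried next time.
            self._send([(e.block_id, e.namespace_info) for e in pending])
            for e in pending:
                e.sent_flag = 1
        return len(pending)


if __name__ == "__main__":
    received = []
    node = ReportingDataNode(received.append)
    node.record("blk_1", "/a")
    node.record("blk_2", "/b")
    print(node.on_startup())  # 2
    node.record("blk_3", "/c")
    print(node.on_startup())  # 1 (only blk_3 is sent)
```

Flags are flipped only after `send` returns, so a failed transmission leaves the entries at 0 and they are sent again at the next startup.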

Based on the same inventive concept, the present invention also provides a device for data node data storage processing in a distributed file system. Because the principle by which this device solves the problem is similar to that of the foregoing method for data node data storage processing in a distributed file system, the device can be implemented according to the specific steps of the foregoing method, and repeated details are omitted.

The device for data node data storage processing in a distributed file system of one embodiment, as shown in Fig. 2, comprises a first data receiving module 101 and a first storage module 102.

The first data receiving module 101 is configured to receive the storage data to be stored sent by the client, together with the data block ID and namespace information corresponding to the storage data.

The first storage module 102 is configured to store the storage data, the data block ID and the namespace information.

With the device for data node data storage processing of this embodiment, a data node in the distributed file system stores, in addition to the necessary data, the data block ID and namespace information corresponding to the stored data, thereby backing up part of the namespace information held by the central node. When the central node's metadata is lost, this provides effective data support for recovering HDFS and improves the overall performance of the distributed file system.

In the device for data node data storage processing in a distributed file system of another embodiment, as shown in Fig. 3, the device further comprises a first sending module 103, configured to send the data block ID and namespace information to the central node.

The data node can provide the central node with the necessary information through the first sending module, either when needed (for example when the central node restarts and repairs its metadata) or periodically, thereby backing up the metadata.

Correspondingly, the present invention also provides a method for central node data storage processing in a distributed file system. As shown in Fig. 4, the method comprises the following steps:

S201: receiving a data storage request from a client.

In the method for the embodiment of the present invention, distributed file system is operationally, when client carries out data storage, It can initially set up and connection between Centroid, sends data storage request to Centroid.And centromere Point can receive data storage request and do suitable feedback processing.Concrete process work such as step S102.

S202: returning, according to the data storage request, the corresponding data block ID and namespace information to the client.

After receiving the data storage request, the central node allocates suitable data block IDs according to the request and determines the namespace information corresponding to the data. Once the allocation is complete, the central node returns the allocated data block IDs and the corresponding namespace information to the user side, which then uses this information to carry out the actual data storage with the data nodes.
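A minimal sketch of the allocation in S202, assuming the central node hands out one fresh, never-reused block ID per `block_size` bytes and uses the requested file path as the namespace information. The classes, the `blk_<n>` ID format, and the 64 MiB default block size are illustrative assumptions, not details taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count


@dataclass
class StorageRequest:
    file_path: str  # where the client wants the file in the namespace
    size: int       # number of bytes to store


@dataclass
class Allocation:
    block_id: str
    namespace_info: str


class CentralNodeAllocator:
    """Hypothetical S201/S202 handler: one fresh block ID per block_size bytes."""

    def __init__(self, block_size: int = 64 * 1024 * 1024):
        self.block_size = block_size
        self._counter = count(1)  # monotonically increasing, so IDs never repeat

    def handle(self, req: StorageRequest) -> list[Allocation]:
        n_blocks = -(-req.size // self.block_size)  # ceiling division
        return [
            Allocation(f"blk_{next(self._counter)}", req.file_path)
            for _ in range(n_blocks)
        ]


if __name__ == "__main__":
    allocator = CentralNodeAllocator(block_size=4)
    for a in allocator.handle(StorageRequest("/docs/a.txt", 10)):
        print(a.block_id, a.namespace_info)
```

With a 4-byte block size, a 10-byte request yields three allocations (`blk_1` to `blk_3`), each carrying `/docs/a.txt` as its namespace information.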

S203: receiving the data block ID and corresponding namespace information sent by a data node, and storing them.

It should be noted that after step S202 the user side has the data block ID and namespace information of the storage data, so when it sends the storage data to a data node it can send the corresponding data block ID and namespace information along with it. The data node can therefore send the data block IDs and namespace information from the information it stores to the central node; this is the key point of the method for central node data storage processing of this embodiment. Being able to obtain data block IDs and namespace information from the data nodes is very important when the central node's metadata is lost because of a failure or other causes: it allows the central node to retrieve the block IDs and namespace information from the data nodes and repair the system, effectively ensuring the normal operation of the distributed file system.

In one implementation, the method further comprises the following steps:

S204: building a partial namespace according to the namespace information, and associating the data block ID with the namespace.

It should be noted that this step may be performed simultaneously with step S203, or the two steps may be performed in cooperation. For example, after the central node receives the data block ID and corresponding namespace information sent by a data node, it may first associate the data block ID with the namespace, store the associated information and the namespace information after association, and then further build a partial namespace according to the namespace information.

The namespace records which data blocks each file in the distributed file system has and the BlockIDs of those blocks, and represents the hierarchical relationships among all files in the system. Therefore, the namespace information uploaded by a single data node, or uploaded at a single time, can only be used to build a partial namespace. The central node may hold the namespace information stored in all data nodes of the distributed file system, integrate this information into a namespace, and store it.

It should also be noted that in step S204, when the data block ID is associated with the namespace, the central node can identify which data node sent the data block ID. Therefore, after associating the data block ID with the namespace, the central node can tell from the namespace information exactly which data blocks a given file contains and on which data node each data block is stored.
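The association in S203/S204 might look like the Python sketch below, assuming the namespace information is the full file path and that the central node knows which data node each report came from. `CentralNode`, `receive_report`, `locate`, and `tree` are hypothetical names; the directory tree is derived from the paths only to show how the hierarchical relationships can be recovered from the reported information. Repeated reports of the same block are not filtered in this sketch.

```python
from __future__ import annotations


class CentralNode:
    """Sketch of S203/S204: associate reported block IDs with the namespace."""

    def __init__(self):
        self.files: dict[str, list[str]] = {}  # partial namespace: file path -> block IDs
        self.block_node: dict[str, str] = {}   # block ID -> data node that reported it

    def receive_report(self, data_node: str, block_id: str, namespace_info: str) -> None:
        # The sender is known, so the association also records where the block lives.
        self.block_node[block_id] = data_node
        self.files.setdefault(namespace_info, []).append(block_id)

    def locate(self, file_path: str) -> list[tuple[str, str]]:
        """Which blocks a file contains and which data node stores each one."""
        return [(b, self.block_node[b]) for b in self.files.get(file_path, [])]

    def tree(self) -> dict:
        """Derive the directory hierarchy of the partial namespace from full paths."""
        root: dict = {}
        for path, blocks in self.files.items():
            *dirs, name = path.strip("/").split("/")
            node = root
            for d in dirs:
                node = node.setdefault(d, {})
            node[name] = list(blocks)
        return root


if __name__ == "__main__":
    c = CentralNode()
    c.receive_report("dn1", "blk_1", "/home/a.txt")
    c.receive_report("dn2", "blk_2", "/home/a.txt")
    c.receive_report("dn1", "blk_3", "/tmp/b.log")
    print(c.locate("/home/a.txt"))  # [('blk_1', 'dn1'), ('blk_2', 'dn2')]
    print(c.tree())  # {'home': {'a.txt': ['blk_1', 'blk_2']}, 'tmp': {'b.log': ['blk_3']}}
```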

The process of integrating the namespace information from all data nodes into a namespace can be regarded as step S205.

S205: building a complete namespace according to the namespace information sent by all data nodes.

However, for an identical data block ID and corresponding namespace information received multiple times, the central node performs the association and builds the partial namespace only once. That is, when the central node receives the same namespace information more than once (two or more times), it associates the data block ID with the namespace only once and builds the partial namespace only once. This is because files in a distributed file system are stored with multiple replicas: when the NameNode receives identical information reported by multiple DataNodes, the central node processes it only once and ignores the duplicate reports.
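The process-once rule can be sketched by remembering which (block ID, namespace information) pairs have already been handled while folding in the reports of all data nodes (S205). This Python sketch assumes reports arrive as simple tuples grouped by data node; `DedupingCentralNode` and its methods are hypothetical names.

```python
from __future__ import annotations


class DedupingCentralNode:
    """Sketch of S204/S205 with the 'process once' rule for replica reports."""

    def __init__(self):
        self.namespace: dict[str, list[str]] = {}  # file path -> block IDs
        self.block_owner: dict[str, str] = {}      # block ID -> first reporting data node
        self._seen: set[tuple[str, str]] = set()   # (block ID, namespace info) already handled
        self.ignored = 0                           # how many duplicate reports were dropped

    def receive_report(self, data_node: str, block_id: str, namespace_info: str) -> bool:
        key = (block_id, namespace_info)
        if key in self._seen:
            # Same block reported again (e.g. by another replica holder): ignore it.
            self.ignored += 1
            return False
        self._seen.add(key)
        self.block_owner[block_id] = data_node                          # associate once
        self.namespace.setdefault(namespace_info, []).append(block_id)  # build once
        return True

    def build_complete(self, reports_by_node: dict[str, list[tuple[str, str]]]) -> dict[str, list[str]]:
        """S205: fold in the (block ID, namespace info) reports from every data node."""
        for node, reports in reports_by_node.items():
            for block_id, ns in reports:
                self.receive_report(node, block_id, ns)
        return self.namespace


if __name__ == "__main__":
    central = DedupingCentralNode()
    reports = {
        "dn1": [("blk_1", "/a"), ("blk_2", "/a")],
        "dn2": [("blk_1", "/a"), ("blk_3", "/b")],
        "dn3": [("blk_2", "/a"), ("blk_3", "/b")],
    }
    print(central.build_complete(reports))  # {'/a': ['blk_1', 'blk_2'], '/b': ['blk_3']}
    print(central.ignored)                  # 3
```

Following the text, a duplicate is dropped entirely, so only the first reporting node is recorded as the block's location; a design that needed to know every node holding a replica would keep a set of nodes per block instead.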

Based on the same inventive concept, an embodiment of the present invention provides a device for central node data storage processing in a distributed file system. Because the principle by which this device solves the problem is similar to that of the foregoing method for central node data storage processing in a distributed file system, the device can be implemented according to the specific steps of the foregoing method, and repeated details are omitted.

The device for central node data storage processing in a distributed file system of one embodiment, as shown in Fig. 5, comprises a second receiving module 201, a second sending module 202 and a third receiving module 203. The second receiving module 201 is configured to receive a data storage request from a client; the second sending module 202 is configured to return, according to the data storage request, the corresponding data block ID and namespace information to the client; and the third receiving module 203 is configured to receive the data block ID and corresponding namespace information sent by a data node, and store them.

With the device for central node data storage processing of this embodiment, the central node of the distributed file system can obtain data block IDs and corresponding namespace information from the data nodes, so that after a failure its metadata can be recovered from the data nodes. This ensures the normal operation of the distributed file system and prevents metadata damage from affecting file reads and writes.

In another embodiment, as shown in Fig. 6, the device further comprises a partial namespace building module and a complete namespace building module. The partial namespace building module is configured to build a partial namespace according to the namespace information and associate the data block ID with the namespace; the complete namespace building module is configured to build a complete namespace according to the namespace information sent by all data nodes. The central node can thus build a complete namespace from the information obtained from the data nodes, so that even if the metadata in the central node is completely damaged or lost, the system can be restored to normal operation through the data nodes.
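Recovery from the data nodes can be sketched as folding every node's stored (block ID, namespace information) pairs back into one namespace. The sketch assumes each data node can hand over its backup as a `{block_id: file_path}` dictionary and that integer block IDs were allocated in file order, so sorting them restores each file's block sequence; both are assumptions made for illustration, and `rebuild_namespace` is a hypothetical name.

```python
from __future__ import annotations


def rebuild_namespace(backups: dict[str, dict[int, str]]) -> dict[str, list[int]]:
    """Rebuild the complete namespace from every data node's stored backup.

    backups maps a data node name to the {block ID: namespace info} pairs it
    stores (a hypothetical layout). Replicas of the same block on several
    nodes collapse into one entry because IDs are collected in a set.
    """
    files: dict[str, set[int]] = {}
    for _node, blocks in backups.items():
        for block_id, path in blocks.items():
            files.setdefault(path, set()).add(block_id)
    # Assumes block IDs were allocated in file order, so sorting restores it.
    return {path: sorted(ids) for path, ids in files.items()}


if __name__ == "__main__":
    lost_metadata = {"/a.txt": [1, 2], "/b.txt": [3]}  # what the central node had
    backups = {
        "dn1": {1: "/a.txt", 3: "/b.txt"},
        "dn2": {2: "/a.txt", 1: "/a.txt"},
        "dn3": {3: "/b.txt", 2: "/a.txt"},
    }
    print(rebuild_namespace(backups) == lost_metadata)  # True
```

A file can only be rebuilt completely if, for each of its blocks, at least one node holding that block is reachable during recovery.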

Combining the functions of the foregoing data nodes and central node, the present invention also provides a distributed file system. As shown in Fig. 7, the system of one embodiment comprises one central node and n data nodes, namely data node 1, data node 2, ..., data node n. Each data node is communicatively connected to the central node, i.e., each data node can exchange data with the central node, where n is an integer greater than 2. In other embodiments, the number of data nodes may also be 2. Preferably, each data node in the distributed file system of this embodiment is configured with the device for data node data storage processing in a distributed file system of any of the foregoing embodiments, and the central node is configured with the device for central node data storage processing in a distributed file system of any of the foregoing embodiments. That is, the data nodes in the distributed system of this embodiment have the functions of the data node in the foregoing method for data node data storage processing, and the central node in this embodiment has the functions of the central node in the foregoing method for central node data storage processing.

Those skilled in the art will understand that the distributed file system is mainly used by users (clients) to store and read data.

As shown in Fig. 8, taking the data transfer among one data node, the client and the central node as an example, the data transfer process among the three is as follows:

1) When a client using the distributed file system needs to write data into it, the client sends a data storage request to the central node.

2) after Centroid receives the data storage request of client, according to data storage request, phase is returned Data block ID answered and namespace information are to client.

3) The client sends to the data node the storage data to be stored, together with the data block ID and namespace information corresponding to the storage data.

4) The data node stores the storage data, the data block ID and the namespace information.

5) The data node sends the data block ID and namespace information to the central node.

6) The central node builds a partial namespace according to the namespace information and associates the data block ID with the namespace. These two operations (building the partial namespace and associating) have no fixed order; either may be performed first, as required.

7) The central node builds a complete namespace according to the namespace information sent by all data nodes.
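The seven steps above can be strung together in a small single-process simulation. This Python sketch assumes in-memory nodes, direct method calls in place of network messages, round-robin placement of blocks across data nodes, a 4-byte block size, and integer block IDs; all class and function names are hypothetical.

```python
from __future__ import annotations

from itertools import count


class CentralNode:
    def __init__(self):
        self._ids = count(1)
        self.namespace: dict[str, list[int]] = {}  # file path -> block IDs
        self.location: dict[int, str] = {}         # block ID -> data node

    def request_storage(self, path: str, n_blocks: int) -> list[tuple[int, str]]:
        # Steps 1-2: answer a storage request with block IDs and namespace info.
        return [(next(self._ids), path) for _ in range(n_blocks)]

    def report(self, node: str, block_id: int, path: str) -> None:
        # Steps 6-7: associate the block with the namespace; repeats are ignored.
        if block_id in self.location:
            return
        self.location[block_id] = node
        self.namespace.setdefault(path, []).append(block_id)


class DataNode:
    def __init__(self, name: str, central: CentralNode):
        self.name = name
        self.central = central
        self.blocks: dict[int, tuple[bytes, str]] = {}  # block ID -> (data, namespace info)

    def write(self, data: bytes, block_id: int, path: str) -> None:
        self.blocks[block_id] = (data, path)            # step 4: store all three
        self.central.report(self.name, block_id, path)  # step 5: report to central node


def client_write(central: CentralNode, nodes: list[DataNode], path: str,
                 data: bytes, block_size: int = 4) -> None:
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    allocations = central.request_storage(path, len(chunks))  # steps 1-2
    for i, (chunk, (block_id, ns)) in enumerate(zip(chunks, allocations)):
        # Step 3: send data + block ID + namespace info; round-robin is an assumption.
        nodes[i % len(nodes)].write(chunk, block_id, ns)


def client_read(central: CentralNode, nodes: list[DataNode], path: str) -> bytes:
    by_name = {n.name: n for n in nodes}
    return b"".join(by_name[central.location[b]].blocks[b][0] for b in central.namespace[path])


if __name__ == "__main__":
    central = CentralNode()
    nodes = [DataNode("dn1", central), DataNode("dn2", central)]
    client_write(central, nodes, "/docs/a.txt", b"hello world")
    print(central.namespace)                           # {'/docs/a.txt': [1, 2, 3]}
    print(client_read(central, nodes, "/docs/a.txt"))  # b'hello world'
```

Because each data node keeps `(data, namespace info)` under every block ID, the same `blocks` dictionaries are what a replacement central node would read back to rebuild its namespace.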

At this point, the central node has completed the process of building a complete namespace from the information provided by the data nodes. Because the central node can build the complete namespace from information obtained from the data nodes, the system has higher stability and fault recovery capability.

It should further be noted that the data nodes and central node of the present invention, as well as the clients that communicate with the distributed file system to read and write data, may all be computers or other processors with data processing capability.

Those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium; in the embodiments of the present invention, the program may be stored in the storage medium of a computer system and executed by at least one processor in that computer system to implement the flows of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.

The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, any combination of these technical features that contains no contradiction should be considered within the scope of this specification.

The embodiments described above express only several implementations of the present invention. Although they are described in relative detail, they should not be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims

1. A method for data node data storage processing in a distributed file system, characterized by comprising the following steps:

storing the storage data, the data block ID and the namespace information.

2. The method for data node data storage processing in a distributed file system according to claim 1, characterized by further comprising the following step:

sending the data block ID and the namespace information to the central node.

3. The method for data node data storage processing in a distributed file system according to claim 2, characterized in that each time the data node starts, it sends all stored data block IDs and all namespace information to the central node.

4. A device for data node data storage processing in a distributed file system, characterized by comprising:

a first storage module, configured to store the storage data, the data block ID and the namespace information.

5. The device for data node data storage processing in a distributed file system according to claim 4, characterized by further comprising a first sending module, configured to send the data block ID and the namespace information to the central node.

6. A method for central node data storage processing in a distributed file system, characterized by comprising the following steps:

receiving a data storage request from a client;

7. The method for central node data storage processing in a distributed file system according to claim 6, characterized by further comprising the following steps:

8. The method for central node data storage processing in a distributed file system according to claim 7, characterized by further comprising the following step:

9. The method for central node data storage processing in a distributed file system according to claim 7, characterized in that an identical data block ID and corresponding namespace information received multiple times are associated, and used to build the partial namespace, only once.

10. A device for central node data storage processing in a distributed file system, characterized by comprising:

a second receiving module, configured to receive a data storage request from a client;

11. The device for central node data storage processing in a distributed file system according to claim 10, characterized by further comprising:

a partial namespace building module, configured to build a partial namespace according to the namespace information and associate the data block ID with the namespace;

12. A distributed file system, characterized by comprising a central node and two or more data nodes, each data node being communicatively connected to the central node; each data node is configured with the device for data node data storage processing in a distributed file system according to claim 4 or 5, and the central node is configured with the device for central node data storage processing in a distributed file system according to claim 10 or 11;

the data node stores the storage data, the data block ID and the namespace information;