CN107169056A - Distributed file system and method for saving distributed file system storage space

Info

Publication number: CN107169056A
Application number: CN201710287520.9A
Authority: CN
Inventors: 李强; 王凤琴
Original assignee: Sichuan Changhong Electric Co Ltd
Current assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2017-04-27
Filing date: 2017-04-27
Publication date: 2017-09-15

Abstract

The invention discloses a method for saving the storage space of a distributed file system, comprising the following steps in order: setting a heat threshold that distinguishes cold files from hot files; placing newly written files in the cold file area; storing cold files using an error-correcting code; adding a starting access time, an access count, a beginning access time and a heat value to the file metadata; having the name node update this metadata each time a client makes a request; and comparing the heat value of the file with the heat threshold. If the heat value is greater than the heat threshold, the file is moved into the hot file area and one replica is returned to the client; otherwise the cold file is located and returned to the client after being computed through the error-correcting code. A distributed file system is also disclosed, comprising a client, a name node and a storage node built into the system. The invention makes full use of different storage handling for cold and hot files in order to save as much distributed file system storage space as possible.

Description

Distributed file system and method for saving distributed file system storage space

Technical field

The present invention relates to the field of computer technology, and in particular to a distributed file system and a method for saving distributed file system storage space.

Background technology

Computers manage and store data through file systems. With the rapid development of Internet technology, the amount of data people can obtain is growing exponentially, and simply extending the storage capacity of a computer's file system by adding hard disks performs poorly in terms of capacity, capacity growth rate, data backup and data security. A distributed file system can effectively solve the problem of storing and managing data: a file system fixed at one location is extended to any number of locations and file systems, and numerous nodes form a file system network. The nodes can be distributed across different locations, communicating and transferring data over the network. When using a distributed file system, people need not care which node data is stored on or which node a file is retrieved from; they only need to manage and store data as if using a local file system. These advantages led to distributed file systems quickly being applied at large scale.

Common distributed file systems generally adopt a three-replica mechanism to ensure the reliability and availability of data. That is, for every file stored in the distributed file system, three replicas are actually stored. The three-replica mechanism not only improves data reliability, since lost data can be recovered from the other two replicas, but also provides reasonably good load balancing. However, the cost of this approach is high: the storage space it consumes is three times the actual data volume. If the servers' disk arrays use a technique such as RAID 5 and the distributed file system is deployed on top of these server nodes, the actual storage consumption is even greater. Once storage hardware, machine room usage, power consumption and similar factors are also taken into account, the storage cost is relatively high, and this cost problem becomes more prominent as the volume of stored data keeps growing.

In view of this, we have designed and implemented a method for saving distributed file system storage space that can effectively reduce storage consumption and greatly reduce cost in data center construction and cloud computing platform construction.

Summary of the invention

The present invention overcomes the deficiencies of the prior art and provides a distributed file system.

To solve the above technical problem, the present invention adopts the following technical solution:

A distributed file system, characterized in that it comprises a client, a name node and a storage node built into the system, wherein the client consists of several access terminals, and the name node and the storage node are each a separate virtual machine, Docker container or physical server.

The present invention also provides a method for saving distributed file system storage space, comprising the following steps:

Step 1: in the configuration file of the distributed file system, set the heat threshold that distinguishes the cold file area from the hot file area;

Step 2: store newly written files in the cold file area of the distributed file system;

Step 3: store the files newly written to the cold file area on the storage nodes using an error-correcting code;

Step 4: in the file metadata on the name node, add a unit time, a starting access time, an access count, a beginning access time and a heat value;

Heat value = access count ÷ (beginning access time − starting access time) ÷ unit time, where the beginning access time is the one most recently recorded;

Step 5: each time a client requests access to a file, increment the access count in that file's metadata by one and record a new beginning access time, and calculate the heat value of the file;

Step 6: compare the heat value of the file from step 5 with the heat threshold; if the heat value of the file is greater than the heat threshold, go to step 7, otherwise go to step 8;

Step 7: move the file into the hot file area, randomly select one replica of the hot file, and return it to the client;

Step 8: locate the cold file on the storage nodes and return it to the client after it has been computed through the error-correcting code.
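
The heat value formula in the steps above can be illustrated with a minimal Python sketch. It reads the formula as accesses per unit time, that is, the access count divided by the elapsed span expressed in multiples of the unit time. This reading, the function name `heat_value`, and the choice to return 0.0 when no time has elapsed (for example on the very first access) are assumptions made for illustration and are not stated in the source.

```python
from datetime import datetime, timedelta


def heat_value(access_count: int,
               starting_access_time: datetime,
               beginning_access_time: datetime,
               unit_time: timedelta) -> float:
    """Return accesses per unit time between the first and latest access.

    Assumption: the elapsed span is normalized by `unit_time`, so the
    result is a plain number that can be compared with a threshold.
    """
    elapsed = beginning_access_time - starting_access_time
    elapsed_units = elapsed / unit_time  # timedelta / timedelta -> float
    if elapsed_units <= 0:
        # Assumption: with no elapsed time the file is treated as cold.
        return 0.0
    return access_count / elapsed_units


first = datetime(2017, 4, 27, 0, 0)
latest = datetime(2017, 4, 27, 5, 0)
print(heat_value(10, first, latest, timedelta(hours=1)))
```

With ten accesses spread over five hours and a unit time of one hour, the sketch yields two accesses per hour, which would then be compared against the configured heat threshold.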

Preferably, the heat threshold that the distributed file system sets to distinguish cold files from hot files differs according to the business.

Preferably, files in the hot file area are stored as three replicas.

Compared with the prior art, the beneficial effects of the invention are as follows:

The present invention has strong practical value for data center construction and for public or private cloud construction: it can effectively reduce storage cost and improve the utilization of storage space. In practical testing we found that the quantity of storage equipment purchased can be reduced by more than 30%.
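
The relationship between the share of hot data and the space saved can be checked with a small worked calculation. The sketch below assumes that hot files occupy three times their size, that cold files occupy about 1.5 times their size (the figure given in the embodiment), and that a file lives in exactly one area at a time. The `hot_fraction` parameter and both function names are hypothetical, and the numbers are illustrative arithmetic rather than measurements from the source.

```python
def raw_storage_factor(hot_fraction: float,
                       replica_factor: float = 3.0,
                       coded_factor: float = 1.5) -> float:
    """Raw space used per unit of user data for a given hot share."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return hot_fraction * replica_factor + (1.0 - hot_fraction) * coded_factor


def saving_vs_full_replication(hot_fraction: float,
                               replica_factor: float = 3.0,
                               coded_factor: float = 1.5) -> float:
    """Fraction of raw space saved compared with replicating everything."""
    used = raw_storage_factor(hot_fraction, replica_factor, coded_factor)
    return 1.0 - used / replica_factor


for share in (0.0, 0.2, 0.4):
    print(share, round(saving_vs_full_replication(share), 3))
```

Under these assumptions the saving is 0.5 × (1 − hot fraction), so a reduction of 30% or more corresponds to hot data making up at most about 40% of the stored volume.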

Brief description of the drawings

Fig. 1 is a schematic block diagram of the distributed file system of an embodiment of the present invention.

Fig. 2 is a flow chart of the method for saving distributed file system storage space of an embodiment of the present invention.

Embodiment

The present invention is further elaborated below in conjunction with the accompanying drawings.

As shown in Fig. 1, a distributed file system comprises a client, a name node and a storage node built into the system, wherein the client consists of several access terminals, and the name node and the storage node are each a separate virtual machine, Docker container or physical server.

As shown in Fig. 2, a method for saving distributed file system storage space comprises the following steps:

Step 101: in the configuration file of the distributed file system, set the heat threshold that distinguishes the cold file area from the hot file area;

The heat threshold that the distributed file system sets to distinguish cold files from hot files differs according to the business, so this heat threshold is a variable threshold.

Step 102: store newly written files in the cold file area of the distributed file system; suppose a client then requests access to file A;

Step 103: store the files newly written to the cold file area on the storage nodes using an error-correcting code, in order to reduce the storage space occupied in the distributed file system; a file stored with an error-correcting code occupies about 1.5 times the storage space of the original file;

Step 104: in the file metadata on the name node, add a unit time, a starting access time, an access count, a beginning access time and a heat value;

Step 105: each time a client requests access to a file (assumed here to be file A), increment the access count in file A's metadata by one and record a new beginning access time, and calculate and update the heat value;

Step 106: compare the heat value of file A with the heat threshold. If the heat value of file A is greater than the heat threshold, go to step 107; otherwise go to step 108.

Step 107: move the file into the hot file area, randomly select one replica of the hot file, and return it to the client.

Step 108: locate the cold file on the storage nodes and return it to the client after it has been computed through the error-correcting code.
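
Steps 102 and 104 to 108 can be sketched as a small in-memory model of the name node. Everything below is hypothetical scaffolding for illustration: the `FileMeta` and `NameNode` classes, timestamps expressed as seconds, the accesses-per-unit-time reading of the heat value, and the choice to keep a file in the hot area once it has been moved there (the source does not describe moving files back to the cold area).

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

REPLICA_COUNT = 3  # the hot file area keeps three replicas


@dataclass
class FileMeta:
    """Per-file metadata kept on the name node (step 104)."""
    unit_time: float                              # seconds
    starting_access_time: Optional[float] = None  # first access
    beginning_access_time: Optional[float] = None # latest access
    access_count: int = 0
    heat: float = 0.0
    area: str = "cold"                            # new files start cold


class NameNode:
    def __init__(self, heat_threshold: float, unit_time: float = 3600.0) -> None:
        self.heat_threshold = heat_threshold
        self.unit_time = unit_time
        self.files: Dict[str, FileMeta] = {}

    def write(self, name: str) -> None:
        """Step 102: a newly written file is registered in the cold area."""
        self.files[name] = FileMeta(unit_time=self.unit_time)

    def access(self, name: str, now: float) -> Tuple[str, Optional[int]]:
        """Steps 105 to 108: update metadata, then route the request."""
        meta = self.files[name]
        meta.access_count += 1
        if meta.starting_access_time is None:
            meta.starting_access_time = now
        meta.beginning_access_time = now
        elapsed_units = (now - meta.starting_access_time) / meta.unit_time
        # Assumption: no elapsed time means the file is still cold.
        meta.heat = meta.access_count / elapsed_units if elapsed_units > 0 else 0.0
        if meta.area == "hot" or meta.heat > self.heat_threshold:
            meta.area = "hot"  # step 107: move to the hot area
            return "hot", random.randrange(REPLICA_COUNT)
        return "cold", None    # step 108: decode from coded blocks
```

For example, with `NameNode(heat_threshold=2.0)`, writing file `"A"` and accessing it at 0 and 600 seconds leaves the first request on the cold path and sends the second to a randomly chosen hot replica, because two accesses in ten minutes exceed two accesses per hour.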

The hot file area uses three-replica storage to ensure high access efficiency. Storing three replicas requires three times or more the storage space of the original file.
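
The contrast between three-replica storage and coded storage at about 1.5 times the original size can be illustrated with a deliberately simple stand-in. The sketch below is not the error-correcting code of the invention, which the source does not specify: it splits a file into two data blocks plus one XOR parity block, which happens to give the same 1.5 times overhead and tolerates the loss of any single block. The `encode` and `decode` names are hypothetical.

```python
from typing import List, Optional, Tuple


def _xor(left: bytes, right: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(left, right))


def encode(data: bytes) -> Tuple[List[bytes], int]:
    """Split data into two blocks and add one XOR parity block."""
    half = (len(data) + 1) // 2
    first = data[:half]
    second = data[half:].ljust(half, b"\0")  # pad to equal length
    return [first, second, _xor(first, second)], len(data)


def decode(blocks: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data from any two of the three blocks."""
    first, second, parity = blocks
    if sum(block is None for block in blocks) > 1:
        raise ValueError("single parity can recover only one lost block")
    if first is None:
        first = _xor(second, parity)
    elif second is None:
        second = _xor(first, parity)
    return (first + second)[:original_len]


payload = b"cold file contents"
stored, size = encode(payload)
print(sum(len(block) for block in stored) / size)   # raw space per byte
print(decode([None, stored[1], stored[2]], size) == payload)
```

A production system would more likely use a Reed-Solomon style code with more data and parity blocks; the point here is only that reading a cold file involves a computation over coded blocks, whereas a hot file is served directly from one replica.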

The present invention makes full use of different storage handling for cold and hot files in order to save as much distributed file system storage space as possible. The heat levels of cold and hot files can also be subdivided according to the specific business, achieving still more efficient use of distributed file system storage space.

For consistent terminology, the distributed file system is defined as consisting of name nodes and storage nodes. A name node, also called a tracking node, is responsible for storing the namespace of the file system, including information such as file metadata; the name node is a cluster of multiple servers. A storage node is a server on which the file system actually stores data, and is likewise a cluster of servers.

The above embodiments describe the essence of the present invention in detail but do not limit its scope of protection. Obviously, under the teaching of the present invention, those of ordinary skill in the art can make many improvements and modifications, and it should be noted that these improvements and modifications all fall within the scope of the claims of the present invention.

Claims

1. A distributed file system, characterized in that it comprises a client, a name node and a storage node built into the system, wherein the client consists of several access terminals, and the name node and the storage node are each a separate virtual machine, Docker container or physical server.

2. A method for saving distributed file system storage space, characterized in that it comprises the following steps:

Step 1: in the configuration file of the distributed file system, set the heat threshold that distinguishes the cold file area from the hot file area;

Step 4: in the file metadata on the name node, add a unit time, a starting access time, an access count, a beginning access time and a heat value;

Step 5: each time a client requests access to a file, increment the access count in that file's metadata by one and record a new beginning access time, and calculate the heat value of the file;

Step 6: compare the heat value of the file from step 5 with the heat threshold; if the heat value of the file is greater than the heat threshold, go to step 7, otherwise go to step 8.

3. The method for saving distributed file system storage space according to claim 2, characterized in that the heat threshold that the distributed file system sets to distinguish cold files from hot files differs according to the business.

4. The method for saving distributed file system storage space according to claim 2, characterized in that files in the hot file area are stored as three replicas.