CN107169056A - Distributed file system and the method for saving distributed file system memory space - Google Patents
- Publication number
- CN107169056A (application number CN201710287520.9A)
- Authority
- CN
- China
- Prior art keywords
- file
- hot
- cold
- file system
- client
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
- G06F16/1824—Distributed file systems implemented using Network-attached Storage [NAS] architecture
- G06F16/1827—Management specifically adapted to NAS
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/172—Caching, prefetching or hoarding of files
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Multi Processors (AREA)
Abstract
The invention discloses a method for saving storage space in a distributed file system. Its steps are, in order: set a heat threshold that distinguishes cold files from hot files; place newly written files in the cold file area; store cold files using an error-correcting code; add an initial access time, an access count, a latest access time, and a heat value to the file metadata; have the name node update this metadata on every client request; and compare the file's heat value with the heat threshold. If the heat value is greater than the threshold, the file is moved into the hot file area and one copy is returned to the client; otherwise the cold file is looked up and returned to the client after being reconstructed through the error-correcting code. The invention also discloses a distributed file system comprising a client, a name node, and storage nodes built into the system. The invention makes full use of different storage methods for cold and hot files to save as much distributed file system storage space as possible.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a distributed file system and a method for saving distributed file system storage space.
Background technology
Computers manage and store data through file systems. With the rapid development of Internet technology, the amount of data people can obtain is growing exponentially. Expanding the storage capacity of a computer's file system simply by adding more hard disks performs poorly in terms of capacity, capacity growth rate, data backup, and data security. A distributed file system can effectively solve the problem of storing and managing data: a file system fixed at a single location is extended to any number of locations and file systems, and the many nodes form a file system network. The nodes can be distributed across different locations and communicate and transfer data with one another over the network. When using a distributed file system, people do not need to care which node the data is stored on or which node a file is retrieved from; they manage and store data just as they would with a local file system. These advantages led to the rapid, large-scale adoption of distributed file systems.
Common distributed file systems generally use a three-copy mechanism to ensure the reliability and availability of data. That is, for every file stored in the distributed file system, three copies are actually stored. The three-copy mechanism not only improves data reliability, since a lost copy can be recovered from the other two, but also provides reasonably good load balancing. However, this approach is costly: the storage space it consumes is three times the actual data size. If the servers' disk arrays also use a technology such as RAID 5 and the distributed file system is deployed on top of these server nodes, the actual storage space consumed is even greater. Once storage hardware, machine room space, power consumption, and related factors are taken into account, the storage cost is rather high, and the cost problem becomes more pronounced as the volume of stored data keeps growing.
In view of this, we have designed and implemented a method for saving distributed file system storage space. It can effectively reduce storage space consumption and greatly reduce costs in data center construction and cloud computing platform construction.
Summary of the invention
The present invention overcomes the deficiencies of the prior art and provides a distributed file system.
To solve the above technical problem, the present invention adopts the following technical solution:
A distributed file system, characterized in that it comprises a client, a name node, and storage nodes built into the system. The client consists of several access terminals, and the name node and the storage nodes are each a separate virtual machine, Docker container, or physical server.
The present invention also provides a method for saving distributed file system storage space, comprising the following steps:
Step 1, set, in the configuration file of the distributed file system, the heat threshold that distinguishes the cold file area from the hot file area;
Step 2, store newly written files in the cold file area of the distributed file system;
Step 3, store the files newly written to the cold file area on the storage nodes using an error-correcting code;
Step 4, add the following fields to the file metadata on the name node: unit time, initial access time, access count, latest access time, and heat value, where
Heat value = access count ÷ (latest access time - initial access time) ÷ unit time;
Step 5, each time the client requests access to a file, increase the access count in that file's metadata by one, record a new latest access time, and compute the file's heat value;
Step 6, compare the heat value computed in step 5 with the heat threshold; if the heat value is greater than the heat threshold, go to step 7, otherwise go to step 8;
Step 7, move the file into the hot file area, randomly select one copy of the hot file, and return it to the client;
Step 8, look up the cold file on the storage nodes, reconstruct it through the error-correcting code, and return it to the client.
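As a rough illustration of the heat value formula in step 4, the following Python sketch computes it from the metadata fields. The function name `heat_value`, the handling of a zero-length interval, and the use of hours as the time scale are assumptions made for illustration; the source does not specify them.

```python
def heat_value(access_count: int, initial_access: float,
               latest_access: float, unit_time: float = 1.0) -> float:
    """Heat value = access count / (latest - initial access time) / unit time."""
    elapsed = latest_access - initial_access
    if elapsed <= 0:
        # Assumption: on the very first access there is no interval yet,
        # so the file is treated as having zero heat and stays cold.
        return 0.0
    return access_count / elapsed / unit_time


# Timestamps are expressed in hours here (an assumption) with unit_time = 1,
# so the result reads as "accesses per hour": 12 accesses over 4 hours.
example = heat_value(access_count=12, initial_access=0.0, latest_access=4.0)
```

With timestamps measured in the chosen unit and `unit_time` left at 1, the formula reduces to an average access rate since the file was first accessed.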
Preferably, the heat threshold that distinguishes cold files from hot files is set differently for the different businesses served by the distributed file system.
Preferably, files in the hot file area are stored as three copies.
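The two preferences above could be expressed in configuration roughly as follows. This is a minimal sketch: the business names, the numeric thresholds, and the helper names `threshold_for` and `is_hot` are hypothetical placeholders, not values or identifiers from the source.

```python
# Hypothetical per-business heat thresholds; the names and numbers are
# placeholders chosen for illustration only.
HEAT_THRESHOLDS = {"video": 50.0, "archive": 5.0}
DEFAULT_HEAT_THRESHOLD = 10.0
HOT_REPLICA_COUNT = 3  # files in the hot file area are kept as three copies


def threshold_for(business: str) -> float:
    """Return the heat threshold configured for a business, or the default."""
    return HEAT_THRESHOLDS.get(business, DEFAULT_HEAT_THRESHOLD)


def is_hot(heat: float, business: str) -> bool:
    # Strictly greater than the threshold, matching the comparison in step 6.
    return heat > threshold_for(business)
```

The same heat value can therefore classify a file as hot for one business and cold for another, which is the point of making the threshold configurable.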
Compared with the prior art, the beneficial effects of the present invention are:
The present invention has strong practical value for data center construction and for public or private cloud construction: it can effectively reduce storage cost and improve storage space utilization. In practical testing we found that it can reduce the number of storage devices purchased by more than 30%.
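To make the space saving concrete, the sketch below estimates the average raw storage consumed per byte of user data when only part of the data is hot. It assumes a factor of 3 for three-copy storage and about 1.5 for files stored with an error-correcting code (the figure the description gives for cold files); the function names and the idea of a single "hot fraction" are illustrative assumptions, and the sketch does not claim to reproduce how the 30% figure above was measured.

```python
def storage_factor(hot_fraction: float, hot_factor: float = 3.0,
                   cold_factor: float = 1.5) -> float:
    """Average raw storage consumed per byte of user data."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return hot_fraction * hot_factor + (1.0 - hot_fraction) * cold_factor


def savings_vs_three_copies(hot_fraction: float) -> float:
    """Fraction of raw storage saved relative to three copies of everything."""
    return 1.0 - storage_factor(hot_fraction) / 3.0
```

Under these assumed factors, a data set in which half of the bytes are hot needs 2.25 bytes of raw storage per byte of data, a saving of 25% over storing three copies of everything; if all data is cold, the saving is 50%.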
Brief description of the drawings
Fig. 1 is a schematic block diagram of the distributed file system according to an embodiment of the present invention.
Fig. 2 is a flow chart of the method for saving distributed file system storage space according to an embodiment of the present invention.
Embodiment
The present invention is further described below with reference to the accompanying drawings.
Fig. 1 shows a distributed file system comprising a client, a name node, and storage nodes built into the system. The client consists of several access terminals, and the name node and the storage nodes are each a separate virtual machine, Docker container, or physical server.
Fig. 2 shows a method for saving distributed file system storage space, comprising the following steps:
Step 101: Set, in the configuration file of the distributed file system, the heat threshold that distinguishes the cold file area from the hot file area.
The heat threshold that distinguishes cold files from hot files differs according to the business the distributed file system serves, so this heat threshold is a variable threshold.
Step 102: Store newly written files in the cold file area of the distributed file system. In the following, assume the client requests access to a file A.
Step 103: Store the files newly written to the cold file area on the storage nodes using an error-correcting code, to reduce the storage space the distributed file system occupies. A file stored with an error-correcting code occupies about 1.5 times the storage space of the original file.
Step 104: Add the following fields to the file metadata on the name node: unit time, initial access time, access count, latest access time, and heat value, where
Heat value = access count ÷ (latest access time - initial access time) ÷ unit time.
Step 105: Each time the client requests access to a file (here, file A), increase the access count in file A's metadata by one, record a new latest access time, and compute and update the heat value.
Step 106: Compare the heat value of file A with the heat threshold. If the heat value of file A is greater than the heat threshold, go to step 107; otherwise go to step 108.
Step 107: Move the file into the hot file area, randomly select one copy of the hot file, and return it to the client.
Step 108: Look up the cold file on the storage nodes, reconstruct it through the error-correcting code, and return it to the client.
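The request path in steps 105 to 108 can be sketched as a small in-memory simulation. Everything here is illustrative: the class and field names are hypothetical, the returned strings merely stand in for "serve a random hot copy" and "decode from cold fragments", and one assumption goes beyond the source, namely that a file already moved to the hot area keeps being served from there (the source does not describe moving files back to the cold area).

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FileMeta:
    access_count: int = 0
    initial_access: Optional[float] = None
    latest_access: Optional[float] = None
    heat: float = 0.0
    area: str = "cold"  # new files start in the cold file area (step 102)


class NameNodeSketch:
    def __init__(self, heat_threshold: float, unit_time: float = 1.0) -> None:
        self.heat_threshold = heat_threshold
        self.unit_time = unit_time
        self.files: Dict[str, FileMeta] = {}

    def write(self, name: str) -> None:
        self.files[name] = FileMeta()

    def access(self, name: str, now: float) -> str:
        meta = self.files[name]
        # Step 105: bump the counter, record the access time, recompute heat.
        meta.access_count += 1
        if meta.initial_access is None:
            meta.initial_access = now
        meta.latest_access = now
        elapsed = meta.latest_access - meta.initial_access
        meta.heat = (meta.access_count / elapsed / self.unit_time
                     if elapsed > 0 else 0.0)
        # Step 106: compare the heat value with the threshold.
        # Assumption: files that are already hot stay in the hot area.
        if meta.heat > self.heat_threshold or meta.area == "hot":
            # Step 107: move to the hot area and serve one random copy.
            meta.area = "hot"
            return f"hot-copy-{random.randrange(3)}"
        # Step 108: rebuild the file from its error-correcting-code fragments.
        return "cold-decoded"
```

For example, with a threshold of 2.0 and timestamps in the chosen unit, accesses to a file at times 0.0 and 1.0 are served from the cold area, while a third access at time 1.2 raises the heat value to 2.5 and moves the file to the hot area.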
The hot file area stores files as three copies to ensure higher access efficiency. Storing three copies requires at least three times the storage space of the original file.
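The source does not say which error-correcting code is used for cold files. As a minimal sketch of how a code can reach roughly 1.5 times the original size while still tolerating a lost fragment, the example below splits a file into two data fragments and adds one XOR parity fragment. This is the simplest such scheme and is an assumption chosen for illustration; production systems commonly use Reed-Solomon codes instead. All function names are hypothetical.

```python
from typing import List, Optional


def xor_bytes(left: bytes, right: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(left, right))


def encode(data: bytes) -> List[bytes]:
    """Split data into two fragments and add one XOR parity fragment."""
    half = (len(data) + 1) // 2
    first = data[:half]
    second = data[half:].ljust(half, b"\x00")  # pad so both halves match
    return [first, second, xor_bytes(first, second)]


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one fragment is missing."""
    if sum(f is None for f in fragments) > 1:
        raise ValueError("this code tolerates only one lost fragment")
    first, second, parity = fragments
    if first is None:
        first = xor_bytes(second, parity)
    elif second is None:
        second = xor_bytes(first, parity)
    return (first + second)[:length]


data = b"coldfile"
fragments = encode(data)
overhead = sum(len(f) for f in fragments) / len(data)
recovered = decode([None, fragments[1], fragments[2]], len(data))
```

Each of the three fragments would be placed on a different storage node. Reading a cold file then involves a decoding step, which is the "calculation by error-correcting code" in step 108, whereas a hot file is served directly from one of its three copies.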
The present invention makes full use of different storage methods for cold and hot files to save as much distributed file system storage space as possible. The heat levels of cold and hot files can also be further subdivided according to the specific business, so as to use the storage space of the distributed file system even more efficiently.
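One way to subdivide heat levels is to use several thresholds instead of one. The sketch below is only a guess at what such a subdivision might look like: the tier boundaries, the intermediate "two copies" policy, and the function name are all hypothetical and do not come from the source.

```python
from bisect import bisect_left

# Hypothetical tier boundaries and storage policies, ordered from cold to hot.
TIER_BOUNDS = [1.0, 10.0]
TIER_POLICIES = ["error-correcting code", "two copies", "three copies"]


def policy_for(heat: float) -> str:
    """Pick a storage policy; a file moves up a tier only when its heat
    value is strictly greater than the boundary, as in step 106."""
    return TIER_POLICIES[bisect_left(TIER_BOUNDS, heat)]
```

Using `bisect_left` keeps the comparison strict, so a heat value exactly equal to a boundary stays in the colder tier.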
To unify terminology, the distributed file system is defined as consisting of name nodes and storage nodes. The name node, also called the tracking node, is responsible for storing the namespace of the file system, including information such as file metadata, and is a cluster of multiple servers. The storage nodes are the servers that actually store the file system's data and also form a cluster of servers.
The above embodiment describes the essence of the present invention in detail but does not limit its scope of protection. Obviously, under the teaching of the present invention, a person of ordinary skill in the art can make many improvements and modifications, and it should be noted that these improvements and modifications all fall within the scope of the claims of the present invention.
Claims (4)
1. A distributed file system, characterized in that it comprises a client, a name node, and storage nodes built into the system, wherein the client consists of several access terminals, and the name node and the storage nodes are each a separate virtual machine, Docker container, or physical server.
2. A method for saving distributed file system storage space, characterized in that it comprises the following steps:
Step 1, set, in the configuration file of the distributed file system, the heat threshold that distinguishes the cold file area from the hot file area;
Step 2, store newly written files in the cold file area of the distributed file system;
Step 3, store the files newly written to the cold file area on the storage nodes using an error-correcting code;
Step 4, add the following fields to the file metadata on the name node: unit time, initial access time, access count, latest access time, and heat value, where
Heat value = access count ÷ (latest access time - initial access time) ÷ unit time;
Step 5, each time the client requests access to a file, increase the access count in that file's metadata by one, record a new latest access time, and compute the file's heat value;
Step 6, compare the heat value computed in step 5 with the heat threshold; if the heat value is greater than the heat threshold, go to step 7, otherwise go to step 8;
Step 7, move the file into the hot file area, randomly select one copy of the hot file, and return it to the client;
Step 8, look up the cold file on the storage nodes, reconstruct it through the error-correcting code, and return it to the client.
3. The method for saving distributed file system storage space according to claim 2, characterized in that the heat threshold that distinguishes cold files from hot files is set differently for the different businesses served by the distributed file system.
4. The method for saving distributed file system storage space according to claim 2, characterized in that files in the hot file area are stored as three copies.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710287520.9A CN107169056A (en) | 2017-04-27 | 2017-04-27 | Distributed file system and the method for saving distributed file system memory space |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107169056A true CN107169056A (en) | 2017-09-15 |
Family
ID=59813801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710287520.9A Pending CN107169056A (en) | 2017-04-27 | 2017-04-27 | Distributed file system and the method for saving distributed file system memory space |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107169056A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170915 |