
CN107169056A - Distributed file system and method for saving distributed file system storage space - Google Patents

Distributed file system and method for saving distributed file system storage space

Info

Publication number
CN107169056A
Authority
CN
China
Prior art keywords
file
hot
cold
file system
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710287520.9A
Other languages
Chinese (zh)
Inventor
李强
王凤琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd
Priority to CN201710287520.9A
Publication of CN107169056A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/1827 Management specifically adapted to NAS
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses a method for saving storage space in a distributed file system. Its steps are, in order: set a heat threshold that distinguishes cold files from hot files; place newly written files in the cold file region; store cold files using an error-correcting code; add an initial access time, an access count, a latest access time, and a heat value to the file metadata; update this metadata on the name node each time a client makes a request; and compare the file's heat value with the heat threshold. If the heat value is greater than the threshold, the file is moved into the hot file region and one copy is returned to the client; otherwise the cold file is looked up and returned to the client after error-correcting-code computation. A distributed file system is also disclosed, comprising a client, a name node, and a storage node built into the system. The invention makes full use of different storage methods for cold and hot files to save as much distributed file system storage space as possible.

Description

Distributed file system and method for saving distributed file system storage space
Technical field
The present invention relates to the field of computer technology, and in particular to a distributed file system and a method for saving distributed file system storage space.
Background
Computers manage and store data through file systems. With the rapid development of Internet technology, the amount of data people can obtain is growing exponentially. Simply adding hard disks to expand the storage capacity of a computer's file system gives unsatisfactory results in terms of capacity, capacity growth rate, data backup, and data security. A distributed file system can effectively solve the problems of data storage and management: a file system fixed at one location is extended to any number of locations and file systems, and the many nodes form a file system network. The nodes can be distributed across different locations and communicate and transfer data with one another over the network. When using a distributed file system, users need not care which node data is stored on or which node a file is retrieved from; they manage and store data as if they were using a local file system. These advantages quickly led to large-scale adoption of distributed file systems.
Common distributed file systems generally use a three-replica mechanism to ensure data reliability and availability. That is, every file stored in the distributed file system is actually stored as three copies. The three-replica mechanism improves reliability, because when one copy is lost the data can be recovered from the other two, and it also provides reasonably good load balancing. However, this approach is costly: it consumes three times the storage space of the actual data. If the servers' disk arrays also use a technology such as RAID 5 and the distributed file system is deployed on top of those server nodes, the actual storage consumption is even greater. Once storage hardware, machine room usage, and power consumption are taken into account, the storage cost is relatively high, and the cost problem becomes more prominent as the volume of stored data keeps growing.
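The cost argument above can be made concrete with a small calculation. The following is only a sketch: the function name `raw_capacity` and the example figures are illustrative assumptions, and the RAID 5 factor uses the standard rule that an array of n disks holds n - 1 disks' worth of data.

```python
from typing import Optional


def raw_capacity(logical_tb: float, replicas: int = 3,
                 raid5_disks: Optional[int] = None) -> float:
    """Raw disk capacity (TB) needed to hold `logical_tb` of user data.

    `replicas` is the number of full copies kept by the file system.
    `raid5_disks` is the (hypothetical) number of disks per RAID 5 array
    underneath each server; None means no RAID layer.
    """
    if replicas < 1:
        raise ValueError("at least one copy is required")
    raw = logical_tb * replicas
    if raid5_disks is not None:
        if raid5_disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        # One disk's worth of capacity per array is spent on parity.
        raw *= raid5_disks / (raid5_disks - 1)
    return raw


print(raw_capacity(100))                 # three replicas only
print(raw_capacity(100, raid5_disks=5))  # three replicas on 5-disk RAID 5
```

With these example numbers, 100 TB of user data needs 300 TB of raw disk under three-way replication, and more again once a RAID 5 layer sits underneath.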
In view of this, we have designed and implemented a method for saving distributed file system storage space that effectively reduces storage consumption and can greatly reduce costs in data center construction and cloud computing platform construction.
Summary of the invention
The present invention overcomes the deficiencies of the prior art and provides a distributed file system.
To solve the above technical problem, the present invention adopts the following technical solution:
A distributed file system, characterized in that it comprises a client, a name node, and a storage node built into the system; the client consists of several access terminals, and the name node and the storage node are each a separate virtual machine, Docker container, or physical server.
The present invention also provides a method for saving distributed file system storage space, comprising the following steps:
Step 1, set, in the configuration file of the distributed file system, the heat threshold that distinguishes the cold file region from the hot file region;
Step 2, store newly written files in the cold file region of the distributed file system;
Step 3, store the files newly written to the cold file region on the storage nodes using an error-correcting code;
Step 4, in the file metadata on the name node, add a unit time, an initial access time, an access count, a latest access time, and a heat value;
Heat value = access count ÷ (latest access time - initial access time) ÷ unit time;
Step 5, each time a client requests access to a file, increase the access count in that file's metadata by one and record a new latest access time, and at the same time calculate the file's heat value;
Step 6, compare the heat value of the file from step 5 with the heat threshold; if the heat value is greater than the heat threshold, go to step 7, otherwise go to step 8;
Step 7, move the file into the hot file region, randomly select one copy of the hot file, and return it to the client;
Step 8, look up the cold file on the storage nodes, reconstruct it through error-correcting-code computation, and return it to the client.
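The heat value from step 4 can be sketched as a small function. This is an illustration under stated assumptions, not the patent's implementation: the name `heat_value` and the use of seconds are hypothetical, the divisions are applied in the order the formula is written, and returning 0.0 when no time has elapsed is an added guard that the text does not describe.

```python
def heat_value(access_count: int, initial_access: float,
               latest_access: float, unit_time: float = 1.0) -> float:
    """Heat value = access count / (latest - initial access time) / unit time.

    Times are plain numbers (for example seconds since some epoch).
    """
    elapsed = latest_access - initial_access
    if elapsed <= 0 or unit_time <= 0:
        # Assumption: with no elapsed time there is no measurable rate yet.
        return 0.0
    return access_count / elapsed / unit_time


# Ten accesses spread over 100 time units.
print(heat_value(10, initial_access=0.0, latest_access=100.0))
```

A deployment might instead normalize the elapsed time by the unit time before dividing; the text does not say which reading is intended, so the literal order is kept here.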
Preferably, the heat threshold that distinguishes cold files from hot files is set differently for different business types.
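A per-business threshold could be expressed as a simple lookup. The sketch below is hypothetical throughout: the business names, the numeric values, and the helper `threshold_for` are placeholders, since the text says only that the threshold lives in the file system's configuration file and varies by business.

```python
# Hypothetical configuration: heat threshold per business type.
DEFAULT_THRESHOLD = 1.0
HEAT_THRESHOLDS = {
    "video_on_demand": 5.0,  # files must be very busy to count as hot
    "log_archive": 0.1,      # rarely read, so a low bar marks a file as hot
}


def threshold_for(business: str) -> float:
    """Return the configured heat threshold, or the default if none is set."""
    return HEAT_THRESHOLDS.get(business, DEFAULT_THRESHOLD)


print(threshold_for("log_archive"))
print(threshold_for("unlisted_business"))
```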
Preferably, files in the hot file region are stored as three copies.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention has strong practical value for data center construction and for public or private cloud construction: it effectively reduces storage cost and improves storage utilization. In practical testing we found that the quantity of storage equipment to be purchased can be reduced by more than 30%.
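To see how a saving of this size can arise, the following sketch estimates the saving relative to storing everything as three copies. It assumes the overhead figures given elsewhere in the description (three times for hot files, about 1.5 times for cold files); the function name `storage_saving` and the hot-file fractions are illustrative, not measured values.

```python
def storage_saving(hot_fraction: float, hot_factor: float = 3.0,
                   cold_factor: float = 1.5) -> float:
    """Fraction of raw storage saved compared with keeping all data hot.

    `hot_fraction` is the share of data (by size) held in the hot region.
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    blended = hot_fraction * hot_factor + (1.0 - hot_fraction) * cold_factor
    return 1.0 - blended / hot_factor


for share in (0.0, 0.2, 0.4):
    print(share, round(storage_saving(share), 3))
```

Under these assumptions the saving is 0.5 × (1 - hot fraction), so a 30% saving corresponds to at most 40% of the data being hot.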
Brief description of the drawings
Fig. 1 is a schematic block diagram of the distributed file system according to an embodiment of the present invention.
Fig. 2 is a flowchart of the method for saving distributed file system storage space according to an embodiment of the present invention.
Detailed description
The present invention is further described below with reference to the accompanying drawings.
As shown in Fig. 1, a distributed file system comprises a client, a name node, and a storage node built into the system; the client consists of several access terminals, and the name node and the storage node are each a separate virtual machine, Docker container, or physical server.
As shown in Fig. 2, a method for saving distributed file system storage space comprises the following steps:
Step 101: Set, in the configuration file of the distributed file system, the heat threshold that distinguishes the cold file region from the hot file region;
The distributed file system sets a different heat threshold for distinguishing cold files from hot files depending on the business, so this heat threshold is a variable threshold.
Step 102: Store newly written files in the cold file region of the distributed file system; assume that a client requests access to file A;
Step 103: Store the files newly written to the cold file region on the storage nodes using an error-correcting code, to reduce the storage space the distributed file system occupies; a file stored with the error-correcting code occupies roughly 1.5 times the storage space of the original file;
Step 104: In the file metadata on the name node, add a unit time, an initial access time, an access count, a latest access time, and a heat value;
Heat value = access count ÷ (latest access time - initial access time) ÷ unit time;
Step 105: Each time a client requests access to a file (assumed here to be file A), increase the access count in file A's metadata by one and record a new latest access time, and at the same time calculate and update the heat value;
Step 106: Compare the heat value of file A with the heat threshold. If the heat value of file A is greater than the heat threshold, go to step 107; otherwise go to step 108.
Step 107: Move the file into the hot file region, randomly select one copy of the hot file, and return it to the client.
Step 108: Look up the cold file on the storage nodes, reconstruct it through error-correcting-code computation, and return it to the client.
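The request path in steps 105 to 108 can be sketched as a small in-memory simulation. Everything named here (`FileMeta`, `NameNode`, `read`, `write`) is hypothetical. The sketch also assumes the initial access time is recorded when the file is written, and it keeps a promoted file in the hot region afterwards, because the text does not describe moving files back to the cold region.

```python
import random
from dataclasses import dataclass


@dataclass
class FileMeta:
    initial_access: float   # assumed here to be recorded when the file is written
    latest_access: float    # refreshed on every request (step 105)
    access_count: int = 0
    heat: float = 0.0
    region: str = "cold"    # new files start in the cold region (step 102)


class NameNode:
    """Toy in-memory stand-in for the name node plus its storage nodes."""

    def __init__(self, threshold, unit_time=1.0):
        self.threshold = threshold
        self.unit_time = unit_time
        self.meta = {}   # file name -> FileMeta
        self.cold = {}   # file name -> bytes (stands in for coded blocks)
        self.hot = {}    # file name -> list of three replicas

    def write(self, name, data, now):
        self.meta[name] = FileMeta(initial_access=now, latest_access=now)
        self.cold[name] = data

    def read(self, name, now):
        m = self.meta[name]
        # Step 105: bump the counter, record the access time, update the heat.
        m.access_count += 1
        m.latest_access = now
        elapsed = m.latest_access - m.initial_access
        m.heat = m.access_count / elapsed / self.unit_time if elapsed > 0 else 0.0
        # Steps 106 and 107: promote the file once its heat exceeds the threshold.
        if m.heat > self.threshold and m.region == "cold":
            data = self.cold.pop(name)
            self.hot[name] = [data, data, data]
            m.region = "hot"
        if m.region == "hot":
            return random.choice(self.hot[name])  # any one replica will do
        # Step 108: a real system would decode the coded blocks here.
        return self._decode_cold(name)

    def _decode_cold(self, name):
        return self.cold[name]


node = NameNode(threshold=0.5)
node.write("A", b"payload", now=0.0)
for _ in range(3):
    node.read("A", now=4.0)
print(node.meta["A"].region)
```

In the usage lines, the third read pushes the heat of file A above the threshold, which triggers the move into the hot region.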
The hot file region stores files as three copies to ensure higher access efficiency. Storing three copies requires at least three times the storage space of the original file.
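The difference between the two storage modes can be illustrated with a toy code. The patent does not say which error-correcting code it uses; the sketch below substitutes a simple XOR parity scheme (two data blocks plus one parity block) purely because it happens to give the 1.5 times overhead mentioned in step 103. It tolerates the loss of only one block, and all function names are hypothetical.

```python
def encode_xor(data: bytes):
    """Split data into two halves plus one XOR parity block (toy 2+1 code)."""
    if len(data) % 2:
        data += b"\x00"  # pad to even length; a real system records the true size
    half = len(data) // 2
    a, b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]


def recover(blocks):
    """Rebuild the single missing block (marked None) from the other two."""
    missing = blocks.index(None)
    x, y = [blk for blk in blocks if blk is not None]
    restored = list(blocks)
    restored[missing] = bytes(p ^ q for p, q in zip(x, y))
    return restored


def replicate(data: bytes, copies: int = 3):
    """Hot-region style storage: keep full copies of the file."""
    return [data] * copies


original = b"cold file payload!"  # 18 bytes
coded = encode_xor(original)
print(sum(len(b) for b in coded), "bytes coded")
print(sum(len(b) for b in replicate(original)), "bytes replicated")

# Lose the second data block, then rebuild it from the first block and parity.
damaged = [coded[0], None, coded[2]]
print(b"".join(recover(damaged)[:2]) == original)
```

The coded form stores one and a half times the original size while the replicated form stores three times, which is the trade the method exploits: cold files pay a reconstruction cost on read in exchange for less space.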
The present invention makes full use of different storage methods for cold and hot files to save as much distributed file system storage space as possible. The heat of files can also be subdivided further according to specific business needs, so that the storage space of the distributed file system is used even more efficiently.
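One way to subdivide heat further is to use several ascending thresholds instead of one. The sketch below is an assumption about how that might look: the tier names and boundary values are hypothetical, and a heat value exactly equal to a boundary stays in the lower tier, matching the "greater than" comparison in step 106.

```python
from bisect import bisect_left

# Hypothetical ascending heat thresholds and the tiers they separate.
TIER_BOUNDS = [0.1, 1.0]
TIER_NAMES = ["cold", "warm", "hot"]


def tier_for(heat: float) -> str:
    """Map a heat value to a tier; a value must exceed a bound to move up."""
    return TIER_NAMES[bisect_left(TIER_BOUNDS, heat)]


for value in (0.05, 0.1, 0.5, 1.5):
    print(value, tier_for(value))
```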
For consistent terminology, the distributed file system is defined as consisting of name nodes and storage nodes. The name node, also called the tracking node, stores the namespace of the file system, including information such as file metadata; the name node is a cluster of multiple servers. The storage node is the server that actually stores the file system's data, and it is likewise a cluster of servers.
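The division of labor between the two node types can be sketched as a data structure: the name node holds a namespace that maps each path to metadata, including where the blocks live, while the storage nodes hold the bytes. The class and field names below (`FileEntry`, `Namespace`, `storage_nodes`) and the node identifiers are hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileEntry:
    region: str = "cold"                                    # cold or hot file region
    storage_nodes: List[str] = field(default_factory=list)  # ids of nodes holding blocks
    access_count: int = 0


class Namespace:
    """Toy namespace kept by the name node: path -> metadata."""

    def __init__(self):
        self.entries: Dict[str, FileEntry] = {}

    def register(self, path: str, storage_nodes: List[str]) -> FileEntry:
        entry = FileEntry(storage_nodes=list(storage_nodes))
        self.entries[path] = entry
        return entry

    def locate(self, path: str) -> List[str]:
        """Tell a client which storage nodes hold the file's blocks."""
        return self.entries[path].storage_nodes


ns = Namespace()
ns.register("/videos/a.mp4", ["storage-1", "storage-2", "storage-3"])
print(ns.locate("/videos/a.mp4"))
```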
The above embodiment describes the essence of the present invention in detail but does not limit its scope of protection. Obviously, those of ordinary skill in the art can make many improvements and modifications under the teaching of the present invention, and all such improvements and modifications fall within the scope of the claims of the present invention.

Claims (4)

1. A distributed file system, characterized in that it comprises a client, a name node, and a storage node built into the system; the client consists of several access terminals, and the name node and the storage node are each a separate virtual machine, Docker container, or physical server.
2. A method for saving distributed file system storage space, characterized in that it comprises the following steps:
Step 1, set, in the configuration file of the distributed file system, the heat threshold that distinguishes the cold file region from the hot file region;
Step 2, store newly written files in the cold file region of the distributed file system;
Step 3, store the files newly written to the cold file region on the storage nodes using an error-correcting code;
Step 4, in the file metadata on the name node, add a unit time, an initial access time, an access count, a latest access time, and a heat value;
Heat value = access count ÷ (latest access time - initial access time) ÷ unit time;
Step 5, each time a client requests access to a file, increase the access count in that file's metadata by one and record a new latest access time, and at the same time calculate the file's heat value;
Step 6, compare the heat value of the file from step 5 with the heat threshold; if the heat value is greater than the heat threshold, go to step 7, otherwise go to step 8;
Step 7, move the file into the hot file region, randomly select one copy of the hot file, and return it to the client;
Step 8, look up the cold file on the storage nodes, reconstruct it through error-correcting-code computation, and return it to the client.
3. The method for saving distributed file system storage space according to claim 2, characterized in that the heat threshold that distinguishes cold files from hot files is set differently for different business types.
4. The method for saving distributed file system storage space according to claim 2, characterized in that files in the hot file region are stored as three copies.

CN201710287520.9A 2017-04-27 2017-04-27 Distributed file system and method for saving distributed file system storage space Pending CN107169056A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710287520.9A CN107169056A (en) 2017-04-27 2017-04-27 Distributed file system and method for saving distributed file system storage space

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710287520.9A CN107169056A (en) 2017-04-27 2017-04-27 Distributed file system and method for saving distributed file system storage space

Publications (1)

Publication Number Publication Date
CN107169056A CN107169056A (en) 2017-09-15

Family

ID=59813801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710287520.9A Pending CN107169056A (en) 2017-04-27 2017-04-27 Distributed file system and method for saving distributed file system storage space

Country Status (1)

Country Link
CN (1) CN107169056A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110022338A (en) * 2018-01-09 2019-07-16 阿里巴巴集团控股有限公司 File reading, system, meta data server and user equipment
CN110554999A (en) * 2018-05-31 2019-12-10 华为技术有限公司 Method and device for identifying and separating cold and hot attributes based on log file system and flash memory device and related products
CN110830535A (en) * 2018-08-10 2020-02-21 网宿科技股份有限公司 Processing method of super-hot file, load balancing equipment and download server
CN113760854A (en) * 2021-09-10 2021-12-07 北京金山云网络技术有限公司 Method for identifying data in HDFS memory and related equipment
CN114422600A (en) * 2021-12-31 2022-04-29 成都鲁易科技有限公司 File scheduling system based on cloud storage and file scheduling method based on cloud storage

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103118133A (en) * 2013-02-28 2013-05-22 浙江大学 Mixed cloud storage method based on file access frequency
CN103870205A (en) * 2012-12-11 2014-06-18 联想(北京)有限公司 Method and device for storage control and information processing method and device
CN106570074A (en) * 2016-10-14 2017-04-19 深圳前海微众银行股份有限公司 Distributed database system and implementation method thereof

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110022338A (en) * 2018-01-09 2019-07-16 阿里巴巴集团控股有限公司 File reading, system, meta data server and user equipment
CN110022338B (en) * 2018-01-09 2022-05-27 阿里巴巴集团控股有限公司 File reading method and system, metadata server and user equipment
CN110554999A (en) * 2018-05-31 2019-12-10 华为技术有限公司 Method and device for identifying and separating cold and hot attributes based on log file system and flash memory device and related products
CN110554999B (en) * 2018-05-31 2023-06-20 华为技术有限公司 Cold and hot attribute identification and separation method and device based on log file system and flash memory device and related products
CN110830535A (en) * 2018-08-10 2020-02-21 网宿科技股份有限公司 Processing method of super-hot file, load balancing equipment and download server
CN110830535B (en) * 2018-08-10 2021-03-02 网宿科技股份有限公司 Processing method of super-hot file, load balancing equipment and download server
US11201914B2 (en) 2018-08-10 2021-12-14 Wangsu Science & Technology Co., Ltd. Method for processing a super-hot file, load balancing device and download server
CN113760854A (en) * 2021-09-10 2021-12-07 北京金山云网络技术有限公司 Method for identifying data in HDFS memory and related equipment
CN114422600A (en) * 2021-12-31 2022-04-29 成都鲁易科技有限公司 File scheduling system based on cloud storage and file scheduling method based on cloud storage
CN114422600B (en) * 2021-12-31 2023-11-07 成都鲁易科技有限公司 File scheduling system based on cloud storage and file scheduling method based on cloud storage

Similar Documents

Publication Publication Date Title
CN107169056A (en) Distributed file system and the method for saving distributed file system memory space
CN100565512C (en) Eliminate the system and method for redundant file in the document storage system
CN102222085B (en) Data de-duplication method based on combination of similarity and locality
CN101460930B (en) Maintenance of link level consistency between database and file system
CN103116661B (en) A kind of data processing method of database
CN109327539A (en) A kind of distributed block storage system and its data routing method
CN101866305B (en) Continuous data protection method and system supporting data inquiry and quick recovery
CN100583096C (en) Methods for managing deletion of data
CN107844269A (en) A kind of layering mixing storage system and method based on uniformity Hash
CN106446001B (en) A kind of method and system of the storage file in computer storage medium
CN104391930A (en) Distributed file storage device and method
CN104133882A (en) HDFS (Hadoop Distributed File System)-based old file processing method
MX2011010287A (en) Differential file and system restores from peers and the cloud.
CN102855239A (en) Distributed geographical file system
CN103294167B (en) A kind of low energy consumption cluster-based storage reproducing unit based on data behavior and method
CN105320773A (en) Distributed duplicated data deleting system and method based on Hadoop platform
CN103763383A (en) Integrated cloud storage system and storage method thereof
CN103530388A (en) Performance improving data processing method in cloud storage system
CN103455577A (en) Multi-backup nearby storage and reading method and system of cloud host mirror image file
CN104462389A (en) Method for implementing distributed file systems on basis of hierarchical storage
CN106775446A (en) Based on the distributed file system small documents access method that solid state hard disc accelerates
CN103186554A (en) Distributed data mirroring method and data storage node
CN107302561A (en) A kind of hot spot data Replica placement method in cloud storage system
CN102023816A (en) Object storage policy and access method of object storage system
CN107422989B (en) Server SAN system multi-copy reading method and storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170915