CN111984196B - File migration method, device, equipment and readable storage medium - Google Patents
- Publication number
- CN111984196B (application CN202010850086.2A)
- Authority
- CN
- China
- Prior art keywords
- file
- invalid
- files
- aggregated
- small
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0647—Migration mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0253—Garbage collection, i.e. reclamation of unreferenced memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/162—Delete operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/185—Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0643—Management of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0652—Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0674—Disk device
- G06F3/0676—Magnetic disk device
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a file migration method, apparatus, device, and readable storage medium. The method includes: receiving a file migration task; determining whether the file migration task is an unfinished task; if so, determining invalid files from among the aggregated large file in the destination storage pool and the small files in the source storage pool that correspond to the file migration task, where the aggregated large file contains a migrated small file corresponding to each small file after aggregation; and performing garbage collection on the invalid files. The method reclaims invalid data produced during aggregated migration and can improve disk utilization.
Description
Technical Field
The present invention relates to the field of storage technologies, and in particular to a file migration method, apparatus, device, and readable storage medium.
Background
Disks typically read and write large files significantly faster than small files. To exploit this, in scenarios with massive numbers of small files, small files are not written to disk directly; they are first aggregated into a large file, which is then written to disk. This effectively reduces the number of small-file disk writes and eases write pressure, while also improving the read hit rate and shortening the read I/O path.
However, if the file tiering migration client crashes while files are being migrated and aggregated, the migration-aggregation process cannot complete, and invalid garbage data is produced. This garbage data occupies storage space and lowers storage utilization.
In summary, how to effectively handle problems such as disk cleanup during small-file migration is a technical problem that those skilled in the art urgently need to solve.
Summary of the Invention
The purpose of the present invention is to provide a file migration method, apparatus, device, and readable storage medium that can effectively reclaim garbage data produced by aggregated-migration failures.
To solve the above technical problem, the present invention provides the following technical solutions:
A file migration method, including:
receiving a file migration task;
determining whether the file migration task is an unfinished task;
if so, determining invalid files from among the aggregated large file in the destination storage pool and the small files in the source storage pool that correspond to the file migration task, where the aggregated large file contains a migrated small file corresponding to each of the small files after aggregation;
performing garbage collection on the invalid files.
Preferably, determining invalid files from among the aggregated large file in the destination storage pool and the small files in the source storage pool that correspond to the file migration task includes:
obtaining the aggregation attribute of each of the small files;
using the aggregation attributes to determine the invalid files from among the aggregated large file and the small files.
Preferably, using the aggregation attributes to determine the invalid files from among the aggregated large file and the small files includes:
determining whether the aggregation attribute has changed;
if so, determining that the corresponding small file is the invalid file;
if not, determining that the corresponding migrated small file stored in the aggregated large file is the invalid file.
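The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AggregationAttr` fields and the idea of comparing the attribute recorded before migration with the current one are hypothetical, since the patent does not specify how a change is detected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AggregationAttr:
    """Hypothetical aggregation attribute of a small file."""
    large_file_id: str  # aggregated large file that holds the migrated copy
    offset: int         # offset of the copy inside that large file
    pool: str           # data pool that holds the large file


def find_invalid_copy(attr_before: Optional[AggregationAttr],
                      attr_now: Optional[AggregationAttr]) -> str:
    """Return which copy of a small file is invalid.

    Attribute changed   -> the copy in the source storage pool is invalid.
    Attribute unchanged -> the migrated copy in the aggregated large file is invalid.
    """
    changed = attr_now != attr_before
    return "source_small_file" if changed else "migrated_small_file"
```

For example, a file with no attribute before migration and an attribute pointing into a large file afterwards is classified as having an invalid source copy.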
Preferably, performing garbage collection on the invalid files includes:
if the invalid file is the small file, deleting the small file from the source storage pool;
if the invalid file is the migrated small file, performing fragment reclamation on the aggregated large file.
Preferably, performing fragment reclamation on the aggregated large file includes:
obtaining the file header information of the aggregated large file;
using the file header information to determine the proportion of invalid data in the aggregated large file;
when the proportion of invalid data is greater than a threshold, migrating the valid data in the aggregated large file to a target aggregated large file and deleting the aggregated large file.
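A minimal Python sketch of this threshold-based fragment reclamation follows. The header layout (`Entry` with name, offset, length, and a validity flag) is an assumption made for illustration; the patent only states that the invalid-data proportion is derived from the file header information.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical header record for one migrated small file."""
    name: str
    offset: int
    length: int
    valid: bool


def invalid_ratio(header: List[Entry]) -> float:
    """Fraction of the large file's payload bytes that belong to invalid entries."""
    total = sum(e.length for e in header)
    if total == 0:
        return 0.0
    return sum(e.length for e in header if not e.valid) / total


def compact(header: List[Entry], data: bytes,
            threshold: float) -> Optional[Tuple[List[Entry], bytes]]:
    """Copy valid entries into a new (target) large file when the invalid
    ratio exceeds the threshold; return None when no compaction is needed.
    The caller is expected to delete the old large file afterwards."""
    if invalid_ratio(header) <= threshold:
        return None
    new_header: List[Entry] = []
    new_data = bytearray()
    for e in header:
        if e.valid:
            new_header.append(Entry(e.name, len(new_data), e.length, True))
            new_data += data[e.offset:e.offset + e.length]
    return new_header, bytes(new_data)
```

Comparing with a strict "greater than" mirrors the claim wording; a large file whose invalid ratio equals the threshold is left untouched.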
Preferably, performing fragment reclamation on the aggregated large file includes:
obtaining the aggregation attribute corresponding to each file in the aggregated large file;
determining the files whose aggregation attributes have not changed to be invalid data, and computing the proportion of invalid data;
when the proportion of invalid data is greater than a threshold, migrating the valid data in the aggregated large file to a target aggregated large file and deleting the aggregated large file.
Preferably, receiving the file migration task includes:
receiving the file migration task sent by a metadata server;
correspondingly, once garbage collection of the invalid files is complete, the method further includes:
returning cleanup response data to the metadata server.
A file migration apparatus, including:
a task receiving module, configured to receive a file migration task;
a determination module, configured to determine whether the file migration task is an unfinished task;
an invalid-file determination module, configured to, if the result of the determination is yes, determine invalid files from among the aggregated large file in the destination storage pool and the small files in the source storage pool that correspond to the file migration task, where the aggregated large file contains a migrated small file corresponding to each of the small files after aggregation;
a garbage collection module, configured to perform garbage collection on the invalid files.
A file migration device, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above file migration method when executing the computer program.
A readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above file migration method.
Applying the method provided by the embodiments of the present invention: a file migration task is received; it is determined whether the file migration task is an unfinished task; if so, invalid files are determined from among the aggregated large file in the destination storage pool and the small files in the source storage pool that correspond to the task, where the aggregated large file contains a migrated small file corresponding to each small file after aggregation; and the invalid files are garbage collected.
It can be understood that when the file tiering migration client fails, the migration task it was handling cannot complete. If a small file has already been written into the aggregated large file at that point, the same small file may exist both in the source storage pool and in the aggregated large file in the destination storage pool. In an aggregated migration, a file that exists in both the source and the destination storage pool means that one of the copies is invalid. Accordingly, in this method, after a file migration task is received, it is first determined whether the task is an unfinished task; if it is, garbage data corresponding to the task will exist. Invalid files can then be determined from among the aggregated large file in the destination storage pool and the small files in the source storage pool that correspond to the task, where the aggregated large file contains a migrated small file corresponding to each small file after aggregation. The invalid files are then reclaimed. In this way, invalid data produced during aggregated migration is reclaimed and disk utilization can be improved.
Correspondingly, the embodiments of the present invention also provide a file migration apparatus, device, and readable storage medium corresponding to the above file migration method; they have the technical effects described above, which are not repeated here.
Description of Drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a file migration method in an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a file migration apparatus in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a file migration device in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a specific structure of a file migration device in an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
For ease of understanding, the technical terms used in the embodiments of the present invention are explained below:
Small file: a file of small size. For example, files smaller than 1 MB may be defined as small files, or files smaller than 512 KB may be defined as small files.
Aggregated large file: the large file obtained by aggregating a batch of small files. For example, an aggregated large file may have an upper size limit of 512 MB.
Backend: the file tiering migration client. It receives migration requests sent by the MDS, migrates data from the source storage pool to the destination storage pool, and runs the relevant migration-aggregation flow according to the aggregation conditions.
MDS: the metadata server. It maintains the metadata of all files in the file system and is also responsible for organizing the small files to be migrated and aggregated and sending them to the Backend.
Distributed file system: a cluster of multiple file storage node servers. Files are split into chunks and stored with the object as the basic unit. One piece of data can be stored on multiple nodes, and every node can obtain the complete data through inter-node communication. When a node goes down, the complete data can be recovered according to the configured policy. The system offers high availability, high performance, and high scalability. Every node provides the metadata service (MDS) for all kinds of metadata access operations, which balances the service load.
File tiering: the file migration function. With the corresponding policy configured, files can move between different pools. With an aggregation policy specified, files being migrated are aggregated first and then written to the new pool.
Small-file aggregation: aggregating a batch of small files into one large file. Disks typically read and write large files significantly faster than small files. To exploit this, in scenarios with massive numbers of small files, small files are not written to disk directly; they are first merged into a large file, which is then written to disk. This effectively reduces the number of small-file disk writes and eases write pressure, while also improving the read hit rate and shortening the read I/O path.
Please refer to FIG. 1, which is a flowchart of a file migration method in an embodiment of the present invention. The method includes the following steps:
S101. Receive a file migration task.
Specifically, after the file tiering migration client recovers from a failure, the metadata server may send it a file migration task. That is, a file migration task may be a new task or a task left unfinished because of a failure.
In the embodiments of the present invention, the file tiering migration client may receive the file migration task sent by the metadata server. Specifically, the metadata server may send the file migration task to the file tiering migration client when it detects that the recovered client has an unfinished migration task.
In other words, after the MDS sends a migration-aggregation task, it considers the migration and aggregation complete only when the entire migration-aggregation process for the file has finished; otherwise the task stays in the unfinished state. When a tiering migration-aggregation operation is initiated again, the unfinished task is restarted first, that is, the MDS sends the file migration task to the Backend.
S102. Determine whether the file migration task is an unfinished task.
The file migration task may carry an unfinished-task identifier and the identifiers of the small files being migrated, so that the file tiering migration client can use this information to determine the invalid files corresponding to the unfinished task.
Whether the task is unfinished can be determined from the presence or absence of the unfinished-task identifier. If it is unfinished, step S103 is performed; if not, the task is processed normally, for example by performing step S105.
Alternatively, whether a file migration task is unfinished can be determined from a recorded table of file migration tasks.
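Both ways of recognizing an unfinished task (an identifier carried in the task, or a lookup in a task record table) can be sketched as follows. The `MigrationTask` fields and the `pending_task_ids` set are hypothetical names chosen for illustration; the patent does not define the task format.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, List


@dataclass
class MigrationTask:
    """Hypothetical shape of a task sent by the metadata server."""
    task_id: str
    small_file_ids: List[str] = field(default_factory=list)
    unfinished: bool = False  # stands in for the unfinished-task identifier


def is_unfinished(task: MigrationTask,
                  pending_task_ids: AbstractSet[str] = frozenset()) -> bool:
    """True if the task carries the unfinished flag or appears in the
    record table of tasks that never completed."""
    return task.unfinished or task.task_id in pending_task_ids


def next_step(task: MigrationTask,
              pending_task_ids: AbstractSet[str] = frozenset()) -> str:
    """Choose between invalid-file detection (S103) and normal processing."""
    return "S103" if is_unfinished(task, pending_task_ids) else "normal"
```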
S103. Determine invalid files from among the aggregated large file in the destination storage pool and the small files in the source storage pool that correspond to the file migration task.
Here, the aggregated large file contains a migrated small file corresponding to each small file after aggregation.
Aggregated file migration means aggregating a batch of small files into a large file and then migrating it.
To make the technical solution of the embodiments easier to understand, the small-file aggregation flow is illustrated below. The small-file aggregated migration process includes the following steps:
Step 1. The MDS (metadata server) sends a migration-aggregation task, and the Backend (file tiering migration client) receives the file migration task.
Step 2. The Backend reads the small file's data from the source storage pool.
Step 3. The Backend writes the data it has read into the aggregation cache.
Step 4. When the Backend's aggregation cache reaches the flush condition, that is, when the small files written into the aggregation cache reach a certain total size, the cache is written to disk in the destination storage pool.
Step 5. After the write completes, the Backend sets the file's aggregation attributes, namely the aggregated large file it belongs to, its offset within the aggregated large file, and the data pool it resides in, and sends them to the MDS.
Step 6. After the MDS finishes setting the attributes, it responds to the Backend. Once the Backend learns that metadata such as the aggregation attributes was updated successfully, it deletes the small file's data from the source storage pool.
Step 7. On completion, the Backend responds to the MDS, indicating that the migration-aggregation process for the file is complete.
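Steps 2 through 6 of the flow above can be modeled in memory as a rough sketch. Everything here is an assumption made for illustration: pools are plain dictionaries, the aggregation attribute is a `(large file index, offset, pool name)` tuple, the MDS round trip is collapsed into a local dictionary update, and any remaining cached data is flushed at the end of the task (the patent only describes flushing when the cache reaches a certain size).

```python
from typing import Dict, List, Tuple

Attr = Tuple[int, int, str]  # (large file index, offset, pool name)


def aggregate(source_pool: Dict[str, bytes], names: List[str],
              flush_size: int, dest_pool_name: str = "dest"
              ) -> Tuple[List[bytes], Dict[str, Attr]]:
    """Aggregate small files into large files and record their attributes."""
    large_files: List[bytes] = []
    attrs: Dict[str, Attr] = {}
    cache = bytearray()
    pending: List[Tuple[str, int]] = []  # (name, offset inside the cache)

    def flush() -> None:
        nonlocal cache, pending
        if not pending:
            return
        large_files.append(bytes(cache))          # step 4: write cache to disk
        index = len(large_files) - 1
        for name, offset in pending:              # step 5: set aggregation attrs
            attrs[name] = (index, offset, dest_pool_name)
        cache = bytearray()
        pending = []

    for name in names:
        data = source_pool[name]                  # step 2: read from source pool
        pending.append((name, len(cache)))        # step 3: append to the cache
        cache += data
        if len(cache) >= flush_size:
            flush()
    flush()                                       # assumption: flush the remainder

    for name in attrs:                            # step 6: delete source copies
        del source_pool[name]
    return large_files, attrs
```

A crash between the write in step 4 and the attribute update in step 5, or between step 5 and the deletion in step 6, leaves two copies of the same small file, which is the situation the rest of the method cleans up.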
If the Backend crashes while small files are being migrated and aggregated, the migration-aggregation process cannot complete and invalid garbage data is produced. Because the process has several steps and any of them can fail, the situations that produce garbage data include but are not limited to the following two cases:
Case 1: the small file has been written into the aggregated large file, but its aggregation attribute has not been modified. The valid data is then the small file's data in the source storage pool, and the copy of that small file in the aggregated large file is garbage data. It must be removed; otherwise it keeps occupying space and wastes system storage resources.
Case 2: the small file has been written into the aggregated large file and its aggregation attribute has been modified. The valid data is then the data in the aggregated large file, and the ordinary small file's data in the source storage pool is garbage data. It must be removed; otherwise it keeps occupying space and wastes system storage resources.
Accordingly, in the embodiments of the present invention, invalid files can be determined from among the aggregated large file in the destination storage pool and the small files in the source storage pool that correspond to the file migration task. An invalid file corresponds to invalid data. From the analysis of the small-file aggregated migration flow, an invalid file may be a small file in the source storage pool or a migrated small file inside the aggregated large file in the destination storage pool.
Specifically, the process of determining invalid files includes:
Step one. Obtain the aggregation attribute of each small file.
Step two. Use the aggregation attributes to determine the invalid files from among the aggregated large file and the small files.
Specifically, the aggregation attribute of a small file can be obtained from the metadata server; that is, the aggregation attribute can be stored as metadata. The aggregation attribute can also serve as tag information of the small file, in which case it can be obtained by reading the corresponding tag information.
The aggregation attribute of a small file may specifically include the small file's offset within the aggregated large file and the data pool in which it resides.
Step two may specifically include:
Step 2.1. Determine whether the aggregation attribute has changed.
Step 2.2. If so, determine that the corresponding small file is an invalid file.
Step 2.3. If not, determine that the corresponding migrated small file stored in the aggregated large file is an invalid file.
文件分级迁移客户端得到小文件的聚合属性之后,可以判断存储于不同位置的各个小文件是否为无效文件。具体的,由于聚合迁移过程中,若小文件完成写入聚合大文件中,则聚合属性会发送变化,因此可以根据聚合属性是否变化,确定小文件是否有效。After the file grading migration client obtains the aggregated attributes of the small files, it can determine whether each small file stored in different locations is an invalid file. Specifically, during the aggregation migration process, if the small file is written into the large aggregated file, the aggregation attribute will be changed, so whether the small file is valid can be determined according to whether the aggregation attribute changes.
在本发明实施例中为便于区别迁移前后的小文件,将迁移前位于源存储池中的小文件依然称之为小文件;将迁移后位于目的存储池中聚合大文件中的小文件称之为迁移小文件。目的存储池中的一个迁移小文件与源存储池中的一个小文件对应。In the embodiment of the present invention, in order to distinguish the small files before and after the migration, the small files located in the source storage pool before the migration are still called small files; the small files located in the aggregated large files in the destination storage pool after the migration are called as small files For migrating small files. A small file to migrate in the destination storage pool corresponds to a small file in the source storage pool.
具体的,当小文件的聚合属性发生了变化,则确定对应源存储池中的小文件无效,该小文件为无效小文件。当小文件的聚合属性未发生变化,则可能是该小文件写入聚合大文件的过程未完成,即此时聚合大文件中对应的迁移小文件可能存在数据内容缺失的情况,因而可确定迁移小文件无效。Specifically, when the aggregate attribute of the small file changes, it is determined that the small file in the corresponding source storage pool is invalid, and the small file is an invalid small file. When the aggregate attribute of the small file does not change, it may be that the process of writing the small file to the aggregated large file has not been completed, that is, the corresponding migrated small file in the aggregated large file may have missing data content, so it can be determined that the migration Small files are invalid.
需要说明的是,在本发明实施例中,判断文件是否有效,是针对同一个小文件的不同位置而言进行判断的。例如,对于小文件a,其即存在于源存储池中(将其称之为a1),又存在于目的存储池的聚合大文件中的迁移小文件(将其称之为a2),此时判断文件是否有效,是判断a1和a2哪一个是有效的,若 a1有效,即小文件有效,而a2无效;若a2有效,即迁移小文件有效,则a 无效。也就是说,若存在对应的a1和a2,其中必然有一个文件无效。It should be noted that, in this embodiment of the present invention, the determination of whether a file is valid is performed with respect to different locations of the same small file. For example, for the small file a, it exists in the source storage pool (call it a1), and also exists in the migrated small file (call it a2) in the aggregated large file of the destination storage pool, at this time To judge whether the file is valid, it is to judge which one of a1 and a2 is valid. If a1 is valid, the small file is valid, but a2 is invalid; if a2 is valid, that is, the migration small file is valid, then a is invalid. That is to say, if there are corresponding a1 and a2, one of the files must be invalid.
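The decision in steps 2.1 to 2.3 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `AggregationAttr` type, the idea of comparing the attribute recorded before migration with the current one, and the labels `"source"` and `"migrated"` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AggregationAttr:
    """Hypothetical aggregation attribute of a small file."""
    pool: str    # data pool that holds the file's data
    offset: int  # offset inside the aggregated large file


def find_invalid_copy(before: Optional[AggregationAttr],
                      current: Optional[AggregationAttr]) -> str:
    """Return which copy of a small file is invalid.

    'source'   -> a1, the copy in the source storage pool
    'migrated' -> a2, the copy inside the aggregated large file
    """
    if current != before:
        # The attribute changed, so the write into the aggregated
        # large file completed and the source copy is stale.
        return "source"
    # The attribute is unchanged, so the write may be incomplete
    # and the migrated copy cannot be trusted.
    return "migrated"
```

For a file whose attribute was empty before migration and now points into the destination pool, the sketch reports the source copy as invalid; for a file whose attribute never changed, it reports the migrated copy.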
S104. Perform garbage collection on the invalid files.
In this embodiment of the present invention, performing garbage collection on an invalid file means releasing the disk space that the invalid file occupies.
Specifically, for a given small file, if it exists in both the source storage pool and the destination storage pool (the copy in the destination storage pool being called the migrated small file), then the copy in one of the two pools must be invalid. That is, an invalid file may be a small file in the source storage pool or a migrated small file in the aggregated large file in the destination storage pool.
Preferably, invalid files in different storage locations can be garbage-collected in different ways. The specific cases are:
Case 1: If the invalid file is a small file, delete the small file from the source storage pool;
Case 2: If the invalid file is a migrated small file, perform fragment recovery processing on the aggregated large file.
That is, if the small file in the source storage pool is invalid, it can be deleted from the source storage pool directly. If the invalid file is a migrated small file in the aggregated large file in the destination storage pool, fragment recovery processing can be used to remove it from the aggregated large file, which avoids the large number of disk fragments that directly deleting invalid migrated small files would create.
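The two cases can be expressed as a simple dispatch. The sketch below is illustrative only: the function name, the string labels, and the two callbacks (`delete_source`, `reclaim_fragments`) are hypothetical stand-ins for whatever storage operations a real system provides.

```python
from typing import Callable


def collect_garbage(invalid_copy: str,
                    delete_source: Callable[[], None],
                    reclaim_fragments: Callable[[], None]) -> str:
    """Choose the garbage collection action by where the invalid copy lives."""
    if invalid_copy == "source":
        # Case 1: the small file in the source storage pool is invalid.
        delete_source()
        return "deleted_source"
    if invalid_copy == "migrated":
        # Case 2: the migrated small file is invalid, so the aggregated
        # large file goes through fragment recovery instead of a direct delete.
        reclaim_fragments()
        return "reclaimed_fragments"
    raise ValueError("unknown copy label: " + invalid_copy)
```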
It should be noted that fragment recovery processing of the aggregated large file can be implemented in ways that include, but are not limited to, the following two:
Method 1: Perform fragment cleanup based on file header information. The specific steps include:
Step 2.1.1: Obtain the file header information of the aggregated large file;
Step 2.1.2: Use the file header information to determine the proportion of invalid data in the aggregated large file;
Step 2.1.3: When the proportion of invalid data is greater than a threshold, migrate the valid data in the aggregated large file to a target aggregated large file and delete the aggregated large file.
Generally, a file aggregation migration task can be interrupted by a failure at any moment during migration, so the valid data of the small files to be migrated may lie partly in the destination storage pool and partly in the source storage pool. Accordingly, some of the migrated small files in the aggregated large file are valid data and others are invalid data. To reduce the number of repeated migrations, in this embodiment of the present invention fragment cleanup can be performed in one pass only after the invalid portion of the aggregated large file reaches a certain threshold.
Specifically, the file header information can be used to determine the proportion of invalid data in the aggregated large file; when that proportion is greater than the threshold, the valid data in the aggregated large file is migrated to the target aggregated large file and the aggregated large file is deleted. The threshold can be set and adjusted according to the needs of the actual application, for example to a specific value such as 80% or 75%. Specifically, the higher the threshold, the better the garbage collection effect, although more data migrations may result; the lower the threshold, the more garbage fragments remain in the aggregated large file, but the less data migration garbage collection causes.
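Steps 2.1.1 to 2.1.3 can be sketched as below. The source does not specify the layout of the file header, so the `HeaderEntry` record (name, offset, length, validity flag), the byte-weighted ratio, and the in-memory `bytes` representation of the aggregated large file are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HeaderEntry:
    """Hypothetical header record for one migrated small file."""
    name: str
    offset: int
    length: int
    valid: bool


def invalid_ratio(entries: List[HeaderEntry]) -> float:
    """Proportion of invalid bytes among all bytes listed in the header."""
    total = sum(e.length for e in entries)
    if total == 0:
        return 0.0
    return sum(e.length for e in entries if not e.valid) / total


def compact_if_needed(entries: List[HeaderEntry], data: bytes,
                      threshold: float = 0.8
                      ) -> Optional[Tuple[List[HeaderEntry], bytes]]:
    """Copy valid data into a new aggregated file when the ratio exceeds the threshold.

    Returns None when no compaction is needed; the caller would delete the
    old aggregated large file only after a non-None result is stored.
    """
    if invalid_ratio(entries) <= threshold:
        return None
    new_entries: List[HeaderEntry] = []
    chunks: List[bytes] = []
    pos = 0
    for e in entries:
        if e.valid:
            chunks.append(data[e.offset:e.offset + e.length])
            new_entries.append(HeaderEntry(e.name, pos, e.length, True))
            pos += e.length
    return new_entries, b"".join(chunks)
```

Returning `None` below the threshold mirrors the text: the aggregated large file is left untouched until enough of it is invalid to justify rewriting it.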
Method 2: Perform fragment cleanup based on the aggregation attributes of the small files. The specific steps include:
Step 2.2.1: Obtain the aggregation attribute corresponding to each file in the aggregated large file;
Step 2.2.2: Determine the files whose aggregation attribute has not changed to be invalid data, and compute the proportion of invalid data;
Step 2.2.3: When the proportion of invalid data is greater than the threshold, migrate the valid data in the aggregated large file to the target aggregated large file and delete the aggregated large file.
As described above, the aggregation attribute determines whether a given small file is valid in the source storage pool or in the destination storage pool, so the invalid data in the aggregated large file can be identified from the aggregation attributes and the proportion of invalid data computed. Once the proportion is obtained, subsequent processing follows Method 1 above.
In particular, when the proportion of invalid data is less than or equal to the threshold, fragment processing of the aggregated large file may be skipped.
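Method 2 differs from Method 1 only in where the validity information comes from. A minimal sketch follows, assuming each file in the aggregated large file is described by a hypothetical `(length_in_bytes, attr_changed)` pair and that the ratio is weighted by bytes; the source fixes neither detail.

```python
from typing import Dict, Tuple


def invalid_ratio_from_attrs(files: Dict[str, Tuple[int, bool]]) -> float:
    """files maps a file name to (length_in_bytes, attr_changed).

    A file whose aggregation attribute did not change counts as invalid data.
    """
    total = sum(length for length, _ in files.values())
    if total == 0:
        return 0.0
    invalid = sum(length for length, changed in files.values() if not changed)
    return invalid / total


def needs_compaction(files: Dict[str, Tuple[int, bool]],
                     threshold: float = 0.75) -> bool:
    """True only when the invalid ratio is strictly greater than the threshold."""
    return invalid_ratio_from_attrs(files) > threshold
```

The strict comparison reflects the text: a ratio equal to the threshold does not trigger fragment processing.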
Preferably, once data cleanup is complete, the normal file migration process can resume. Specifically, when garbage collection of the invalid files is complete, cleanup response data is fed back to the metadata server. The metadata server then knows that the cleanup has finished and can issue a new file aggregation migration task.
S105. Execute the file migration task.
That is, read the corresponding small files from the source storage pool, aggregate multiple small files into an aggregated large file, and migrate the aggregated large file to the destination storage pool.
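The aggregation step can be illustrated by concatenating small files and recording each file's offset, which is the kind of information the aggregation attribute is described as holding. The function names and the in-memory index below are assumptions for illustration; a real system would write to the destination storage pool and persist the offsets as metadata.

```python
from typing import Dict, List, Tuple


def aggregate(small_files: List[Tuple[str, bytes]]
              ) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate small files and return (blob, index).

    index maps each file name to (offset, length) inside the blob.
    """
    index: Dict[str, Tuple[int, int]] = {}
    chunks: List[bytes] = []
    offset = 0
    for name, payload in small_files:
        index[name] = (offset, len(payload))
        chunks.append(payload)
        offset += len(payload)
    return b"".join(chunks), index


def read_back(blob: bytes, index: Dict[str, Tuple[int, int]], name: str) -> bytes:
    """Recover one migrated small file from the aggregated blob."""
    offset, length = index[name]
    return blob[offset:offset + length]
```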
With the method provided by this embodiment of the present invention, a file migration task is received; whether the file migration task is an unfinished task is determined; if so, invalid files are determined from the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task, where the aggregated large file contains the migrated small files that respectively correspond to the small files after aggregation; and garbage collection is performed on the invalid files.
It can be understood that when the file tiering migration client fails, the migration task it was processing cannot be completed. If a small file has already been written into the aggregated large file at that point, the same small file may exist both in the source storage pool and in the aggregated large file in the destination storage pool. In the aggregation migration process, when the same file is in both the source storage pool and the destination storage pool, an invalid file clearly exists. Accordingly, in this method, after a file migration task is received, it is first determined whether the task is an unfinished task; if it is, garbage data exists for that task. Invalid files can then be determined from the aggregated large file in the destination storage pool and the small files in the source storage pool corresponding to the task, where the aggregated large file contains the migrated small files that respectively correspond to the small files after aggregation. The invalid files are then reclaimed. In this way, invalid data produced during aggregation migration is reclaimed, which improves disk utilization.
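The overall flow summarized above can be sketched as a single handler. Everything named here is hypothetical: how unfinished tasks are tracked, the callback signatures, and the returned trace are assumptions, and the sketch assumes that the migration itself runs after cleanup is acknowledged.

```python
from typing import Callable, Iterable, List, Set


def handle_task(task_id: str,
                unfinished: Set[str],
                find_invalid: Callable[[str], Iterable[str]],
                collect: Callable[[str], None],
                migrate: Callable[[str], None]) -> List[str]:
    """Clean up leftovers of an unfinished task, then run the migration."""
    trace: List[str] = []
    if task_id in unfinished:
        # The task was interrupted earlier, so garbage data may exist.
        for name in find_invalid(task_id):
            collect(name)
            trace.append("gc:" + name)
        # Stand-in for feeding cleanup response data back to the metadata server.
        trace.append("ack")
    migrate(task_id)
    trace.append("migrate:" + task_id)
    return trace
```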
Corresponding to the above method embodiment, an embodiment of the present invention further provides a file migration apparatus. The file migration apparatus described below and the file migration method described above may be referred to in correspondence with each other.
Referring to FIG. 2, the apparatus includes the following modules:
a task receiving module 101, configured to receive a file migration task;
a judging module 102, configured to determine whether the file migration task is an unfinished task;
an invalid file determination module 103, configured to, if the determination result is yes, determine invalid files from the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task, where the aggregated large file contains the migrated small files that respectively correspond to the small files after aggregation;
a garbage collection processing module 104, configured to perform garbage collection on the invalid files.
With the apparatus provided by this embodiment of the present invention, a file migration task is received; whether the file migration task is an unfinished task is determined; if so, invalid files are determined from the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task, where the aggregated large file contains the migrated small files that respectively correspond to the small files after aggregation; and garbage collection is performed on the invalid files.
It can be understood that when the file tiering migration client fails, the migration task it was processing cannot be completed. If a small file has already been written into the aggregated large file at that point, the same small file may exist both in the source storage pool and in the aggregated large file in the destination storage pool. In the aggregation migration process, when the same file is in both the source storage pool and the destination storage pool, an invalid file clearly exists. Accordingly, in this method, after a file migration task is received, it is first determined whether the task is an unfinished task; if it is, garbage data exists for that task. Invalid files can then be determined from the aggregated large file in the destination storage pool and the small files in the source storage pool corresponding to the task, where the aggregated large file contains the migrated small files that respectively correspond to the small files after aggregation. The invalid files are then reclaimed. In this way, invalid data produced during aggregation migration is reclaimed, which improves disk utilization.
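In a specific embodiment of the present invention, the invalid file determination module 103 is specifically configured to obtain the aggregation attribute of each small file, and to use the aggregation attributes to determine the invalid files among the aggregated large file and the small files.
In a specific embodiment of the present invention, the invalid file determination module 103 is specifically configured to determine whether the aggregation attribute has changed; if so, determine that the corresponding small file is an invalid file; if not, determine that the corresponding migrated small file stored in the aggregated large file is an invalid file.
In a specific embodiment of the present invention, the garbage collection processing module 104 is specifically configured to delete the small file from the source storage pool if the invalid file is a small file, and to perform fragment recovery processing on the aggregated large file if the invalid file is a migrated small file.
In a specific embodiment of the present invention, the garbage collection processing module 104 is specifically configured to obtain the file header information of the aggregated large file; use the file header information to determine the proportion of invalid data in the aggregated large file; and, when the proportion of invalid data is greater than the threshold, migrate the valid data in the aggregated large file to the target aggregated large file and delete the aggregated large file.
In a specific embodiment of the present invention, the garbage collection processing module 104 is specifically configured to obtain the aggregation attribute corresponding to each file in the aggregated large file; determine the files whose aggregation attribute has not changed to be invalid data and compute the proportion of invalid data; and, when the proportion of invalid data is greater than the threshold, migrate the valid data in the aggregated large file to the target aggregated large file and delete the aggregated large file.
In a specific embodiment of the present invention, the task receiving module 101 is specifically configured to receive the file migration task sent by the metadata server;
Correspondingly, the apparatus further includes a cleanup feedback module, configured to feed back cleanup response data to the metadata server when garbage collection of the invalid files is complete.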
Corresponding to the above method embodiment, an embodiment of the present invention further provides a file migration device. The file migration device described below and the file migration method described above may be referred to in correspondence with each other.
Referring to FIG. 3, the file migration device includes:
a memory 332, configured to store a computer program;
a processor 322, configured to implement the steps of the file migration method of the above method embodiment when executing the computer program.
Specifically, refer to FIG. 4, which is a schematic diagram of a specific structure of a file migration device provided by this embodiment. The file migration device may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 322 (for example, one or more processors) and a memory 332, the memory 332 storing one or more computer applications 342 or data 344. The memory 332 may be transient storage or persistent storage. The program stored in the memory 332 may include one or more modules (not shown in the figure), each of which may include a series of instruction operations on the data processing device. Further, the central processing unit 322 may be configured to communicate with the memory 332 and to execute the series of instruction operations in the memory 332 on the file migration device 301.
The file migration device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341.
The steps of the file migration method described above can be implemented by the structure of the file migration device.
Corresponding to the above method embodiment, an embodiment of the present invention further provides a readable storage medium. The readable storage medium described below and the file migration method described above may be referred to in correspondence with each other.
A readable storage medium stores a computer program which, when executed by a processor, implements the steps of the file migration method of the above method embodiment.
The readable storage medium may specifically be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other readable storage medium capable of storing program code.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010850086.2A CN111984196B (en) | 2020-08-21 | 2020-08-21 | File migration method, device, equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010850086.2A CN111984196B (en) | 2020-08-21 | 2020-08-21 | File migration method, device, equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111984196A CN111984196A (en) | 2020-11-24 |
CN111984196B true CN111984196B (en) | 2022-08-19 |
Family
ID=73443100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010850086.2A Active CN111984196B (en) | 2020-08-21 | 2020-08-21 | File migration method, device, equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111984196B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112631991B (en) * | 2020-12-26 | 2024-07-05 | 中国农业银行股份有限公司 | File migration method and device |
CN113704027B (en) * | 2021-10-29 | 2022-02-18 | 苏州浪潮智能科技有限公司 | File aggregation compatible method, apparatus, computer device and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107643880A (en) * | 2017-09-27 | 2018-01-30 | 郑州云海信息技术有限公司 | The method and device of file data migration based on distributed file system |
CN109471836A (en) * | 2018-11-01 | 2019-03-15 | 浪潮电子信息产业股份有限公司 | Data migration method, device and system |
CN111176571A (en) * | 2019-12-27 | 2020-05-19 | 浪潮电子信息产业股份有限公司 | A local object management method, device, device and medium |
Also Published As
Publication number | Publication date |
---|---|
CN111984196A (en) | 2020-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10261853B1 (en) | Dynamic replication error retry and recovery | |
US8332367B2 (en) | Parallel data redundancy removal | |
US10970190B2 (en) | Hybrid log viewer with thin memory usage | |
CN107402722B (en) | A data migration method and storage device | |
CN110196836A (en) | A kind of date storage method and device | |
CN106802840A (en) | A kind of virtual machine backup, restoration methods and device | |
US8621143B2 (en) | Elastic data techniques for managing cache storage using RAM and flash-based memory | |
US10031948B1 (en) | Idempotence service | |
CN109918352B (en) | Memory system and method of storing data | |
CN113568566A (en) | Method, host device and storage server for seamless migration of simple storage service by using index object | |
CN111984196B (en) | File migration method, device, equipment and readable storage medium | |
CN107977167A (en) | Optimization method is read in a kind of degeneration of distributed memory system based on correcting and eleting codes | |
CN116700634A (en) | Garbage recycling method and device for distributed storage system and distributed storage system | |
CN104035728A (en) | Hard disk hot plug handling method, device and node | |
CN109976896A (en) | Business re-scheduling treating method and apparatus | |
CN104915376A (en) | Cloud storage file archiving and compressing method | |
CN117632860A (en) | Method and device for merging small files based on Flink engine and electronic equipment | |
CN118152224A (en) | Distributed training method and platform based on GPU cluster, and electronic equipment | |
CN109947704B (en) | Lock type switching method and device and cluster file system | |
CN108121514B (en) | Meta-information updating method, apparatus, computing device and computer storage medium | |
CN116360680A (en) | Method and system for performing replication recovery operations in a storage system | |
CN117667299A (en) | Virtual machine migration method, chip, network card, processing equipment, system and medium | |
WO2020215223A1 (en) | Distributed storage system and garbage collection method used in distributed storage system | |
CN113867626A (en) | Method, system, equipment and storage medium for optimizing performance of storage system | |
CN107357536A (en) | Distributed memory system data modification write method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |