CN111984196B - File migration method, device, equipment and readable storage medium - Google Patents
- Publication number
- CN111984196B (application CN202010850086.2A)
- Authority
- CN
- China
- Prior art keywords
- file
- invalid
- migration
- small
- aggregated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0647—Migration mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0253—Garbage collection, i.e. reclamation of unreferenced memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/162—Delete operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/185—Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0643—Management of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0652—Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0674—Disk device
- G06F3/0676—Magnetic disk device
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a file migration method, apparatus, device and readable storage medium. The method comprises the following steps: receiving a file migration task; judging whether the file migration task is an unfinished task; if so, determining invalid files from the aggregated large file located in the destination storage pool and the small files located in the source storage pool corresponding to the file migration task, wherein the aggregated large file contains the migration small files that correspond to the small files after aggregation; and performing garbage collection on the invalid files. The method can reclaim invalid data generated during aggregation migration and can thereby improve disk utilization.
Description
Technical Field
The present invention relates to the field of storage technologies, and in particular, to a file migration method, apparatus, device, and readable storage medium.
Background
The speed of reading and writing large files on a disk is often significantly higher than that of small files. To exploit this characteristic, in scenarios involving a large number of small files, the small files are not written to disk directly; instead, they are aggregated into a large file that is then written to disk. This effectively reduces the number of small-file writes to the disk, relieves data writing pressure, improves the read hit rate, and shortens the read I/O path.
However, if a hang occurs during the migration aggregation process, the whole process cannot be completed, and invalid garbage data is generated. Invalid garbage data occupies storage space, resulting in low storage utilization.
In summary, how to effectively solve problems such as disk cleaning in the small file migration process is a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a file migration method, a file migration device, file migration equipment and a readable storage medium, which can effectively recycle garbage data generated by aggregation migration faults.
In order to solve the technical problems, the invention provides the following technical scheme:
a file migration method, comprising:
receiving a file migration task;
judging whether the file migration task is an unfinished task;
if so, determining invalid files from the aggregated large file located in the destination storage pool and the small files located in the source storage pool corresponding to the file migration task; the aggregation large file is provided with migration small files corresponding to the small files respectively after aggregation;
and performing garbage collection on the invalid files.
Preferably, determining an invalid file from the aggregated large file located in the destination storage pool and the small files located in the source storage pools corresponding to the file migration task includes:
acquiring the aggregation attribute of each small file;
and determining the invalid file from the aggregated large file and each small file by using the aggregated attribute.
Preferably, determining the invalid file from the aggregated large file and each of the small files by using the aggregated attribute includes:
judging whether the aggregation attribute changes;
if so, determining the corresponding small file as the invalid file;
if not, determining that the corresponding migration small file stored in the aggregation large file is an invalid file.
Preferably, the garbage collection of the invalid file includes:
if the invalid file is the small file, deleting the small file in the source storage pool;
and if the invalid file is the migrated small file, performing fragment recovery processing on the aggregated large file.
Preferably, the fragment recovery processing is performed on the aggregated large file, and comprises:
acquiring file header information of the aggregated large file;
determining the invalid data proportion in the aggregated large file by using the file header information;
and under the condition that the invalid data proportion is larger than a threshold value, transferring the valid data in the large aggregated file to a target large aggregated file and deleting the large aggregated file.
Preferably, the fragment recovery processing is performed on the aggregated large file, and comprises:
acquiring aggregation attributes corresponding to all files in the large aggregation files;
determining the files with unchanged aggregation attributes as invalid data, and counting the proportion of the invalid data;
and under the condition that the invalid data proportion is larger than a threshold value, transferring the valid data in the large aggregated file to a target large aggregated file and deleting the large aggregated file.
Preferably, the receiving a file migration task includes:
receiving the file migration task sent by a metadata server;
correspondingly, under the condition that the garbage collection of the invalid file is completed, the method further comprises the following steps:
and feeding back cleaning response data to the metadata server.
A file migration apparatus comprising:
the task receiving module is used for receiving a file migration task;
the judging module is used for judging whether the file migration task is an unfinished task;
an invalid file determining module, configured to determine an invalid file from the aggregated large file located in the destination storage pool and the small files located in the source storage pools, which correspond to the file migration task, if the determination result is yes; the aggregation large file is provided with migration small files corresponding to the small files respectively after aggregation;
and the garbage recycling processing module is used for recycling garbage of the invalid files.
A file migration device comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the file migration method when executing the computer program.
A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the file migration method described above.
The method provided by the embodiment of the invention comprises: receiving a file migration task; judging whether the file migration task is an unfinished task; if so, determining invalid files from the aggregated large file located in the destination storage pool and the small files located in the source storage pool corresponding to the file migration task, wherein the aggregated large file contains the migration small files that correspond to the small files after aggregation; and performing garbage collection on the invalid files.
It can be understood that, when the file hierarchical migration client fails, the migration task it is processing cannot be completed. If a small file has already been written into an aggregated large file at that time, the same small file may exist both in the source storage pool and in the aggregated large file in the destination storage pool. Since the same file then exists in both the source storage pool and the destination storage pool, one of the copies is obviously an invalid file. Based on this, after receiving a file migration task, the method first determines whether the file migration task is an incomplete task; if it is, there will be garbage data corresponding to the file migration task. In that case, invalid files are determined from the aggregated large file in the destination storage pool and the small files in the source storage pool corresponding to the file migration task, where the aggregated large file contains the migration small files that correspond to the small files after aggregation. The invalid files are then reclaimed. In this way, invalid data generated in the aggregation migration process can be reclaimed, and disk utilization can be improved.
Accordingly, embodiments of the present invention further provide a file migration apparatus, a device, and a readable storage medium corresponding to the file migration method, which have the above technical effects and are not described herein again.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating an implementation of a file migration method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a file migration apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a file migration apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a file migration apparatus in an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be apparent that the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For convenience of understanding, the following explanation is made for technical terms involved in the embodiments of the present invention:
Small file: a file with a small size. For example, a file smaller than 1M may be defined as a small file, or a file smaller than 512K may be defined as a small file.
Aggregated large file: a large file obtained by aggregating small files in batches. For example, an aggregated large file may have an upper size limit of 512M.
Backend: the file hierarchical migration client, which is responsible for receiving a migration request sent by the MDS, executing data migration from the source storage pool to the destination storage pool, and executing the related migration aggregation process according to the aggregation conditions;
MDS: the metadata server, which maintains the metadata information of all files of the file system, organizes the small files to be migrated and aggregated, and sends them to the Backend;
Distributed file system: a cluster formed by a plurality of file storage node servers. Files are stored in blocks with the object as the basic unit, and one piece of data can be stored on multiple nodes. Each node can obtain complete data through communication among the nodes, and when a node goes down, the complete data can be recovered according to a configured policy. The system features high availability, high performance and high scalability. Each node provides a metadata service (MDS) to handle the various access operations on metadata and to balance service pressure;
File tiering (hierarchical migration): a file migration function by which files can be moved among different pools by configuring corresponding policies. By specifying a related aggregation policy, files can be aggregated first and then written when they are written into a new pool during migration;
Small file aggregation: aggregating small files in batches into a large file. The speed at which a disk reads and writes large files is usually significantly higher than that of small files. To exploit this characteristic, in scenarios involving a large number of small files, the small files are not written to disk directly; instead, they are merged into a large file that is then written to disk. This effectively reduces the number of small-file writes to the disk, relieves data writing pressure, improves the read hit rate, and shortens the read I/O path.
Referring to fig. 1, fig. 1 is a flowchart illustrating a file migration method according to an embodiment of the present invention, where the method includes the following steps:
S101, receiving a file migration task.
Specifically, after the file hierarchical migration client fails and recovers, the metadata server may send a file migration task to the file hierarchical migration client. That is, the file migration task may be a new task or a task that is not completed due to a failure.
In the embodiment of the invention, the file hierarchical migration client can receive the file migration task sent by the metadata server. Specifically, the metadata server may send the file migration task to the file hierarchical migration client when it is checked that the file hierarchical migration client after the failure recovery has an incomplete migration task.
That is to say, in the embodiment of the present invention, after the MDS sends a migration aggregation task, the MDS regards the migration aggregation as completed only once the whole migration aggregation process of the file has finished; otherwise, the task remains in the incomplete state. When the hierarchical migration aggregation operation is restarted, incomplete tasks are started again first, that is, the MDS sends the file migration task to the Backend.
S102, judging whether the file migration task is an unfinished task.
In the file migration task, an identifier of an incomplete file migration task and an identifier of a small file to be migrated may be carried, so that the file hierarchical migration client determines an invalid file corresponding to the incomplete file migration task based on the identifier information.
Whether the file migration task is incomplete can be determined based on the existence of the incomplete file migration task identifier. If so, the operation of step S103 is performed, and if not, the file migration task is processed normally, such as the operation of S105 is performed.
Of course, whether the file migration task is an incomplete task can also be determined from a recorded file migration task record table.
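The two checks described for step S102 can be sketched as follows. This is only an illustration under stated assumptions: the `MigrationTask` fields, the `is_unfinished` helper, and the `"incomplete"` status string in the record table are hypothetical names, not part of the patent text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MigrationTask:
    """Hypothetical shape of a file migration task sent by the MDS."""
    task_id: str
    small_file_ids: List[str] = field(default_factory=list)
    # Assumed marker: present only when the MDS re-sends an unfinished task.
    incomplete_task_id: Optional[str] = None


def is_unfinished(task: MigrationTask,
                  task_record_table: Optional[Dict[str, str]] = None) -> bool:
    """Return True if the task should be treated as an unfinished task."""
    # Check 1: the task carries an incomplete-task identifier.
    if task.incomplete_task_id is not None:
        return True
    # Check 2: fall back to a recorded task table (assumed status strings).
    if task_record_table is not None:
        return task_record_table.get(task.task_id) == "incomplete"
    return False
```

A task for which `is_unfinished` returns True would proceed to step S103; any other task would be processed normally.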
S103, determining invalid files from the aggregated large files located in the target storage pool and the small files located in the source storage pool corresponding to the file migration task.
The aggregation large file comprises migration small files corresponding to the aggregated small files.
File aggregation migration means aggregating small files in batches into a large file and then migrating the file.
In order to facilitate understanding of the technical solution provided by the embodiment of the present invention, the following description is made by taking an example of a small file aggregation process. The small file aggregation migration process comprises the following steps:
step 1, the MDS (metadata server) sends a migration aggregation task, and the Backend (file hierarchical migration client) receives the file migration task;
step 2, the Backend reads the data of the small file from the source storage pool;
step 3, the Backend writes the read data into an aggregation buffer;
step 4, when the Backend aggregation buffer meets the flush condition, that is, when the aggregation buffer reaches a certain size, the multiple small files are written to disk in the destination storage pool;
step 5, after the Backend finishes writing, the aggregation attribute of the file is set, that is, the aggregated large file, the offset within the aggregated large file and the data pool are sent to the MDS;
step 6, after the MDS finishes setting the attribute, it responds to the Backend; after the Backend receives confirmation that metadata such as the aggregation attribute has been updated successfully, it deletes the data of the small files in the source storage pool;
step 7, after completion, the Backend responds to the MDS, indicating that the migration aggregation process of the file is completed;
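The seven steps above can be sketched with in-memory dictionaries standing in for the storage pools and the MDS metadata. This is a simplified illustration, not the patented implementation: the function name, the `fail_before_step` parameter used to simulate a Backend hang, the dictionary layouts, and the single flush (instead of a size-triggered flush) are all assumptions.

```python
def migrate_and_aggregate(task_files, source_pool, dest_pool, mds_attrs,
                          large_file_id, fail_before_step=None):
    """Run steps 2 to 7; raise RuntimeError to simulate a Backend hang."""
    def checkpoint(step):
        if fail_before_step == step:
            raise RuntimeError(f"Backend hung before step {step}")

    buffer = []   # aggregation buffer
    offset = 0
    for name in task_files:
        data = source_pool[name]              # step 2: read from source pool
        buffer.append((name, offset, data))   # step 3: write into the buffer
        offset += len(data)

    checkpoint(4)
    # Step 4: flush the buffer as one aggregated large file (single flush here).
    dest_pool[large_file_id] = {n: (off, d) for n, off, d in buffer}

    checkpoint(5)
    # Step 5: set the aggregation attribute of every small file on the MDS.
    for name, off, _ in buffer:
        mds_attrs[name] = {"large_file": large_file_id,
                           "offset": off, "data_pool": "dest"}

    checkpoint(6)
    # Step 6: metadata updated, so the source copies can be deleted.
    for name, _, _ in buffer:
        del source_pool[name]

    return True   # step 7: report completion to the MDS
```

Calling the function with `fail_before_step=5` leaves data in the aggregated large file while the aggregation attributes are unset, and `fail_before_step=6` leaves the attributes set while the source copies still exist. These correspond to the two garbage-producing situations discussed next.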
During the migration aggregation of small files, if the Backend hangs, the whole migration aggregation process cannot be completed and invalid garbage data is generated. Because the process is divided into multiple steps, a problem may occur at any step. The situations that mainly generate garbage data include, but are not limited to, the following two:
case 1: the small file has been written into the aggregated large file, but the aggregation attribute of the small file has not been modified. In this case, the valid data is the small file data in the source storage pool, and the small file data in the aggregated large file is garbage data that needs to be removed; otherwise it continues to occupy space and wastes system storage resources;
case 2: the small file has been written into the aggregated large file and its aggregation attribute has been modified, but the small file in the source storage pool has not yet been deleted. In this case, the valid data is the data in the aggregated large file, and the ordinary small file data in the source storage pool is garbage data that needs to be removed; otherwise it continues to occupy space and wastes system storage resources.
Based on this, in the embodiment of the present invention, an invalid file may be determined from the aggregated large file located in the destination storage pool and each small file located in the source storage pool corresponding to the file migration task. The invalid file corresponds to invalid data. According to the analysis of the small file aggregation migration flow, the invalid file can be a small file in the source storage pool or a migration small file within the aggregated large file in the destination storage pool.
Specifically, the process of determining an invalid file includes:
step one, acquiring the aggregation attribute of each small file.
And step two, determining invalid files from the aggregated large files and all the small files by using the aggregation attributes.
Specifically, the aggregation attribute of the small file may be obtained from the metadata server. I.e. the aggregated attribute may be stored as metadata. Of course, the aggregation attribute may also be used as the tag information of the small file, and the aggregation attribute of the small file may also be obtained by reading the corresponding tag information.
The aggregation attribute of the small file may specifically include an offset of the small file in the aggregated large file and a data pool where the small file is located.
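As a minimal sketch of the aggregation attribute described above, the following data structure holds the two fields named in the text (offset and data pool). The class name, field names, and the `attribute_changed` helper, which compares the attribute before and after the migration attempt, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AggregationAttribute:
    """Hypothetical record of where a small file lives after aggregation."""
    offset: int      # offset of the small file inside the aggregated large file
    data_pool: str   # data pool in which the small file is located


def attribute_changed(before: Optional[AggregationAttribute],
                      after: Optional[AggregationAttribute]) -> bool:
    """Treat any difference between the two records as a change (assumption)."""
    return before != after
```

For example, a small file whose attribute was unset (`None`) before the task and is `AggregationAttribute(offset=0, data_pool="dest")` afterwards would count as changed.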
Wherein the second step may specifically include:
step 2.1, judging whether the aggregation attribute has changed;
step 2.2, if so, determining the corresponding small file as an invalid file;
step 2.3, if not, determining that the corresponding migration small file stored in the aggregated large file is an invalid file.
After the file hierarchical migration client obtains the aggregation attribute of the small files, it can judge which of the copies stored in different positions is invalid. Specifically, in the aggregation migration process, once the small file has been written into the aggregated large file and the metadata has been updated, the aggregation attribute changes, so whether the small file is valid can be determined according to whether the aggregation attribute has changed.
In the embodiment of the invention, in order to distinguish the small files before and after migration, the small file in the source storage pool before migration is still called the small file, and the small file in the aggregated large file in the destination storage pool after migration is called a migration small file. One migration small file in the destination storage pool corresponds to one small file in the source storage pool.
Specifically, when the aggregation attribute of the small file is changed, it is determined that the small file in the corresponding source storage pool is invalid, and the small file is an invalid small file. When the aggregation attribute of the small file is not changed, it may be that the process of writing the small file into the aggregated large file is not completed, that is, there may be a case where data content of the corresponding migrated small file in the aggregated large file is missing at this time, so that it may be determined that the migrated small file is invalid.
In the embodiment of the present invention, validity is determined between the copies of the same small file stored in different positions. For example, for a small file a that exists both in the source storage pool (referred to as a1) and in the aggregated large file in the destination storage pool (referred to as a2), judging whether the file is valid means judging which of a1 and a2 is valid. If a1 is valid, the small file is valid and a2 is invalid; if a2 is valid, that is, the migration small file is valid, then a1 is invalid. That is, if corresponding a1 and a2 both exist, one of them must be invalid.
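The decision rule of steps 2.1 to 2.3 can be illustrated with a short sketch. The constant names and the `classify` helper are hypothetical, and the input is assumed to be a mapping from each small file name to a boolean stating whether its aggregation attribute has changed.

```python
from typing import Dict

# Hypothetical labels for the two possible invalid copies.
SOURCE_COPY = "small file in source storage pool"
MIGRATED_COPY = "migration small file in aggregated large file"


def invalid_copy(attribute_changed: bool) -> str:
    """Step 2.2 / 2.3: pick which copy of one small file is invalid."""
    # Attribute changed: metadata points at the aggregated large file,
    # so the copy left in the source pool (a1) is garbage.
    # Attribute unchanged: metadata still points at the source pool,
    # so the copy inside the aggregated large file (a2) is garbage.
    return SOURCE_COPY if attribute_changed else MIGRATED_COPY


def classify(changed_by_file: Dict[str, bool]) -> Dict[str, str]:
    """Apply the rule to every small file of an unfinished task."""
    return {name: invalid_copy(changed)
            for name, changed in changed_by_file.items()}
```

For the example above, `classify({"a": True})` marks a1 (the source copy) as invalid, while `classify({"a": False})` marks a2 (the migration small file) as invalid.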
S104, performing garbage collection on the invalid files.
In the embodiment of the invention, the invalid file is subjected to garbage collection, namely, the occupation of the invalid file on a disk is eliminated.
Specifically, for the same small file, if it exists in both the source storage pool and the destination storage pool (the copy in the destination storage pool is called a migration small file), then the copy in one of the two storage pools must be invalid. That is, the invalid file may be a small file in the source storage pool, or a migration small file stored in the aggregated large file in the destination storage pool.
Preferably, for invalid files in different storage positions, different processing modes can be adopted for garbage collection. The concrete conditions comprise:
case 1: if the invalid file is a small file, deleting the small file in the source storage pool;
case 2: and if the invalid file is the migration small file, performing fragment recovery processing on the aggregation large file.
That is, if the small file in the source storage pool is invalid, the small file originally stored in the source storage pool can be deleted directly. If the invalid file is a migration small file in the aggregation large file in the destination storage pool, in order to avoid a large number of disk fragments caused by deleting the invalid migration small file, a fragment recovery processing mode may be adopted to remove the migration small file in the aggregation large file.
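The two garbage collection branches can be sketched as a simple dispatch. This is an assumption-laden illustration: the `"source"` and `"migrated"` labels, the `collect_garbage` name, and the idea of queuing migration small files in a set for later fragment recovery are all hypothetical.

```python
from typing import Dict, Set


def collect_garbage(invalid: Dict[str, str],
                    source_pool: Dict[str, bytes],
                    pending_fragment_recovery: Set[str]) -> None:
    """Dispatch each invalid file to the matching clean-up path."""
    for name, location in invalid.items():
        if location == "source":
            # Case 1: invalid small file, delete it from the source pool.
            source_pool.pop(name, None)
        elif location == "migrated":
            # Case 2: invalid migration small file, do not punch a hole in
            # the aggregated large file; mark it for fragment recovery.
            pending_fragment_recovery.add(name)
        else:
            raise ValueError(f"unknown location for {name!r}: {location!r}")
```

Deferring the second case avoids rewriting the aggregated large file once per invalid entry, which matches the motivation given above for fragment recovery.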
It should be noted that, the way of performing the fragment recycling process on the aggregated large file includes, but is not limited to, the following two specific implementation ways:
Mode 1: fragment cleaning based on file header information. The specific implementation steps comprise:
step 2.1.1, acquiring file header information of the aggregated large file;
step 2.1.2, determining the proportion of invalid data in the aggregated large file by using the file header information;
step 2.1.3, under the condition that the invalid data proportion is larger than a threshold value, transferring the valid data in the aggregated large file to the target aggregated large file and deleting the aggregated large file.
Generally, a file aggregation migration task may be interrupted by a fault at any time of migration, and therefore, valid data corresponding to a small file to be migrated may be partially in a destination storage pool and partially in a source storage pool. Accordingly, the migrated small file in the aggregate large file may also have some valid data and some invalid data. In order to reduce the number of times of repeated migration, in the embodiment of the present invention, after the invalid portion in the aggregated large file reaches a certain threshold, fragment cleaning may be performed uniformly.
Specifically, the file header information can be used to determine the invalid data proportion in the aggregated large file; then, if the invalid data proportion is greater than the threshold, the valid data in the aggregated large file is migrated to the target aggregated large file and the aggregated large file is deleted. The threshold may be set and adjusted according to the requirements of the actual application, for example to a specific value such as 80% or 75%. The higher the threshold, the better the garbage collection effect, although more data migrations may result; the lower the threshold, the more garbage fragments remain in the aggregated large file, but the fewer data migrations are caused by garbage collection.
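Mode 1 can be illustrated with the following minimal sketch. It assumes, purely for illustration, that the file header records a name, offset, length and validity flag for each migration small file, and that the invalid data proportion is measured in bytes; the embodiment does not specify the header layout, and `HeaderEntry`, `invalid_ratio` and `compact_if_needed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HeaderEntry:
    name: str
    offset: int   # byte offset of the migration small file inside the aggregate
    length: int   # byte length of the migration small file
    valid: bool   # assumed flag: whether this copy is still valid


def invalid_ratio(header):
    """Fraction of the aggregate's payload bytes that belong to invalid entries."""
    total = sum(e.length for e in header)
    if total == 0:
        return 0.0
    return sum(e.length for e in header if not e.valid) / total


def compact_if_needed(header, data, threshold=0.8):
    """Return (new_header, new_data) for the target aggregate, or None if no cleaning is needed."""
    if invalid_ratio(header) <= threshold:
        return None
    new_header, chunks, pos = [], [], 0
    for e in header:
        if e.valid:
            # Copy only valid data into the target aggregated large file.
            chunks.append(data[e.offset:e.offset + e.length])
            new_header.append(HeaderEntry(e.name, pos, e.length, True))
            pos += e.length
    return new_header, b"".join(chunks)
```

When a result is returned, the caller would write it out as the target aggregated large file and then delete the original aggregate.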
Mode 2: fragment cleaning based on the aggregation attribute of the small files. The specific implementation steps include:
step 2.2.1, acquiring aggregation attributes corresponding to all files in the aggregated large file;
step 2.2.2, determining the files with unchanged aggregation attributes as invalid data, and counting the proportion of the invalid data;
step 2.2.3, under the condition that the invalid data proportion is larger than the threshold value, migrating the valid data in the aggregated large file to the target aggregated large file, and deleting the aggregated large file.
As can be seen from the above, the aggregation attribute determines whether a given small file is valid in the source storage pool or in the destination storage pool. The invalid data in the aggregated large file can therefore be identified from the aggregation attribute, and the invalid data proportion obtained by counting. Once the invalid data proportion has been obtained, the subsequent processing is the same as in mode 1 above.
In particular, when the invalid data proportion is less than or equal to the threshold, fragment cleaning of the aggregated large file may be skipped.
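A minimal sketch of the proportion calculation in mode 2 follows. It assumes that "unchanged" is judged by comparing an attribute value recorded earlier with the current value, and that the proportion is measured in bytes; both are assumptions made for illustration, and the function and parameter names are hypothetical.

```python
def invalid_ratio_by_attribute(entries, attr_before, attr_now):
    """entries maps file name -> byte length inside the aggregated large file.

    A file whose aggregation attribute is unchanged is counted as invalid data
    in the aggregate (its copy in the source storage pool remains the valid one).
    """
    total = sum(entries.values())
    if total == 0:
        return 0.0
    invalid = sum(
        length
        for name, length in entries.items()
        if attr_now.get(name) == attr_before.get(name)
    )
    return invalid / total


def needs_fragment_cleaning(ratio, threshold=0.8):
    """Cleaning is triggered only when the proportion is strictly greater than the threshold."""
    return ratio > threshold
```

A proportion exactly equal to the threshold does not trigger cleaning, matching the "less than or equal to" case described above.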
Preferably, after the data cleaning is completed, the normal file migration process can be resumed. Specifically, once garbage collection of the invalid files is complete, clearing response data is fed back to the metadata server. The metadata server can then confirm that the data cleaning work has finished and can issue a new file aggregation migration task.
S105, executing the file migration task.
That is, the corresponding small files are read from the source storage pool, a plurality of small files are aggregated to obtain an aggregated large file, and the aggregated large file is migrated to the destination storage pool.
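Step S105 can be sketched as follows. This is an illustration under stated assumptions: the storage pools are modeled as in-memory dictionaries, and the aggregated large file is modeled as a header of (name, offset, length) tuples plus a concatenated payload; the real on-disk format is not specified here, and `aggregate_small_files` and `migrate` are hypothetical names.

```python
def aggregate_small_files(small_files):
    """Concatenate small files (name -> bytes) into one payload with a locating header."""
    header, chunks, offset = [], [], 0
    for name, payload in small_files.items():
        header.append((name, offset, len(payload)))  # where each migration small file lives
        chunks.append(payload)
        offset += len(payload)
    return header, b"".join(chunks)


def migrate(source_pool, names, destination_pool, aggregate_name):
    """Read the named small files, aggregate them, and store the aggregate in the destination pool."""
    selected = {n: source_pool[n] for n in names}
    header, blob = aggregate_small_files(selected)
    destination_pool[aggregate_name] = (header, blob)
    return header
```

Note that this sketch leaves the source copies in place; deciding which copy is invalid after an interruption is the job of the garbage collection steps described earlier.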
With the method provided by the embodiment of the present invention, a file migration task is received; whether the file migration task is an unfinished task is judged; if so, invalid files are determined from the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task, where the aggregated large file contains the migration small files produced by aggregating the small files; and garbage collection is performed on the invalid files.
It can be understood that, when a hierarchical file migration client fails, the migration task it is processing cannot be completed. If a small file has already been written into an aggregated large file at that point, the same small file may exist both in the source storage pool and in the aggregated large file in the destination storage pool. In terms of the aggregation migration process, when the same file exists in both the source storage pool and the destination storage pool, one of the copies is necessarily invalid. Based on this, after receiving a file migration task, the method first judges whether the task is an unfinished task; if it is, garbage data corresponding to the task exists. In that case, invalid files are determined from the aggregated large file in the destination storage pool and the small files in the source storage pool that correspond to the task, where the aggregated large file contains the migration small files produced by aggregating the small files. The invalid files are then garbage-collected. In this way, invalid data generated during aggregation migration can be reclaimed and disk utilization can be improved.
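The overall control flow summarized above can be sketched as follows. The task is modeled as a dictionary with a hypothetical `unfinished` flag, and the three callables stand in for the invalid-file determination, garbage collection and migration steps; none of these names come from the embodiment.

```python
def handle_migration_task(task, find_invalid, collect, execute):
    """Run garbage collection first if the task was left unfinished, then execute the task."""
    steps = []
    if task.get("unfinished", False):
        # An unfinished task implies leftover garbage data from the interrupted run.
        invalid = find_invalid(task)
        collect(invalid)
        steps.append("garbage_collected")
    execute(task)
    steps.append("migrated")
    return steps
```

For a task that is not marked unfinished, the sketch skips straight to execution, which corresponds to the normal migration path.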
Corresponding to the above method embodiment, an embodiment of the present invention further provides a file migration apparatus, and the file migration apparatus described below and the file migration method described above may be referred to in a corresponding manner.
Referring to fig. 2, the apparatus includes the following modules:
a task receiving module 101, configured to receive a file migration task;
the judging module 102 is configured to judge whether the file migration task is an unfinished task;
an invalid file determining module 103, configured to determine, if the determination result is yes, invalid files from the aggregated large file located in the destination storage pool and the small files located in the source storage pool corresponding to the file migration task; the aggregated large file contains the migration small files corresponding to the small files after aggregation;
and the garbage recycling processing module 104 is used for performing garbage recycling on the invalid files.
With the apparatus provided by the embodiment of the present invention, a file migration task is received; whether the file migration task is an unfinished task is judged; if so, invalid files are determined from the aggregated large file located in the destination storage pool and the small files located in the source storage pool that correspond to the file migration task, where the aggregated large file contains the migration small files produced by aggregating the small files; and garbage collection is performed on the invalid files.
It can be understood that, when a hierarchical file migration client fails, the migration task it is processing cannot be completed. If a small file has already been written into an aggregated large file at that point, the same small file may exist both in the source storage pool and in the aggregated large file in the destination storage pool. In terms of the aggregation migration process, when the same file exists in both the source storage pool and the destination storage pool, one of the copies is necessarily invalid. Based on this, after receiving a file migration task, the apparatus first judges whether the task is an unfinished task; if it is, garbage data corresponding to the task exists. In that case, invalid files are determined from the aggregated large file in the destination storage pool and the small files in the source storage pool that correspond to the task, where the aggregated large file contains the migration small files produced by aggregating the small files. The invalid files are then garbage-collected. In this way, invalid data generated during aggregation migration can be reclaimed and disk utilization can be improved.
In a specific embodiment of the present invention, the invalid file determining module 103 is specifically configured to obtain an aggregation attribute of each small file; and determining invalid files from the aggregated large files and the small files by using the aggregation attributes.
In a specific embodiment of the present invention, the invalid file determining module 103 is specifically configured to determine whether an aggregation attribute changes; if so, determining the corresponding small file as an invalid file; and if not, determining that the corresponding migration small file stored in the aggregation large file is an invalid file.
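The determination rule in the preceding paragraph can be sketched as below. How a "change" of the aggregation attribute is detected is not detailed in this passage; the sketch assumes, for illustration only, that a previously recorded value is compared with the current one, and `classify_invalid` is a hypothetical name.

```python
def classify_invalid(small_files, attr_before, attr_now):
    """Split file names into (invalid source copies, invalid migrated copies)."""
    invalid_small, invalid_migrated = [], []
    for name in small_files:
        if attr_now.get(name) != attr_before.get(name):
            # Attribute changed: the small file in the source storage pool is invalid.
            invalid_small.append(name)
        else:
            # Attribute unchanged: the migration small file in the aggregate is invalid.
            invalid_migrated.append(name)
    return invalid_small, invalid_migrated
```

Every file falls into exactly one of the two lists, so each duplicated small file ends up with a single valid copy.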
In an embodiment of the present invention, the garbage collection processing module 104 is specifically configured to delete a small file in the source storage pool if the invalid file is a small file; and if the invalid file is the migration small file, performing fragment recovery processing on the aggregation large file.
In a specific embodiment of the present invention, the garbage collection processing module 104 is specifically configured to obtain file header information of the aggregated large file; determine the invalid data proportion in the aggregated large file by using the file header information; and, when the invalid data proportion is greater than the threshold, migrate the valid data in the aggregated large file to the target aggregated large file and delete the aggregated large file.
In a specific embodiment of the present invention, the garbage collection processing module 104 is specifically configured to obtain the aggregation attributes corresponding to each file in the aggregated large file; determine the files with unchanged aggregation attributes as invalid data and count the proportion of the invalid data; and, when the invalid data proportion is greater than the threshold, migrate the valid data in the aggregated large file to the target aggregated large file and delete the aggregated large file.
In a specific embodiment of the present invention, the task receiving module 101 is specifically configured to receive a file migration task sent by a metadata server;
correspondingly, the apparatus further includes: a clearing feedback module, configured to feed back clearing response data to the metadata server once garbage collection of the invalid files is complete.
Corresponding to the above method embodiment, an embodiment of the present invention further provides a file migration device; the file migration device described below and the file migration method described above may be referred to in correspondence with each other.
Referring to fig. 3, the file migration device includes:
a memory 332 for storing computer programs;
a processor 322, configured to implement the steps of the file migration method of the above-described method embodiments when executing the computer program.
Specifically, referring to fig. 4, fig. 4 is a schematic diagram of a specific structure of the file migration device provided in this embodiment. The file migration device may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 322 and a memory 332, where the memory 332 stores one or more computer applications 342 or data 344. The memory 332 may be transient or persistent storage. The programs stored in the memory 332 may include one or more modules (not shown), each of which may include a series of instruction operations for the data processing device. Further, the central processing unit 322 may be configured to communicate with the memory 332 and to execute, on the file migration device 301, the series of instruction operations in the memory 332.
The file migration device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input-output interfaces 358, and/or one or more operating systems 341.
The steps in the file migration method described above may be implemented by the structure of the file migration device.
Corresponding to the above method embodiment, an embodiment of the present invention further provides a readable storage medium; the readable storage medium described below and the file migration method described above may be referred to in correspondence with each other.
A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the file migration method of the above-mentioned method embodiments.
The readable storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium capable of storing program code.
Those skilled in the art will further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components and steps of the various examples have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Claims (8)
1. A method for file migration, comprising:
receiving a file migration task;
judging whether the file migration task is an unfinished task;
if so, determining invalid files from the aggregated large file located in the destination storage pool and the small files located in the source storage pools corresponding to the migration task; the aggregation large file is provided with migration small files corresponding to the small files respectively after aggregation;
performing garbage collection on the invalid files;
wherein determining invalid files from the aggregated large file located in the destination storage pool and the small files located in the source storage pools corresponding to the file migration task comprises:
acquiring the aggregation attribute of each small file;
determining the invalid file from the aggregated large file and each small file by using the aggregation attribute;
wherein determining the invalid file from the aggregated large file and each small file by using the aggregation attribute comprises:
judging whether the aggregation attribute changes;
if so, determining the corresponding small file as the invalid file;
if not, determining that the corresponding migration small file stored in the aggregation large file is an invalid file.
2. The file migration method according to claim 1, wherein garbage-collecting the invalid file comprises:
if the invalid file is the small file, deleting the small file in the source storage pool;
and if the invalid file is the small migrated file, performing fragment recovery processing on the large aggregated file.
3. The file migration method according to claim 2, wherein performing fragment recycling processing on the aggregated large file comprises:
acquiring file header information of the aggregated large file;
determining the invalid data proportion in the aggregated large file by using the file header information;
and under the condition that the invalid data proportion is larger than a threshold value, transferring the valid data in the large aggregated file to a target large aggregated file and deleting the large aggregated file.
4. The file migration method according to claim 2, wherein performing fragment recovery processing on the aggregated large file comprises:
acquiring aggregation attributes corresponding to all files in the large aggregation files;
determining the files with unchanged aggregation attributes as invalid data, and counting the proportion of the invalid data;
and under the condition that the invalid data proportion is larger than a threshold value, transferring the valid data in the large aggregated file to a target large aggregated file and deleting the large aggregated file.
5. The file migration method according to any one of claims 1 to 4, wherein the receiving of the file migration task includes:
receiving the file migration task sent by a metadata server;
correspondingly, in the case of completing garbage collection of the invalid file, the method further comprises the following steps:
and feeding back cleaning response data to the metadata server.
6. A file migration apparatus, comprising:
the task receiving module is used for receiving a file migration task;
the judging module is used for judging whether the file migration task is an unfinished task;
an invalid file determining module, configured to determine an invalid file from the aggregated large file located in the destination storage pool and the small files located in the source storage pools, which correspond to the file migration task, if the determination result is yes; the aggregation large file is provided with migration small files corresponding to the small files respectively after aggregation;
the garbage recycling processing module is used for carrying out garbage recycling on the invalid files;
the invalid file determining module is specifically configured to obtain an aggregation attribute of each of the small files, and to determine the invalid file from the aggregated large file and each small file by using the aggregation attribute; wherein determining the invalid file from the aggregated large file and each small file by using the aggregation attribute comprises:
judging whether the aggregation attribute changes;
if so, determining the corresponding small file as the invalid file;
if not, determining that the corresponding migration small file stored in the aggregation large file is an invalid file.
7. A file migration apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the file migration method according to any one of claims 1 to 5 when executing the computer program.
8. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the file migration method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010850086.2A CN111984196B (en) | 2020-08-21 | 2020-08-21 | File migration method, device, equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111984196A (en) | 2020-11-24 |
CN111984196B (en) | 2022-08-19 |
Family
ID=73443100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010850086.2A Active CN111984196B (en) | 2020-08-21 | 2020-08-21 | File migration method, device, equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111984196B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112631991B (en) * | 2020-12-26 | 2024-07-05 | 中国农业银行股份有限公司 | File migration method and device |
CN113704027B (en) * | 2021-10-29 | 2022-02-18 | 苏州浪潮智能科技有限公司 | File aggregation compatible method and device, computer equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107643880A (en) * | 2017-09-27 | 2018-01-30 | 郑州云海信息技术有限公司 | The method and device of file data migration based on distributed file system |
CN109471836A (en) * | 2018-11-01 | 2019-03-15 | 浪潮电子信息产业股份有限公司 | Data migration method, device and system |
CN111176571A (en) * | 2019-12-27 | 2020-05-19 | 浪潮电子信息产业股份有限公司 | Method, device, equipment and medium for managing local object |
Also Published As
Publication number | Publication date |
---|---|
CN111984196A (en) | 2020-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10970190B2 (en) | Hybrid log viewer with thin memory usage | |
CN109582213B (en) | Data reconstruction method and device and data storage system | |
CN108073352B (en) | Virtual disk processing method and device | |
CN111984196B (en) | File migration method, device, equipment and readable storage medium | |
CN111198845B (en) | Data migration method, readable storage medium and computing device | |
CN115061630A (en) | Data migration method, device, equipment and medium | |
CN104035728A (en) | Hard disk hot plug handling method, device and node | |
CN108306780B (en) | Cloud environment-based virtual machine communication quality self-optimization system and method | |
CN107566470B (en) | Method and device for managing virtual machine in cloud data system | |
CN112269763A (en) | File aggregation method and related device | |
CN113885809B (en) | Data management system and method | |
CN111580932B (en) | Virtual machine disk online migration redundancy removal method | |
CN115543222B (en) | Storage optimization method, system, equipment and readable storage medium | |
CN111625506A (en) | Distributed data deleting method, device and equipment based on deleting queue | |
CN110119389B (en) | Writing operation method of virtual machine block equipment, snapshot creation method and device | |
CN111459913A (en) | Capacity expansion method and device of distributed database and electronic equipment | |
CN115599295A (en) | Node capacity expansion method and device of storage system | |
CN111581157B (en) | Object storage platform, object operation method, device and server | |
CN115309336A (en) | Data writing method, cache information updating method and related device | |
CN114625474A (en) | Container migration method and device, electronic equipment and storage medium | |
CN115904211A (en) | Storage system, data processing method and related equipment | |
CN114722261A (en) | Resource processing method and device, electronic equipment and storage medium | |
CN117614973B (en) | File storage method based on multi-cloud architecture | |
CN114281246B (en) | Cloud hard disk online migration method, device and equipment based on cloud management platform | |
WO2024066904A1 (en) | Container creation method, system, and node |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||