WO2024187900A1 - Data storage method, system and device for distributed storage system, and storage medium
- Publication number
- WO2024187900A1 (PCT/CN2023/141779)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- hard disk
- data
- cluster
- state
- written
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present application relates to the field of storage technology, and in particular to a data storage method, system, device and storage medium of a distributed storage system.
- Flash media has gone through four generations of development.
- TLC (Triple-Level Cell)
- QLC (Quad-Level Cell)
- each generation has higher data density, cheaper price, lower durability, lower performance, and improved energy efficiency, which are all determined by the physical properties of flash memory.
- QLC has two main disadvantages: low performance and poor durability.
- as for the poor durability, that is, the rapidly decreasing number of P/E (Program/Erase) cycles, the current method is to perform wear leveling inside an SSD (Solid-State Drive).
- the P/E cycle count of QLC has now dropped to around 1000, and the effect of the above approach is limited in distributed systems.
- the purpose of the present application is to provide a data storage method, system, device and storage medium of a distributed storage system, so as to effectively improve the durability of hard disks and avoid frequent replacement of bad disks.
- a data storage method for a distributed storage system wherein the distributed storage system includes multiple hard disk clusters, each hard disk cluster includes multiple hard disks of the same model, and each hard disk in the same hard disk cluster is added to the distributed storage system in the same batch.
- the data storage method for the distributed storage system includes:
- when the data type of the data to be written is the i-th data type, determining whether there is a hard disk cluster in the i-th state;
- the data to be written is written into the selected hard disk cluster in the i-th state; wherein, among the hard disk clusters in the set N states, the wear degree of the hard disk cluster in the i+1th state is lower than the wear degree of the hard disk cluster in the i-th state, and i is a positive integer.
- the data types of the data to be written are divided into:
- the data type of the data to be written is divided.
- the data type of the data to be written is divided, including:
- the data type of the data to be written is divided into the j-th data type; wherein j is a positive integer and 1 ≤ j ≤ N-1;
- the data type of the data to be written is divided into the Nth data type.
- it also includes:
- the file name is used as a training sample, and the modification count of the training sample in the first time period is used as a training label of the training sample to train the preset deep learning model;
- it also includes:
- An adjustment instruction for the jth database is received, and a data item adding operation, a data item deleting operation, and/or a data item modifying operation is performed on the jth database according to the adjustment instruction.
- the first data type is a read-only data type
- the second data type is a cold data type
- the third data type is a hot data type
- the first state of the hard disk cluster is a G readonly state
- the second state of the hard disk cluster is a G cold state
- the third state of the hard disk cluster is a G hot state.
- it also includes:
- when the data to be written is of the first data type and it is determined that there is currently no hard disk cluster in the first state, the data to be written is written into a hard disk cluster in the G hot state or the G cold state.
- it also includes:
- when the data to be written is of the second data type and it is determined that there is currently no hard disk cluster in the second state, the data to be written is written into a hard disk cluster in the G hot state.
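- as a minimal sketch of the fallback rules above, assuming N = 3 and the state names G readonly, G cold and G hot: the function name, the dictionary layout and the order in which the two fallback states of the first data type are tried are illustrative assumptions, since the text does not rank them.

```python
from typing import Dict, List, Optional

# Preferred cluster state for each data type index i (names follow the text).
STATE_FOR_TYPE = {1: "G_readonly", 2: "G_cold", 3: "G_hot"}

# Fallback states used when no cluster exists in the preferred state.
# Trying G_cold before G_hot for type 1 is an assumption of this sketch.
FALLBACK_STATES = {1: ["G_cold", "G_hot"], 2: ["G_hot"], 3: []}


def candidate_state(data_type: int,
                    clusters_by_state: Dict[str, List[str]]) -> Optional[str]:
    """Return the state whose clusters should receive the write, or None."""
    for state in [STATE_FOR_TYPE[data_type]] + FALLBACK_STATES[data_type]:
        if clusters_by_state.get(state):  # at least one cluster in this state
            return state
    return None
```

Returning None when no suitable state exists is a placeholder; this excerpt does not say what happens in that case.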
- it also includes:
- selecting a hard disk cluster in the i-th state includes:
- a hard disk cluster in the i-th state is selected.
- selecting a hard disk cluster in the i-th state according to the rule that the lower the wear degree of the hard disk cluster, the higher the priority includes:
- the disk cluster with the smallest cluster busyness VGbusy is selected as the disk cluster in the i-th state;
- the cluster busyness VGbusy of the hard disk cluster is the value obtained by dividing the current cluster write queue depth VG_cur_queue_depth of the hard disk cluster by the maximum cluster write queue depth VG_max_queue_depth, that is, VGbusy = VG_cur_queue_depth / VG_max_queue_depth.
- writing the data to be written into the selected hard disk cluster in the i-th state includes:
- the target hard disk is selected from the selected hard disk cluster in the i-th state
- selecting a target hard disk from the selected hard disk cluster in the i-th state according to the rule that the lower the wear degree of the hard disk, the higher the priority includes:
- the hard disk with the smallest hard disk busyness VDbusy is selected as the target hard disk;
- the hard disk busyness VDbusy of the hard disk is the value obtained by dividing the current hard disk write queue depth VD_cur_queue_depth of the hard disk by the maximum hard disk write queue depth VD_max_queue_depth, that is, VDbusy = VD_cur_queue_depth / VD_max_queue_depth.
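- the target hard disk selection can be sketched as follows. The Disk class and its field names are hypothetical, and using VDbusy only to break ties between hard disks of equal wear is an assumption of this sketch; the excerpt states only that lower wear means higher priority and that the hard disk with the smallest VDbusy is selected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    name: str
    wear: float            # wear degree of the hard disk, 0.0 to 1.0
    cur_queue_depth: int   # VD_cur_queue_depth
    max_queue_depth: int   # VD_max_queue_depth

    @property
    def busyness(self) -> float:
        # VDbusy = VD_cur_queue_depth / VD_max_queue_depth
        return self.cur_queue_depth / self.max_queue_depth


def select_target_disk(disks: List[Disk]) -> Disk:
    """Pick the least-worn disk; among equally worn disks, the least busy."""
    return min(disks, key=lambda d: (d.wear, d.busyness))
```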
- each hard disk cluster in the distributed storage system is arranged in the first medium layer, and an SCM medium layer is also arranged in the distributed storage system to store target type data through the SCM medium layer and process non-block aligned write data through the SCM medium layer.
- the first medium layer is a PLC medium layer or a QLC medium layer.
- each hard disk cluster data is stored in blocks of a set size
- the data storage method of the distributed storage system further includes:
- each block to be migrated in the hard disk cluster is migrated to the hard disk cluster in the (i+1)th state.
- it also includes:
- the K blocks with the largest P/E times in the hard disk cluster in the i-th state are exchanged with the data in the K blocks to be migrated in the hard disk cluster in the i-th state, so as to complete the internal migration of the K blocks to be migrated in the hard disk cluster in the i-th state;
- K represents the number of blocks to be migrated determined in the hard disk cluster in the i-th state.
- a data storage system of a distributed storage system includes multiple hard disk clusters, the hard disk clusters include multiple hard disks of the same model, and each hard disk in the same hard disk cluster is added to the distributed storage system in the same batch, the data storage system of the distributed storage system includes:
- a type classification module used for receiving the data to be written and classifying the data types of the data to be written; wherein, among the set N data types, the estimated number of modifications of the data to be written of the i+1th data type within the first time length is higher than the estimated number of modifications of the data to be written of the ith data type within the first time length, and N is a positive integer not less than 2;
- the hard disk cluster status judgment module is used to judge whether there is a hard disk cluster in the i-th state when the data type of the data to be written is the i-th data type; if there is a hard disk cluster in the i-th state, the hard disk cluster selection module is triggered;
- a hard disk cluster selection module is used to select a hard disk cluster in the i-th state
- the writing module is used to write the data to be written into the selected hard disk cluster in the i-th state; wherein, among the hard disk clusters in the set N states, the wear degree of the hard disk cluster in the i+1th state is lower than the wear degree of the hard disk cluster in the i-th state, and i is a positive integer.
- a data storage device of a distributed storage system comprising:
- the processor is used to execute a computer program to implement the steps of the data storage method of the distributed storage system as described above.
- a computer non-volatile readable storage medium stores a computer program, which implements the steps of the data storage method of the above-mentioned distributed storage system when executed by a processor.
- the distributed storage system is divided into multiple hard disk clusters. Compared with directly managing each hard disk, managing hard disk clusters requires less management data, that is, less metadata, which makes the scheme easy to implement.
- the hard disk cluster includes multiple hard disks of the same model, and each hard disk in the same hard disk cluster is added to the distributed storage system in the same batch.
- the present application divides the hard disk clusters into N states. After the data to be written is received, its data type is determined. When the data to be written is classified as the i-th data type, it is determined whether a hard disk cluster in the i-th state currently exists; if so, a hard disk cluster in the i-th state is selected. Different data types reflect different frequencies of future modification of the data to be written: among the N data types, the estimated number of modifications of the data to be written of the i+1th data type within the first time length is higher than that of the data to be written of the i-th data type.
- the wear degree of the hard disk cluster in the i+1th state is lower than the wear degree of the hard disk cluster in the i-th state. It can be seen that for data that almost does not need to be modified, that is, when the estimated number of modifications of the data to be written within the first time length is very low, the data to be written will be divided into the first data type, and therefore will be written into the hard disk cluster in the first state.
- the wear degree of the hard disk cluster in the first state is the highest, which means that the hard disk cluster in the first state has written a large amount of data, so the data written is data that almost does not need to be modified. Writing to the hard disk will wear out the hard disk, but reading will not.
- the data written to the hard disk cluster with a high degree of wear is data that hardly needs to be modified, even if the hard disk cluster has a high degree of wear, it can still be read, thus giving full play to its residual value.
- the more frequently the data to be written needs to be modified, that is, the higher its estimated number of modifications within the first time period, the lower the wear degree of the hard disk cluster it is written to, so that hard disk clusters with a lower degree of wear are used more fully, thus realizing global wear leveling of the distributed storage system.
- the present application divides the distributed storage system into multiple hard disk clusters, which is conducive to conveniently realizing global wear leveling of the distributed storage system, thereby ensuring the durability of the hard disks in the distributed storage system and avoiding the frequent replacement of bad disks.
- FIG. 1 is a flowchart of an implementation of a data storage method of a distributed storage system in the present application.
- FIG. 2 is a schematic diagram showing a principle framework for implementing global wear leveling in a specific implementation of the present application.
- FIG. 3 is a schematic diagram of a multi-layer flash memory architecture of a distributed storage system in a specific implementation of the present application.
- FIG. 4 is a schematic diagram of the structure of a data storage system of a distributed storage system in the present application.
- FIG. 5 is a schematic diagram of the structure of a data storage device of a distributed storage system in the present application.
- FIG. 6 is a schematic diagram of the structure of a computer non-volatile readable storage medium in the present application.
- the core of this application is to provide a data storage method for a distributed storage system, which divides the distributed storage system into multiple hard disk clusters, which is conducive to conveniently realizing global wear leveling of the distributed storage system, thereby ensuring the durability of the hard disks in the distributed storage system and avoiding the frequent replacement of bad disks.
- FIG. 1 is a flowchart of an implementation of a data storage method of a distributed storage system in the present application.
- the distributed storage system includes multiple hard disk clusters, and the hard disk clusters include multiple hard disks of the same model, and each hard disk in the same hard disk cluster is added to the distributed storage system in the same batch.
- the data storage method of the distributed storage system may include the following steps:
- Step S101 receiving data to be written, and classifying the data types of the data to be written;
- the estimated number of modifications to the data to be written of the i+1th data type within the first time period is higher than the estimated number of modifications to the data to be written of the i-th data type within the first time period, and N is a positive integer not less than 2.
- a distributed storage system may include multiple hard disk clusters, and one hard disk cluster may include multiple hard disks of the same model.
- across different hard disk clusters, the number and models of the hard disks included can differ, that is, different clusters can use disks from different manufacturers, of different sizes, and added to the distributed storage system at different times. These can be set according to actual needs and will not affect the implementation of this application.
- the hard disks in the same hard disk cluster have the same model, that is, the same hard disk cluster is composed of a group of hard disks with the same capacity and the same properties.
- All hard disks in the same hard disk cluster are added to the distributed storage system in the same batch, and since the present application realizes global wear balancing of the distributed storage system, the wear degree of each hard disk in the same hard disk cluster is similar, and if replacement is required, they are replaced at the same time.
- the hard disks described in this application can be set and adjusted as needed, and are usually SSDs. Of course, in other specific implementations, they can be other types of hard disks. However, it should be pointed out that mechanical hard disks do not suffer from write wear, so there is basically no limit on their number of writes.
- the wear leveling implemented in this application is for solid-state hard disks with a limit on the number of writes. Therefore, the hard disks in the solutions of this application are usually solid-state hard disks with a limit on the number of writes.
- Figure 2 is a principle framework diagram for realizing global wear leveling.
- the medium user in Figure 2 represents the division of the data type of the data to be written, and then transmits it to the flash allocation module in Figure 2 through a tag, so that the flash allocation module knows the data type of the data to be written according to the tag, and then decides which hard disk cluster the data to be written should be written to.
- the data types of the data to be written in the present application are divided according to how frequently the data is expected to be modified in the future, that is, different data types reflect different modification frequencies.
- the estimated number of modifications to the data to be written of the i+1th data type within the first time period is higher than the estimated number of modifications to the data to be written of the i-th data type within the first time period, and N is a positive integer not less than 2, that is, at least 2 data types need to be set, and N represents the total number of data types.
- the data to be written is of the i-th data type
- the estimated number of modifications of the data to be written within the first time length is in the i-th estimated value interval, and the N estimated value intervals are sorted from small to large in order from 1 to N.
- the estimated number of modifications of the data to be written within the first time length is the smallest, indicating that the data to be written hardly needs to be modified.
- the estimated number of modifications of the data to be written within the first time length is the largest, indicating that the data to be written needs to be modified frequently.
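- the mapping from an estimated modification count to one of the N data types can be sketched as an interval lookup. The function name and the boundary values used in the usage note are hypothetical; the text only states that the N estimated value intervals are ordered from small to large.

```python
from bisect import bisect_right
from typing import Sequence


def classify_by_estimate(estimated_mods: float,
                         upper_bounds: Sequence[float]) -> int:
    """Map an estimated modification count to a data type index in 1..N.

    upper_bounds holds the N-1 ascending boundaries that separate the N
    estimated value intervals; an estimate equal to a boundary falls into
    the higher interval (a choice made for this sketch).
    """
    return bisect_right(upper_bounds, estimated_mods) + 1
```

With the illustrative boundaries [1, 100] and N = 3, an estimate of 0 maps to the first data type, 50 to the second, and 500 to the third.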
- the data types of the data to be written described in step S101 may specifically include:
- the data type of the data to be written is divided.
- the present application divides the data types of the data to be written according to the frequency of future modification of the data to be written.
- This implementation method takes into account whether the data to be written needs to be modified frequently, which can be reflected to a certain extent in the file name. Therefore, the frequency of modification of the data to be written can be predicted based on the file name of the data to be written, that is, the data type of the data to be written can be divided, which will be more convenient in implementation.
- the data type of the data to be written is divided, which may specifically include:
- the data type of the data to be written is divided into the j-th data type; wherein j is a positive integer and 1 ≤ j ≤ N-1;
- the data type of the data to be written is divided into the Nth data type.
- N in this application is a positive integer not less than 2, which can represent the total number of data types.
- N in this application is also the total number of states of the hard disk cluster, that is, the number of data types set is the number of states of the hard disk cluster set.
- This implementation takes into account that N-1 databases can be pre-set, so that when the file name of the data to be written matches a certain database, the data type of the data to be written can be determined to be the data type corresponding to the database, that is, if the file name of the data to be written matches the j-th database among the N-1 databases, the data type of the data to be written is classified as the j-th data type.
- the reason why N-1 databases are set up instead of N databases in this implementation is that, no matter whether N-1 databases or N databases are set up, there may still be a situation where the file name of the data to be written does not match any database, that is, the preset databases can hardly cover all the file names of the data to be written in actual applications.
- the data to be written in this situation is divided into the Nth data type, that is, the subsequent data to be written will be written to the hard disk cluster in the Nth state.
- the hard disk cluster in the Nth state is the hard disk cluster with the lowest wear among the hard disk clusters in various states.
- since this implementation classifies data to be written whose file name does not match any database as the Nth data type, there is no need to set up N databases; N-1 databases suffice. That is, as long as the file name of the data to be written does not match any of the preset N-1 databases, the data is directly classified as the Nth data type, regardless of whether its future modification frequency cannot be predicted or it will indeed be modified very frequently.
- the specific content in the N-1 databases can be set by the staff based on experience, and also supports dynamic adjustment of the database content to better meet actual needs.
- it can also include:
- An adjustment instruction for the jth database is received, and a data item adding operation, a data item deleting operation, and/or a data item modifying operation is performed on the jth database according to the adjustment instruction.
- the adjustment instruction may be issued by the staff.
- the data items of the j-th database may be added, and/or deleted, and/or modified according to the adjustment instruction.
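- handling an adjustment instruction can be sketched as below, assuming each database is simply a set of file name suffixes. The function name, the operation strings and the dictionary layout are hypothetical and only illustrate the add, delete and modify operations described above.

```python
from typing import Dict, Set


def apply_adjustment(databases: Dict[int, Set[str]], j: int, op: str,
                     item: str, new_item: str = "") -> None:
    """Apply one add/delete/modify operation to the j-th database in place."""
    db = databases[j]
    if op == "add":
        db.add(item)
    elif op == "delete":
        db.discard(item)          # deleting a missing item is a no-op
    elif op == "modify":
        if item in db:            # replace only an item that actually exists
            db.remove(item)
            db.add(new_item)
    else:
        raise ValueError(f"unknown operation: {op}")
```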
- the specific matching rules can also be set according to actual needs.
- one or more file name suffixes are set in the j-th database. As long as the suffix of the file name of the data to be written is consistent with any file name suffix in the j-th database, the file name of the data to be written is considered to match the preset j-th database.
- for example, if N = 3, two databases need to be set up, which are called the first database and the second database.
- the first database contains suffixes such as avi and bmp.
- the file name of the data to be written has a suffix of bak, log, etc.
- matching based on the file name suffix is only an example of a relatively simple implementation method. In other specific implementations, other more complex matching methods can also be set.
- the header of the file name can be analyzed, and the analysis result of the file name header and the file name suffix can be combined to determine whether the file name of the data to be written matches the corresponding database.
- the overall pattern of the file name can be analyzed to determine whether the file name of the data to be written matches the corresponding database.
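- the simple suffix-based matching can be sketched as follows, with N-1 databases held in a list and any unmatched file name falling through to the Nth data type. The function name is hypothetical, and placing bak and log in the second database in the usage note is only this sketch's reading of the example, not something the excerpt states outright.

```python
import os
from typing import List, Set


def classify_by_suffix(file_name: str, databases: List[Set[str]]) -> int:
    """Return the data type index (1..N) for a file name.

    databases[j - 1] is the j-th database of suffixes, so the list has
    N-1 entries. A file name matching no database gets the N-th type.
    """
    suffix = os.path.splitext(file_name)[1].lstrip(".").lower()
    for j, db in enumerate(databases, start=1):
        if suffix in db:
            return j
    return len(databases) + 1  # no match: N-th data type
```

For instance, with databases [{"avi", "bmp"}, {"bak", "log"}] (N = 3), "clip.avi" is classified as the first data type and "table.db" as the third.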
- it may also include:
- the file name is used as a training sample, and the modification count of the training sample in the first time period is used as a training label of the training sample to train the preset deep learning model;
- the specific contents in the N-1 databases all support dynamic adjustment.
- This implementation method takes into account that if the specific contents in the N-1 databases are set and adjusted based on the experience of the staff, the workload will be large, and the adjustment effect will also be affected by the staff's business level.
- data updates of N-1 databases can be implemented based on the deep learning model.
- a deep learning model can be established and trained.
- the training samples are the file names
- the training labels are the statistical values of the number of modifications of the training samples in the first time period.
- the file name is input to the trained deep learning model, and the deep learning model can output the prediction result corresponding to the file name.
- the prediction result represents the estimation of the number of modifications of the data with the file name in the first time period in the future.
- the data of N-1 databases can be updated based on the output results of the deep learning model, that is, different file names are placed in the corresponding databases according to the output results of the deep learning model.
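- how model output could drive the database update is sketched below. The predict callable stands in for the trained deep learning model and is a hypothetical placeholder (here any function from file name to estimated modification count); the interval boundaries are likewise assumed tuning values.

```python
from bisect import bisect_right
from typing import Callable, Iterable, List, Sequence, Set


def rebuild_databases(file_names: Iterable[str],
                      predict: Callable[[str], float],
                      upper_bounds: Sequence[float]) -> List[Set[str]]:
    """Place each file name in the database matching its predicted interval.

    upper_bounds has N-1 ascending boundaries, giving N intervals. Names
    predicted to fall in the N-th interval are not stored, because the
    N-th data type is the default for unmatched names.
    """
    databases: List[Set[str]] = [set() for _ in upper_bounds]
    for name in file_names:
        j = bisect_right(upper_bounds, predict(name)) + 1
        if j <= len(upper_bounds):
            databases[j - 1].add(name)
    return databases
```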
- Step S102 when the data type of the data to be written is the i-th data type, determine whether there is a hard disk cluster in the i-th state; if there is a hard disk cluster in the i-th state, execute step S103.
- Step S103 Select a hard disk cluster in the i-th state.
- after the data types of the data to be written are classified, if the data to be written is of the i-th data type, then as long as there is a hard disk cluster in the i-th state, the data to be written can subsequently be written into a hard disk cluster in the i-th state.
- the estimated number of modifications of the data to be written of the i+1th data type within the first time period is higher than the estimated number of modifications of the data to be written of the i-th data type within the first time period, that is, the data to be written of the 1st data type hardly needs to be modified, while the data to be written of the Nth data type needs to be modified most frequently.
- the wear degree of the hard disk cluster in the i+1th state is lower than the wear degree of the hard disk cluster in the ith state, where i is a positive integer. That is, the hard disk cluster in the ith state reflects that the wear degree of the hard disk cluster is in the ith wear degree interval. Therefore, there are N wear degree intervals in total, and the N wear degree intervals are sorted from large to small in the order from 1 to N.
- the wear degree of the hard disk cluster can be represented by how much data has been written to the hard disk cluster. That is, in a specific case, the wear degree Vwl of the hard disk cluster can be defined as: Vwl = (total amount of data already written to the hard disk cluster) / (total amount of writable data of all hard disks in the hard disk cluster). In other words, the more data has been written to the hard disk cluster, the higher its wear degree Vwl, up to a maximum of 100%.
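- the wear degree definition can be illustrated with a short calculation. The function name and the per-disk tuple layout are assumptions of this sketch; the cap at 1.0 reflects the stated maximum of 100%.

```python
from typing import Sequence, Tuple


def cluster_wear(disks: Sequence[Tuple[float, float]]) -> float:
    """Compute Vwl for a cluster.

    Each entry is (written_amount, total_writable_amount) for one disk,
    both in the same unit. The result is capped at 1.0 (100%).
    """
    written = sum(w for w, _ in disks)
    writable = sum(t for _, t in disks)
    return min(written / writable, 1.0)
```

For example, two disks that have written 100 and 300 units out of 1000 writable units each give Vwl = 400 / 2000 = 20%.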
- N is the total number of data types set, and is also the total number of states of the hard disk cluster.
- the first data type is specifically a read-only data type
- the second data type is specifically a cold data type
- the third data type is specifically a hot data type.
- the first state of the hard disk cluster is the G readonly state
- the second state of the hard disk cluster is the G cold state
- the third state of the hard disk cluster is the G hot state.
- data of the first data type, that is, the read-only data type, will be written to the hard disk cluster in the G readonly state.
- backup files belong to this type of data.
- the hard disk cluster in the G readonly state has a high degree of wear, but it can still be read.
- the read operation will not affect the life of the hard disk.
- Writing data that will hardly be modified in the future into the hard disk cluster in the G readonly state is conducive to giving full play to the value of this type of hard disk cluster.
- data that will be modified after being written, but not too frequently, is of the second data type, i.e., the cold data type, and will be written to a hard disk cluster in the G cold state.
- Data that will be modified frequently after being written is of the third data type, that is, the hot data type, and will be written to the hard disk cluster in the G hot state.
- V1 and V2 described here are two preset parameter thresholds, which can be set and adjusted by the staff as needed. It can be understood that the set V1 < V2.
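- one way the two thresholds could map a wear degree to a cluster state is sketched below. The exact mapping is not given in this excerpt, so the following is an assumption: V1 is taken as the lower threshold, the least worn clusters are G hot and the most worn are G readonly (consistent with the wear ordering of the states), and the boundary handling is chosen arbitrarily.

```python
def cluster_state(vwl: float, v1: float, v2: float) -> str:
    """Map a wear degree Vwl to a state name, assuming v1 < v2."""
    if not v1 < v2:
        raise ValueError("this sketch assumes v1 < v2")
    if vwl < v1:
        return "G_hot"        # lowest wear: takes frequently modified data
    if vwl < v2:
        return "G_cold"       # medium wear: takes rarely modified data
    return "G_readonly"       # highest wear: takes data that is almost never modified
```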
- the data to be written is of the i-th data type
- the specific selection rules can be set and adjusted according to actual needs.
- a hard disk cluster in the i-th state can be selected.
- they can be sorted from low to high according to the wear degree, and the hard disk cluster with low wear degree can be selected first, so as to more effectively achieve global wear balancing of the distributed storage system.
- in FIG. 2, there are three hard disk clusters in the G hot state, marked as G1, G2 and G3. Six SSDs are set in G1, marked as d1 to d6; four SSDs are set in G2, marked as d1 to d4; and seven SSDs are set in G3, marked as d1 to d7.
- the three G hot state hard disk clusters are arranged in order from small to large according to the wear degree Vwl, so as to give priority to the hard disk cluster with low wear degree.
- the Vwl sorting of each hard disk cluster in the same state may also be dynamically updated in real time or periodically.
- selecting a hard disk cluster in the i-th state according to the rule that the lower the wear degree of the hard disk cluster, the higher the priority may specifically include:
- the disk cluster with the smallest cluster busyness VGbusy is selected as the disk cluster in the i-th state;
- the cluster busyness VGbusy of the hard disk cluster is the value obtained by dividing the current cluster write queue depth VG_cur_queue_depth of the hard disk cluster by the maximum cluster write queue depth VG_max_queue_depth, that is, VGbusy = VG_cur_queue_depth / VG_max_queue_depth.
- search is performed in order of wear degree from small to large.
- the purpose of the search is to determine whether the current cluster write queue depth VG cur_queue_depth of the disk cluster is less than the preset maximum cluster write queue depth VG max_queue_depth .
- the first search is for the hard disk cluster G1 with the lowest wear. If the current cluster write queue depth VG cur_queue_depth of G1 is less than the preset maximum cluster write queue depth VG max_queue_depth , it means that G1 is not busy at present, so the search can be stopped and G1 is selected as the i-th state hard disk cluster according to the preset rules, so that the data to be written can be written to G1 later.
- G2 will continue to be searched. If the current cluster write queue depth VG cur_queue_depth of G2 is less than the preset maximum cluster write queue depth VG max_queue_depth , it means that G2 is not busy at present, so the search can be stopped and G2 will be selected as the hard disk cluster in the i-th state according to the preset rules, so that the data to be written can be written to G2 later.
- the maximum cluster write queue depth VG max_queue_depth is hardware-related and indicates the maximum number of write requests that a disk cluster can process simultaneously, which depends on the computing resources and storage resources of the disk cluster.
- Step S104 writing the data to be written into the selected hard disk cluster in the i-th state.
- the data to be written can be written into the selected hard disk cluster.
- step S104 may specifically include:
- the target hard disk is selected from the selected hard disk cluster in the i-th state
- This implementation method takes into account that after determining the hard disk cluster for storing the data to be written, the specific hard disk for storing the data to be written is selected from the hard disk cluster.
- although the wear degree of each hard disk in the same hard disk cluster is generally the same, there are still some differences.
- the hard disk with low wear degree is preferentially selected as the selected target hard disk.
- the hard disks in the selected hard disk cluster can be sorted from low to high according to the wear degree, and the hard disk with low wear degree is preferentially selected to more effectively achieve global wear balance of the distributed storage system.
- the wear degree ranking of each hard disk in the hard disk cluster may also be dynamically updated in real time or periodically.
- selecting a target hard disk from the selected hard disk cluster in the i-th state according to the rule that the lower the wear degree of the hard disk, the higher the priority includes:
- the hard disk with the smallest hard disk busyness VDbusy is selected as the target hard disk;
- the hard disk busyness VDbusy of the hard disk represents a value obtained by dividing the current hard disk write queue depth VD cur_queue_depth of the hard disk by the maximum hard disk write queue depth VD max_queue_depth .
- the principle is the same as that used in the previous implementation to select one hard disk cluster from the hard disk clusters in the i-th state: when the target hard disk is selected from the hard disks in the hard disk cluster, not only the wear of each hard disk but also its current hard disk write queue depth is considered, which helps ensure high concurrency, that is, high IOPS and high bandwidth for the distributed storage system.
- each hard disk in the selected hard disk cluster in the i-th state will be searched in order from small to large wear degree.
- the purpose of the search is to determine whether the current hard disk write queue depth VD cur_queue_depth of the hard disk is less than the preset maximum hard disk write queue depth VD max_queue_depth .
- the first hard disk to be searched is the one with the lowest wear. If the current hard disk write queue depth VD cur_queue_depth of the hard disk is less than the preset maximum hard disk write queue depth VD max_queue_depth , it means that the hard disk is not busy at present, so the search can be stopped and the hard disk can be used as the target hard disk so that the data to be written can be written to the hard disk later.
- the maximum disk write queue depth VD max_queue_depth is hardware-related and indicates the maximum number of write requests that a hard disk can process simultaneously, which depends on the computing resources and storage resources of the hard disk.
- it may also include:
- when the data to be written is of the second data type and it is determined that there is currently no hard disk cluster in the second state, the data to be written is written into a hard disk cluster in the G hot state.
- the state of the hard disk cluster will change over time.
- the wear degree is very low, that is, the state of each hard disk cluster is in the G hot state.
- when the data to be written is classified as the second data type and there is currently no hard disk cluster in the second state, that is, no hard disk cluster in the G cold state, the data to be written can be written to a hard disk cluster in the G hot state.
- it may also include:
- when the data to be written is of the first data type and it is determined that there is currently no hard disk cluster in the first state, the data to be written is written into a hard disk cluster in the G hot state or the G cold state.
- This implementation takes into account that when the data to be written is classified as the first data type, if there is currently no hard disk cluster in the first state, that is, there is no hard disk cluster in the G readonly state, the data to be written can be written to a hard disk cluster in the G hot state or the G cold state.
- it may also include:
- This implementation takes into account that when the data to be written is of the third data type and there is currently no hard disk cluster in the third state, that is, no hard disk cluster in the G hot state, the storage space of the distributed storage system has been largely consumed and the remaining space is insufficient. Therefore, to avoid data loss, this implementation directly returns a write-failure prompt message.
- an alarm can be sent to the system to remind staff to add additional resources to the distributed storage system in a timely manner.
- each hard disk cluster in the distributed storage system is arranged in the first medium layer, and an SCM medium layer is also arranged in the distributed storage system to store target type data through the SCM medium layer and process non-block-aligned write data through the SCM medium layer.
- the flash memory medium used in the solution of the present application can generally be QLC or PLC (Penta Level Cell, five-layer storage unit). Since global wear leveling is achieved, the service life is guaranteed while having high data density, high energy efficiency ratio and low price, that is, a high cost performance is achieved.
- This implementation further considers that the data in the hard disk cluster is usually stored in blocks of a set size, and an SCM (Storage-Class Memory) medium layer can also be set in the distributed storage system.
- SCM Storage-Class Memory
- the IOPS can be effectively improved.
- the large block IO (Input/Output)
- the non-block-aligned write data, that is, IO whose boundaries are not aligned.
- some small block IO also belongs to non-block-aligned write data.
- data that needs to be modified very frequently may not be stored in the first medium layer, that is, not stored in the SSD, but may be directly placed in the SCM to achieve high-speed reading and writing of such data, which is also beneficial to further improve the life of the distributed storage system.
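The routing between the two medium layers described above can be sketched as follows. This is only an illustration under stated assumptions, not the application's implementation: `BLOCK_SIZE`, `is_block_aligned`, `route_write` and the `is_target_type` flag are hypothetical names, and treating a write as block-aligned only when both its offset and its length are multiples of the block size is an assumed definition.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes; the source only says "a set size"


def is_block_aligned(offset: int, length: int, block_size: int = BLOCK_SIZE) -> bool:
    """True when the write starts and ends on block boundaries (assumed definition)."""
    return length > 0 and offset % block_size == 0 and length % block_size == 0


def route_write(offset: int, length: int, is_target_type: bool = False) -> str:
    """Return the name of the layer that should absorb the write.

    is_target_type marks data expected to be modified very frequently,
    which the text says may be kept in the SCM layer instead of on SSDs.
    """
    if is_target_type or not is_block_aligned(offset, length):
        return "scm"
    return "first_medium_layer"
```

Under these assumptions, `route_write(0, 8192)` goes to the first medium layer, while a small write such as `route_write(0, 100)` or an offset write such as `route_write(512, 4096)` is absorbed by the SCM layer.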
- Figure 3 is a schematic diagram of a multi-layer flash memory architecture of a distributed storage system in a specific implementation.
- a first medium layer and an SCM medium layer are set.
- the first medium layer in Figure 3 is specifically a PLC medium layer, which is also the solution usually selected in practical applications.
- the PLC medium layer has the advantages of high data density, high energy efficiency ratio, and low price. Through the global wear leveling strategy of this application, the service life of the PLC medium layer is effectively guaranteed and its cost performance is improved.
- the first medium layer can also be a QLC medium layer.
- each hard disk cluster data is stored in blocks of a set size, and the data storage method of the distributed storage system further includes:
- each block to be migrated in the hard disk cluster is migrated to the hard disk cluster in the (i+1)th state.
- This implementation further takes into account that in the aforementioned implementation, the data type of the data to be written is divided, which is equivalent to predicting the frequency of future modification of the data to be written. It is understandable that there will be deviations in the prediction results, and even for the same data to be written, the actual frequency of modification may be different in different occasions and different time periods. Therefore, in this implementation, data migration will be performed.
- the P/E number can also be called the number of erase cycles. If the P/E number of a block is very low, it means that the data in the block is not frequently modified. Conversely, if the P/E number of a block is very high, it means that the data in the block is frequently modified.
- the P/E times of each block in the disk cluster in the i-th state will be determined, and the average value of the P/E times of each block in the disk cluster will be calculated. If the P/E times of a block are much lower than the average value, the block will be used as a block to be migrated. If there is currently a disk cluster in the i+1th state, each block to be migrated in the disk cluster in the i-th state will be migrated to the disk cluster in the i+1th state.
- the P/E times of two blocks are particularly low, indicating that the modification frequency of these two blocks is very low, so the data of these two blocks are migrated to the hard disk cluster in the G cold state.
- it may also include:
- the K blocks with the largest P/E times in the hard disk cluster in the i-th state are exchanged with the data in the K blocks to be migrated in the hard disk cluster in the i-th state, so as to complete the internal migration of the K blocks to be migrated in the hard disk cluster in the i-th state;
- K represents the number of blocks to be migrated determined in the hard disk cluster in the i-th state.
- a hard disk cluster in the G readonly state cannot be migrated, that is, a hard disk cluster in the Nth state cannot be migrated because there is no hard disk cluster in the N+1th state.
- the i-th state is the G cold state
- if, after a number of blocks to be migrated have been determined, it is detected that the storage space of each hard disk cluster in the G readonly state has been exhausted, this is also treated as there being no hard disk cluster in the i+1 state, and the operation of this implementation is executed, that is, migration is performed within the hard disk cluster.
- when the migration destination does not exist or its space is full, the blocks to be migrated are migrated within the source hard disk cluster.
- the data in the K blocks with the largest P/E times in the hard disk cluster in the i-th state are exchanged with the data in the K blocks to be migrated in the hard disk cluster, thereby further ensuring the wear balance of each block in the hard disk cluster.
- block 1 and block 2 are blocks to be migrated, and the two blocks with the largest P/E times in the hard disk cluster are block 3 and block 4. Therefore, the data of block 1 and block 2 need to be exchanged with the data of block 3 and block 4.
- the data of block 1 can be exchanged with the data of block 3, and the data of block 2 can be exchanged with the data of block 4, so as to complete the internal migration of the two blocks to be migrated in the hard disk cluster.
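The internal migration described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the `Block` type, the function names and the `ratio` cut-off are hypothetical, and the source only says that a block to be migrated has a P/E count "much lower" than the average without giving a threshold.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Block:
    block_id: int
    pe_count: int  # program/erase cycles seen by this block
    data: bytes


def find_blocks_to_migrate(blocks, ratio=0.5):
    """Blocks whose P/E count is far below the cluster average.

    `ratio` is an assumed cut-off; the source does not specify one.
    """
    if not blocks:
        return []
    avg = mean(b.pe_count for b in blocks)
    return [b for b in blocks if b.pe_count < ratio * avg]


def migrate_internally(blocks, ratio=0.5):
    """Swap data between the K blocks to be migrated and the K most-worn blocks."""
    to_migrate = find_blocks_to_migrate(blocks, ratio)
    k = len(to_migrate)
    chosen = {b.block_id for b in to_migrate}
    # Most-worn blocks first, excluding the blocks already picked for migration.
    most_worn = sorted(
        (b for b in blocks if b.block_id not in chosen),
        key=lambda b: b.pe_count,
        reverse=True,
    )[:k]
    for cold, worn in zip(to_migrate, most_worn):
        cold.data, worn.data = worn.data, cold.data
    return [(c.block_id, w.block_id) for c, w in zip(to_migrate, most_worn)]
```

With five blocks whose P/E counts are 10, 12, 900, 800 and 400, blocks 1 and 2 fall below half the average and are paired with blocks 3 and 4, mirroring the block 1/block 3 and block 2/block 4 exchange in the text.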
- the distributed storage system is divided into multiple hard disk clusters. Compared with managing each hard disk directly, managing hard disk clusters requires less management data, that is, less metadata, and is easier to implement.
- the hard disk cluster includes multiple hard disks of the same model, and each hard disk in the same hard disk cluster is added to the distributed storage system in the same batch.
- the present application divides the hard disk clusters into N states. After the data to be written is received, its data type is determined. When the data to be written is of the i-th data type, it is determined whether a hard disk cluster in the i-th state currently exists; if so, one hard disk cluster in the i-th state is selected. Different data types reflect different frequencies of future modification: among the N data types, the estimated number of modifications within the first time length of data to be written of the (i+1)-th data type is higher than that of data to be written of the i-th data type.
- the wear degree of the hard disk cluster in the i+1th state is lower than the wear degree of the hard disk cluster in the i-th state. It can be seen that for data that almost does not need to be modified, that is, when the estimated number of modifications of the data to be written within the first time length is very low, the data to be written will be divided into the first data type, and therefore will be written into the hard disk cluster in the first state.
- the hard disk cluster in the first state has the highest wear degree, meaning that a large amount of data has already been written to it, so the data now written to it is data that almost never needs to be modified. Writing to a hard disk wears it out, but reading does not.
- because the data written to a hard disk cluster with a high degree of wear hardly needs to be modified, the hard disk cluster can still be read even though its wear degree is high, which makes full use of its residual value.
- the more frequently the data to be written needs to be modified, that is, the higher its estimated number of modifications within the first time period, the lower the wear degree of the hard disk cluster to which it is written, so that hard disk clusters with a lower degree of wear are used more fully, thus realizing global wear leveling of the distributed storage system.
- the present application divides the distributed storage system into multiple hard disk clusters, which is conducive to conveniently realizing global wear leveling of the distributed storage system, thereby ensuring the durability of the hard disks in the distributed storage system and avoiding the frequent replacement of bad disks.
- the embodiment of the present application further provides a data storage system of a distributed storage system, which can be referenced in correspondence with the above.
- the distributed storage system includes multiple hard disk clusters, each hard disk cluster includes multiple hard disks of the same model, and each hard disk in the same hard disk cluster is added to the distributed storage system in the same batch.
- the data storage system includes:
- the type classification module 401 is used to receive the data to be written and classify the data types of the data to be written; wherein, among the set N data types, the estimated number of modifications of the data to be written of the i+1th data type within the first time length is higher than the estimated number of modifications of the data to be written of the ith data type within the first time length, and N is a positive integer not less than 2;
- the hard disk cluster status judgment module 402 is used to judge whether there is a hard disk cluster in the i-th state when the data type of the data to be written is the i-th data type; if there is a hard disk cluster in the i-th state, the hard disk cluster selection module 403 is triggered;
- the hard disk cluster selection module 403 is used to select a hard disk cluster in the i-th state
- the writing module 404 is used to write the data to be written into the selected hard disk cluster in the i-th state; wherein, among the hard disk clusters in the set N states, the wear degree of the hard disk cluster in the i+1-th state is lower than the wear degree of the hard disk cluster in the i-th state, and i is a positive integer.
- the type classification module 401 classifies the data types of the data to be written, including:
- the data type of the data to be written is divided.
- the type classification module 401 classifies the data types of the data to be written based on the file names of the data to be written, including:
- the data type of the data to be written is divided into the j-th data type; wherein j is a positive integer and 1≤j≤N-1;
- the data type of the data to be written is divided into the Nth data type.
- a first update module is also included, which is used to:
- the file name is used as a training sample, and the modification count of the training sample in the first time period is used as a training label of the training sample to train the preset deep learning model;
- a second updating module is further included, which is used to:
- An adjustment instruction for the jth database is received, and a data item adding operation, a data item deleting operation, and/or a data item modifying operation is performed on the jth database according to the adjustment instruction.
- N=3
- the first data type is a read-only data type
- the second data type is a cold data type
- the third data type is a hot data type
- the first state of the hard disk cluster is a G readonly state
- the second state of the hard disk cluster is a G cold state
- the third state of the hard disk cluster is a G hot state.
- a first execution module is further included, which is used to:
- when the data to be written is of the first data type and it is determined that there is currently no hard disk cluster in the first state, the data to be written is written into a hard disk cluster in the G hot state or the G cold state.
- a second execution module is further included, which is used to:
- when the data to be written is of the second data type and it is determined that there is currently no hard disk cluster in the second state, the data to be written is written into a hard disk cluster in the G hot state.
- a third execution module is further included, which is used to:
- the hard disk cluster selection module 403 is specifically used to:
- a hard disk cluster in the i-th state is selected.
- the hard disk cluster selection module 403 is specifically used to:
- the disk cluster with the smallest cluster busyness VGbusy is selected as the disk cluster in the i-th state;
- the cluster busyness VGbusy of the hard disk cluster represents the value obtained by dividing the current cluster write queue depth VG cur_queue_depth of the hard disk cluster by the maximum cluster write queue depth VG max_queue_depth .
- the writing module 404 is specifically used for:
- the target hard disk is selected from the selected hard disk cluster in the i-th state
- the writing module 404 is specifically used for:
- the hard disk with the smallest hard disk busyness VDbusy is selected as the target hard disk;
- the hard disk busyness VDbusy of the hard disk represents a value obtained by dividing the current hard disk write queue depth VD cur_queue_depth of the hard disk by the maximum hard disk write queue depth VD max_queue_depth .
- each hard disk cluster in the distributed storage system is arranged in the first medium layer, and an SCM medium layer is also arranged in the distributed storage system to store target type data through the SCM medium layer and process non-block-aligned write data through the SCM medium layer.
- the first medium layer is a PLC medium layer or a QLC medium layer.
- each hard disk cluster data is stored in blocks of a set size, and a migration module is also included for:
- each block to be migrated in the hard disk cluster is migrated to the hard disk cluster in the (i+1)th state.
- the migration module is also used to:
- the K blocks with the largest P/E times in the hard disk cluster in the i-th state are exchanged with the data in the K blocks to be migrated in the hard disk cluster in the i-th state, so as to complete the internal migration of the K blocks to be migrated in the hard disk cluster in the i-th state;
- K represents the number of blocks to be migrated determined in the hard disk cluster in the i-th state.
- the embodiments of the present application also provide a data storage device of a distributed storage system and a computer non-volatile readable storage medium, which can be referenced in correspondence with the above.
- the data storage device of the distributed storage system may include:
- the memory 501 is used to store a computer program;
- the processor 502 is used to execute a computer program to implement the steps of the data storage method of the distributed storage system.
- a computer program 61 is stored on the non-volatile computer readable storage medium 60.
- the non-volatile computer readable storage medium 60 mentioned here includes a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
Abstract
The present application is applied to the technical field of storage. Disclosed are a data storage method, system and device for a distributed storage system, and a storage medium. The distributed storage system comprises a plurality of hard disk clusters, wherein one hard disk cluster comprises a plurality of hard disks which have the same model number and are all added in the same batch to the distributed storage system. The method comprises: receiving data to be written, and determining the data type thereof; when the data type of said data is an ith data type, determining whether there is a hard disk cluster in an ith state currently; if there is a hard disk cluster in the ith state, selecting one hard disk cluster in the ith state; and writing said data into the selected hard disk cluster in the ith state, wherein the lower an estimated value of the number of times that said data is modified within a first duration, the higher the wear degree of the hard disk cluster into which said data is written. By applying the solution of the present application, the global wear balance of a distributed storage system is realized, the durability of a hard disk is ensured, and the situation in which a bad disk is frequently replaced is avoided.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to the Chinese patent application filed with the China Patent Office on March 15, 2023, with application number 202310247233.0 and entitled “Data storage method, system, device and storage medium for distributed storage system”, the entire contents of which are incorporated herein by reference.
The present application relates to the field of storage technology, and in particular to a data storage method, system, device and storage medium for a distributed storage system.
Flash media has gone through four generations of development. TLC (Triple Level Cell) is currently in large-scale use, and QLC (Quad Level Cell) is about to enter mass production. Compared with its predecessor, each generation has higher data density, a lower price, lower durability, lower performance and better energy efficiency, all of which are determined by the physical properties of flash memory. QLC, however, has two main disadvantages: low performance and poor durability. Poor durability means that the number of P/E (Program/Erase) cycles is falling rapidly; the P/E cycle count of current QLC has dropped to around 1000. The current remedy is to perform wear leveling inside a single SSD (Solid-State Drive), but this has limited effect in a distributed system. A distributed system contains thousands of QLC SSDs, the amount of data written to each one is uneven, and the mix of new and old disks means that some disks fail quickly while others last a long time. As a result, the low-cost advantage of QLC is not realized, and frequent replacement of bad disks increases operation and maintenance costs.
Summary of the invention
The purpose of the present application is to provide a data storage method, system, device and storage medium for a distributed storage system, so as to effectively improve the durability of hard disks and avoid frequent replacement of bad disks.
To solve the above technical problems, the present application provides the following technical solutions:
A data storage method for a distributed storage system, wherein the distributed storage system includes multiple hard disk clusters, each hard disk cluster includes multiple hard disks of the same model, and all hard disks in the same hard disk cluster are added to the distributed storage system in the same batch. The data storage method for the distributed storage system includes:
receiving data to be written, and determining the data type of the data to be written; wherein, among the set N data types, the estimated number of modifications within the first time length of data to be written of the (i+1)-th data type is higher than that of data to be written of the i-th data type, and N is a positive integer not less than 2;
when the data type of the data to be written is the i-th data type, determining whether there is currently a hard disk cluster in the i-th state;
if there is currently a hard disk cluster in the i-th state, selecting one hard disk cluster in the i-th state;
writing the data to be written into the selected hard disk cluster in the i-th state; wherein, among the hard disk clusters in the set N states, the wear degree of a hard disk cluster in the (i+1)-th state is lower than the wear degree of a hard disk cluster in the i-th state, and i is a positive integer.
Preferably, determining the data type of the data to be written includes:
determining the data type of the data to be written based on the file name of the data to be written.
Preferably, determining the data type of the data to be written based on the file name of the data to be written includes:
when the file name of the data to be written matches the preset j-th database, classifying the data to be written as the j-th data type; wherein j is a positive integer and 1≤j≤N-1;
when the file name of the data to be written matches none of the preset N-1 databases, classifying the data to be written as the N-th data type.
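The file-name classification above can be sketched as follows. This is a hedged illustration only: the source does not say what "matches" means or what the databases contain, so modelling each database as a list of glob patterns, the pattern values themselves, and the names `DATABASES` and `classify` are all assumptions.

```python
from fnmatch import fnmatchcase

# Hypothetical contents: database j holds file-name patterns for data type j.
DATABASES = [
    ["*.iso", "*.mp4"],    # database 1 -> data type 1
    ["*.pdf", "*.docx"],   # database 2 -> data type 2
]


def classify(file_name: str, databases=DATABASES) -> int:
    """Return the 1-based data type for file_name.

    Databases are checked in order j = 1..N-1; a file name that matches
    none of them falls through to data type N.
    """
    for j, patterns in enumerate(databases, start=1):
        if any(fnmatchcase(file_name, pattern) for pattern in patterns):
            return j
    return len(databases) + 1
```

With the two example databases, N is 3, so a file name that matches neither database is classified as the third data type.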
Preferably, the method further includes:
training a preset deep learning model, using file names as training samples and the counted number of modifications of each training sample within the first time length as its training label;
after the deep learning model has been trained, inputting different file names into the trained deep learning model in turn, and updating the data of the N-1 databases based on the output of the deep learning model.
Preferably, the method further includes:
receiving an adjustment instruction for the j-th database, and performing a data item adding operation and/or a data item deleting operation and/or a data item modifying operation on the j-th database according to the adjustment instruction.
Preferably, N=3, the first data type is a read-only data type, the second data type is a cold data type, the third data type is a hot data type, the first state of the hard disk cluster is a G readonly state, the second state of the hard disk cluster is a G cold state, and the third state of the hard disk cluster is a G hot state.
Preferably, the method further includes:
when the data to be written is of the first data type and it is determined that there is currently no hard disk cluster in the first state, writing the data to be written into a hard disk cluster in the G hot state or the G cold state.
Preferably, the method further includes:
when the data to be written is of the second data type and it is determined that there is currently no hard disk cluster in the second state, writing the data to be written into a hard disk cluster in the G hot state.
Preferably, the method further includes:
when the data to be written is of the third data type and it is determined that there is currently no hard disk cluster in the third state, returning a write-failure prompt message.
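The three fallback rules for the N=3 case can be sketched as one lookup. This is an illustration under assumptions, not the application's implementation: the names `FALLBACKS`, `WriteFailure` and `choose_state` are hypothetical, the text allows read-only data to fall back to either the G hot or the G cold state so trying G hot first is an arbitrary choice, and raising an error when every fallback is also missing is an assumption the text does not cover.

```python
READONLY, COLD, HOT = 1, 2, 3  # data types and matching cluster states for N=3

# Fallback states to try when no cluster in the matching state exists.
FALLBACKS = {
    READONLY: [HOT, COLD],  # either is allowed; the order here is an assumption
    COLD: [HOT],
    HOT: [],                # no fallback: the write fails
}


class WriteFailure(Exception):
    """Raised when no hard disk cluster can accept the data."""


def choose_state(data_type: int, available_states: set) -> int:
    """Return the cluster state that should receive data of data_type."""
    for state in [data_type, *FALLBACKS[data_type]]:
        if state in available_states:
            return state
    raise WriteFailure(f"no hard disk cluster available for data type {data_type}")
```

For example, in a new system where every cluster is still in the G hot state, `choose_state(COLD, {HOT})` returns `HOT`, whereas `choose_state(HOT, {READONLY, COLD})` raises `WriteFailure`.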
Preferably, selecting one hard disk cluster in the i-th state includes:
selecting one hard disk cluster in the i-th state according to the rule that the lower the wear degree of a hard disk cluster, the higher its priority.
Preferably, selecting one hard disk cluster in the i-th state according to the rule that the lower the wear degree of a hard disk cluster, the higher its priority, includes:
searching the hard disk clusters in the i-th state one by one in ascending order of wear degree;
when a hard disk cluster in the i-th state is found whose current cluster write queue depth VG_cur_queue_depth is less than the preset maximum cluster write queue depth VG_max_queue_depth, stopping the search and taking that hard disk cluster as the selected hard disk cluster in the i-th state;
when all hard disk clusters in the i-th state have been searched and none has a current cluster write queue depth VG_cur_queue_depth less than the preset maximum cluster write queue depth VG_max_queue_depth, taking the hard disk cluster with the smallest cluster busyness VG_busy as the selected hard disk cluster in the i-th state;
wherein the cluster busyness VG_busy of a hard disk cluster is the value obtained by dividing its current cluster write queue depth VG_cur_queue_depth by its maximum cluster write queue depth VG_max_queue_depth.
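The cluster selection rule above can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: the `Cluster` class, its field names and `select_cluster` are hypothetical, and the wear degree is assumed to be available as a single comparable number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    name: str
    wear: float           # wear degree (assumed to be one comparable number)
    cur_queue_depth: int  # VG_cur_queue_depth
    max_queue_depth: int  # VG_max_queue_depth (preset)

    @property
    def busy(self) -> float:
        """VG_busy = VG_cur_queue_depth / VG_max_queue_depth."""
        return self.cur_queue_depth / self.max_queue_depth


def select_cluster(clusters: List[Cluster]) -> Cluster:
    """Pick one cluster from the clusters that are all in the i-th state."""
    if not clusters:
        raise ValueError("no cluster in the requested state")
    # Search in ascending order of wear; stop at the first cluster
    # whose write queue still has room.
    for cluster in sorted(clusters, key=lambda c: c.wear):
        if cluster.cur_queue_depth < cluster.max_queue_depth:
            return cluster
    # Every queue is full: fall back to the least busy cluster.
    return min(clusters, key=lambda c: c.busy)
```

With clusters of wear 0.1 (queue 8/8), 0.2 (queue 8/8) and 0.3 (queue 2/8), the search skips the two saturated clusters and returns the third, even though it is the most worn of the three.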
Preferably, writing the data to be written into the selected hard disk cluster in the i-th state includes:
selecting a target hard disk from the selected hard disk cluster in the i-th state according to the rule that the lower the wear degree of a hard disk, the higher its priority;
writing the data to be written into the selected target hard disk.
Preferably, selecting a target hard disk from the selected hard disk cluster in the i-th state according to the rule that the lower the wear degree of a hard disk, the higher its priority, includes:
searching the hard disks in the selected hard disk cluster in the i-th state one by one in ascending order of wear degree;
when a hard disk in the hard disk cluster in the i-th state is found whose current hard disk write queue depth VD_cur_queue_depth is less than the preset maximum hard disk write queue depth VD_max_queue_depth, stopping the search and taking that hard disk as the selected target hard disk;
when all hard disks in the hard disk cluster in the i-th state have been searched and none has a current hard disk write queue depth VD_cur_queue_depth less than the preset maximum hard disk write queue depth VD_max_queue_depth, taking the hard disk with the smallest hard disk busyness VD_busy as the selected target hard disk;
wherein the hard disk busyness VD_busy of a hard disk is the value obtained by dividing its current hard disk write queue depth VD_cur_queue_depth by its maximum hard disk write queue depth VD_max_queue_depth.
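As a worked illustration of the busyness fallback, consider three hard disks whose write queues have all reached or exceeded their preset maximum depth. The queue depths below are made-up numbers, and the example assumes that the preset maximum acts as a selection threshold that the current depth may exceed, which the application does not state explicitly.

```latex
\[
VD_{\mathrm{busy}} = \frac{VD_{\mathrm{cur\_queue\_depth}}}{VD_{\mathrm{max\_queue\_depth}}}
\]
\[
\text{disk A: } \frac{32}{32} = 1.0, \qquad
\text{disk B: } \frac{80}{64} = 1.25, \qquad
\text{disk C: } \frac{36}{32} = 1.125
\]
```

No disk satisfies VD_cur_queue_depth < VD_max_queue_depth, so the disk with the smallest VD_busy, disk A, becomes the target hard disk. Because the ratio is normalized by each disk's own maximum, disk B is considered busier than disk C even though both have deep queues in absolute terms.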
Preferably, each hard disk cluster in the distributed storage system is arranged in a first medium layer, and the distributed storage system is further provided with an SCM medium layer, so that data of a target type is stored through the SCM medium layer and write data that is not block-aligned is handled through the SCM medium layer.
Preferably, the first medium layer is a PLC medium layer or a QLC medium layer.
Preferably, in each hard disk cluster, data is stored in blocks of a set size, and the data storage method of the distributed storage system further includes:
determining the P/E count of each block in a hard disk cluster in the i-th state, and computing the average P/E count of the blocks in that hard disk cluster;
when the hard disk cluster in the i-th state contains blocks for which the difference between the P/E count and the average is lower than a set first value, taking every such block as a block to be migrated;
if a hard disk cluster in the (i+1)-th state currently exists, migrating each block to be migrated in the hard disk cluster to the hard disk cluster in the (i+1)-th state.
Preferably, the method further includes:
after the blocks to be migrated have been determined, if no hard disk cluster in the (i+1)-th state currently exists, exchanging the data in the K blocks with the largest P/E counts in the hard disk cluster in the i-th state with the data in the K blocks to be migrated in that hard disk cluster, so as to complete the internal migration of the K blocks to be migrated within the hard disk cluster in the i-th state;
wherein K is the number of blocks to be migrated determined in the hard disk cluster in the i-th state.
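The migration planning described above can be sketched in Python. This is a hedged illustration only: `blocks_to_migrate` and `plan_migration` are hypothetical names, the condition is read literally as "P/E count minus average is lower than the first value" (so a negative first value selects blocks well below the average), and the string "next-state-cluster" is merely a placeholder for a cluster in the (i+1)-th state.

```python
from statistics import mean
from typing import Dict, List, Tuple


def blocks_to_migrate(pe_counts: Dict[str, int], first_value: float) -> List[str]:
    """Blocks whose (P/E count - cluster average) is lower than first_value."""
    average = mean(pe_counts.values())
    return [blk for blk, pe in pe_counts.items() if pe - average < first_value]


def plan_migration(
    pe_counts: Dict[str, int],
    first_value: float,
    next_state_exists: bool,
) -> List[Tuple[str, str]]:
    """Return (block to migrate, destination) pairs.

    If a cluster in the (i+1)-th state exists, every block goes there.
    Otherwise each block is paired with one of the K most worn blocks of
    the same cluster, and their data would be swapped.
    """
    pending = blocks_to_migrate(pe_counts, first_value)
    if next_state_exists:
        return [(blk, "next-state-cluster") for blk in pending]
    k = len(pending)
    most_worn = sorted(pe_counts, key=lambda blk: pe_counts[blk], reverse=True)[:k]
    return list(zip(pending, most_worn))
```

With P/E counts `{"b0": 100, "b1": 900, "b2": 1100, "b3": 1000}` the average is 775; with a first value of -500 only `b0` qualifies (100 - 775 = -675), and without a next-state cluster it is paired with `b2`, the most worn block. The sketch does not guard against a block appearing in both lists, which could happen if K exceeds half the blocks.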
A data storage system of a distributed storage system, wherein the distributed storage system includes multiple hard disk clusters, each hard disk cluster includes multiple hard disks of the same model, and all hard disks in the same hard disk cluster join the distributed storage system in the same batch; the data storage system of the distributed storage system includes:
a type classification module, configured to receive data to be written and determine the data type of the data to be written; wherein, among the N set data types, the estimated number of modifications within a first time period for data to be written of the (i+1)-th data type is higher than that for data to be written of the i-th data type, and N is a positive integer not less than 2;
a hard disk cluster state determination module, configured to determine, when the data type of the data to be written is the i-th data type, whether a hard disk cluster in the i-th state currently exists, and to trigger the hard disk cluster selection module if a hard disk cluster in the i-th state exists;
a hard disk cluster selection module, configured to select one hard disk cluster in the i-th state;
a writing module, configured to write the data to be written into the selected hard disk cluster in the i-th state; wherein, among the hard disk clusters in the N set states, the wear degree of a hard disk cluster in the (i+1)-th state is lower than the wear degree of a hard disk cluster in the i-th state, and i is a positive integer.
A data storage device of a distributed storage system, including:
a memory, configured to store a computer program;
a processor, configured to execute the computer program to implement the steps of the data storage method of the distributed storage system described above.
A non-volatile computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the data storage method of the distributed storage system described above.
The technical solution provided by the embodiments of the present application recognizes that, to fully realize the cost-effectiveness of flash media, wear leveling cannot be confined to a single hard disk; global wear leveling must be performed across the distributed storage system. Specifically, the solution divides the distributed storage system into multiple hard disk clusters. Managing hard disk clusters requires less management data (that is, metadata) than managing each hard disk directly, which makes the solution easier to implement. Moreover, each hard disk cluster includes multiple hard disks of the same model, and all hard disks in the same hard disk cluster join the distributed storage system in the same batch. This arrangement makes it convenient to level wear among the hard disks within a cluster; by additionally leveling wear among hard disk clusters, global wear leveling of the entire distributed storage system is achieved, which safeguards hard disk endurance and avoids frequent replacement of failed disks.
Specifically, to level wear among hard disk clusters, the present application divides hard disk clusters into N states. After data to be written is received, its data type is determined; when the data is classified as the i-th data type, it is determined whether a hard disk cluster in the i-th state currently exists, and if so, one hard disk cluster in the i-th state is selected. Different data types reflect how frequently the data is expected to be modified in the future: among the N data types, the estimated number of modifications within the first time period for data of the (i+1)-th data type is higher than that for data of the i-th data type. Among the hard disk clusters in the N set states, the wear degree of a hard disk cluster in the (i+1)-th state is lower than that of a hard disk cluster in the i-th state. Thus data that hardly needs modification, that is, data whose estimated number of modifications within the first time period is very low, is classified as the first data type and written into a hard disk cluster in the first state. Hard disk clusters in the first state have the highest wear degree, meaning that they have already absorbed a large volume of writes, so they are given data that hardly needs to be modified. Write operations wear a hard disk whereas read operations do not; because a heavily worn hard disk cluster receives only data that hardly needs modification, it can still serve reads despite its wear, and its residual value is fully exploited. Correspondingly, the more frequently data needs to be modified, that is, the higher its estimated number of modifications within the first time period, the lower the wear degree of the hard disk cluster it is written to, so that lightly worn hard disk clusters are used more fully and global wear leveling of the distributed storage system is achieved.
In summary, the present application divides the distributed storage system into multiple hard disk clusters, which makes it convenient to achieve global wear leveling of the distributed storage system, safeguards the endurance of the hard disks in the distributed storage system, and avoids frequent replacement of failed disks.
To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a data storage method of a distributed storage system in the present application;
FIG. 2 is a schematic framework diagram of global wear leveling in a specific implementation of the present application;
FIG. 3 is a schematic diagram of the multi-layer flash architecture of a distributed storage system in a specific implementation of the present application;
FIG. 4 is a schematic structural diagram of a data storage system of a distributed storage system in the present application;
FIG. 5 is a schematic structural diagram of a data storage device of a distributed storage system in the present application;
FIG. 6 is a schematic structural diagram of a non-volatile computer-readable storage medium in the present application.
The core of the present application is to provide a data storage method of a distributed storage system that divides the distributed storage system into multiple hard disk clusters, which makes it convenient to achieve global wear leveling of the distributed storage system, safeguards the endurance of the hard disks in the distributed storage system, and avoids frequent replacement of failed disks.
To enable those skilled in the art to better understand the solution of the present application, the present application is described in further detail below with reference to the drawings and specific implementations. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
Referring to FIG. 1, FIG. 1 is a flowchart of a data storage method of a distributed storage system in the present application. The distributed storage system includes multiple hard disk clusters, each hard disk cluster includes multiple hard disks of the same model, and all hard disks in the same hard disk cluster join the distributed storage system in the same batch. The data storage method of the distributed storage system may include the following steps:
Step S101: receiving data to be written, and determining the data type of the data to be written;
wherein, among the N set data types, the estimated number of modifications within the first time period for data to be written of the (i+1)-th data type is higher than that for data to be written of the i-th data type, and N is a positive integer not less than 2.
Specifically, the solution of the present application defines the concept of a hard disk cluster: a distributed storage system may include multiple hard disk clusters, and one hard disk cluster may include multiple hard disks of the same model.
It should be noted that different hard disk clusters may contain different numbers and models of hard disks; that is, their disks may come from different manufacturers, have different capacities, and join the system at different times. These can be set according to actual needs without affecting the implementation of the present application. The hard disks within one hard disk cluster, however, are all of the same model; in other words, a hard disk cluster consists of a group of hard disks with the same capacity and the same properties.
All hard disks in the same hard disk cluster join the distributed storage system in the same batch, and because the present application achieves global wear leveling of the distributed storage system, the hard disks in the same hard disk cluster wear to a similar degree; if they need to be replaced, they are replaced together.
The specific type of hard disk described in the present application can be set and adjusted as needed. It is usually an SSD, although other types of hard disks may be used in other specific implementations. It should be pointed out, however, that mechanical hard disks do not have this wear problem and face essentially no limit on the number of writes, whereas the wear leveling of the present application targets solid-state drives, which do have a limited number of writes. The hard disks in the solution of the present application are therefore usually solid-state drives with a limited number of writes.
After the data to be written is received, its data type needs to be determined. Referring to FIG. 2, which is a schematic framework diagram of global wear leveling, the medium user in FIG. 2 classifies the data type of the data to be written and passes it as a tag to the flash allocation module in FIG. 2. From the tag, the flash allocation module learns the data type of the data to be written and then decides which hard disk cluster the data should be written to.
In the solution of the present application, to achieve global wear leveling, the more frequently data needs to be modified, the more it should be written to a hard disk cluster with a low wear degree. In other words, the present application classifies the data to be written according to how frequently it is expected to be modified in the future. Different data types thus reflect different modification frequencies of the data to be written.
Among the N set data types, the estimated number of modifications within the first time period for data to be written of the (i+1)-th data type is higher than that for data to be written of the i-th data type. N is a positive integer not less than 2, that is, at least two data types need to be set, and N is the total number of data types.
That is, when the data to be written is of the i-th data type, its estimated number of modifications within the first time period lies in the i-th estimate interval, and the N estimate intervals are ordered from smallest to largest as i runs from 1 to N.
It can be seen that data to be written of the first data type has the smallest estimated number of modifications within the first time period, meaning that it hardly needs to be modified, whereas data to be written of the N-th data type has the largest estimated number of modifications within the first time period, meaning that it needs to be modified frequently.
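The mapping from an estimated modification count to one of N ordered intervals can be sketched with a binary search. This is an illustration under assumptions: `classify` and the boundary values are hypothetical, and the application does not say how the interval boundaries are chosen or whether they are inclusive.

```python
from bisect import bisect_right
from typing import Sequence


def classify(estimate: float, boundaries: Sequence[float]) -> int:
    """Return the data type index i (1..N) for an estimated modification count.

    `boundaries` holds N-1 ascending interval boundaries; an estimate equal
    to a boundary is assigned to the higher interval (an assumed convention).
    """
    return bisect_right(boundaries, estimate) + 1


# Hypothetical boundaries for N = 3: fewer than 1 modification is type 1,
# from 1 up to (but not including) 100 is type 2, and 100 or more is type 3.
BOUNDARIES = [1, 100]
```

Here `classify(0, BOUNDARIES)` yields 1 (hardly ever modified), `classify(50, BOUNDARIES)` yields 2, and `classify(500, BOUNDARIES)` yields 3 (frequently modified).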
The classification of data to be written according to its modification frequency can be implemented in various ways. For example, in a specific implementation of the present application, determining the data type of the data to be written as described in step S101 may specifically include:
determining the data type of the data to be written based on the file name of the data to be written.
As described above, the present application classifies the data to be written according to how frequently it is expected to be modified in the future. This implementation takes into account that whether data needs frequent modification is reflected to some extent in its file name. The modification frequency of the data to be written can therefore be predicted from its file name, that is, its data type can be determined from the file name, which is convenient to implement.
Further, in a specific implementation of the present application, determining the data type of the data to be written based on the file name of the data to be written may specifically include:
when the file name of the data to be written matches a preset j-th database, classifying the data to be written as the j-th data type, where j is a positive integer and 1≤j≤N-1;
when the file name of the data to be written matches none of the N-1 preset databases, classifying the data to be written as the N-th data type.
In the present application, N is a positive integer not less than 2 and denotes the total number of data types; N is also the total number of hard disk cluster states. That is, there are as many hard disk cluster states as there are data types. Checking whether the file name of the data to be written matches a given database is very simple to implement, so this implementation can predict the specific type of the data to be written conveniently and quickly.
This implementation presets N-1 databases, so that when the file name of the data to be written matches one of the databases, the data type of the data to be written is determined to be the data type corresponding to that database. That is, if the file name of the data to be written matches the j-th database among the N-1 databases, the data to be written is classified as the j-th data type.
It should also be pointed out that this implementation sets N-1 databases rather than N because, whichever number is set, the file name of the data to be written may still match none of the databases; the databases can hardly cover every file name that occurs in practice.
If no matching database is found, the modification frequency of the data to be written cannot be predicted. To avoid excessive wear on heavily worn hard disk clusters, this implementation classifies such data as the N-th data type, so that it is subsequently written to a hard disk cluster in the N-th state, which has the lowest wear degree among hard disk clusters in all states.
Because this implementation classifies data whose file name matches no database as the N-th data type, there is no need to set N databases; N-1 databases suffice. As long as the file name of the data to be written matches none of the N-1 preset databases, the data is classified directly as the N-th data type, regardless of whether its future modification frequency is unpredictable or it will in fact be modified very frequently.
In addition, the contents of the N-1 databases can be set by staff based on experience, and the database contents can also be adjusted dynamically to better suit actual needs. For example, a specific implementation of the present application may further include:
receiving an adjustment instruction for the j-th database, and performing a data item addition operation and/or a data item deletion operation and/or a data item modification operation on the j-th database according to the adjustment instruction.
The adjustment instruction may be issued by staff. When an adjustment instruction for the j-th database is received, data items of the j-th database can be added and/or deleted and/or modified according to the instruction.
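One possible shape for such an adjustment instruction is sketched below in Python. The dictionary format with the keys `op`, `item` and `new_item` is purely hypothetical, since the application does not define how an adjustment instruction is encoded; each database is modeled as a set of data items.

```python
from typing import Dict, Set


def apply_adjustment(database: Set[str], instruction: Dict[str, str]) -> None:
    """Apply one add / delete / modify instruction to a database in place.

    The instruction layout is an assumption made for illustration only.
    """
    op = instruction["op"]
    if op == "add":
        database.add(instruction["item"])
    elif op == "delete":
        database.discard(instruction["item"])  # no error if the item is absent
    elif op == "modify":
        database.discard(instruction["item"])
        database.add(instruction["new_item"])
    else:
        raise ValueError(f"unknown operation: {op!r}")
```

For instance, starting from `{".avi", ".bmp"}`, the instruction `{"op": "modify", "item": ".bmp", "new_item": ".png"}` leaves the database as `{".avi", ".png"}`.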
It should also be noted that the rule for deciding whether the file name of the data to be written matches the preset j-th database can be set according to actual needs. For example, in one scenario the j-th database contains one or more file name suffixes; as long as the suffix of the file name of the data to be written is identical to any suffix in the j-th database, the file name is considered to match the preset j-th database.
For example, when N=3 in a specific scenario, two databases need to be set, referred to as the first database and the second database. If the suffix of the file name of the data to be written is, for example, avi or bmp, the file name matches the first database, so the data to be written is classified as the first data type and subsequently needs to be written to a hard disk cluster in the first state. If the suffix is, for example, bak or log, the file name matches the second database, so the data to be written is classified as the second data type and subsequently needs to be written to a hard disk cluster in the second state.
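The suffix-based matching in this N=3 example can be sketched in Python. The suffix sets reuse the avi, bmp, bak and log examples from the text; `DATABASES` and `classify_by_suffix` are hypothetical names, and ignoring letter case when comparing suffixes is an assumption.

```python
import os
from typing import List, Set

# N - 1 = 2 databases of file name suffixes, taken from the examples above.
DATABASES: List[Set[str]] = [
    {".avi", ".bmp"},  # 1st database -> 1st data type
    {".bak", ".log"},  # 2nd database -> 2nd data type
]


def classify_by_suffix(file_name: str, databases: List[Set[str]] = DATABASES) -> int:
    """Return the data type index (1..N) for a file name.

    A file name that matches none of the N-1 databases falls into the
    N-th data type.
    """
    suffix = os.path.splitext(file_name)[1].lower()  # case-insensitive: assumed
    for j, database in enumerate(databases, start=1):
        if suffix in database:
            return j
    return len(databases) + 1
```

Under these assumptions `classify_by_suffix("movie.avi")` gives 1, `classify_by_suffix("server.log")` gives 2, and an unmatched name such as `"notes.txt"` gives 3, so it would be routed to a cluster in the third state.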
Of course, matching by file name suffix is only one relatively simple example; other specific implementations may use more complex matching methods. For example, the head of the file name can be analyzed and the result combined with the file name suffix to determine whether the file name of the data to be written matches a given database. As another example, the overall pattern of the file name can be analyzed to determine whether it matches a given database.
在本申请的一种具体实施方式中,还可以包括:In a specific implementation of the present application, it may also include:
以文件名作为训练样本,以训练样本在第一时长内的修改次数统计值作为训练样本的训练标签,对预设的深度学习模型进行训练;The file name is used as a training sample, and the modification count of the training sample in the first time period is used as a training label of the training sample to train the preset deep learning model;
在深度学习模型训练完毕之后,依次输入各个不同的文件名至训练完毕的深度学习模型,并基于深度学习模型的输出结果,进行N-1个数据库的数据更新。After the deep learning model is trained, different file names are input into the trained deep learning model in sequence, and data of N-1 databases are updated based on the output results of the deep learning model.
如上文的描述,N-1个数据库中的具体内容均支持进行动态调整,而该种实施方式考虑到,如果基于工作人员的经验对N-1个数据库中的具体内容进行设定和调整,工作量较大,并且调整效果也受到工作人员业务水平的影响。As described above, the specific contents in the N-1 databases all support dynamic adjustment. This implementation method takes into account that if the specific contents in the N-1 databases are set and adjusted based on the experience of the staff, the workload will be large, and the adjustment effect will also be affected by the staff's business level.
For this reason, in this embodiment the N-1 databases may be updated by means of a deep learning model. Specifically, a deep learning model is built and trained. During training, the training samples are the file names and the training labels are the counted numbers of modifications of those samples within the first time period. Once training is complete, a file name input to the trained model yields a prediction for that file name, namely an estimate of how many times data with that file name will be modified within a future first time period.
After the different file names have been input into the trained deep learning model one by one, the N-1 databases can be updated based on the output of the model; that is, each file name is placed in the database that corresponds to its predicted result.
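The database update step can be sketched as follows, with the trained model abstracted as a callable. The text does not specify how a predicted modification count maps to a database, so the ascending list upper_bounds, the function name update_databases and the parameter predict_modifications are all hypothetical and serve only to illustrate the idea.

```python
from typing import Callable, Iterable, List, Set


def update_databases(
    file_names: Iterable[str],
    predict_modifications: Callable[[str], float],
    upper_bounds: List[float],
) -> List[Set[str]]:
    """Rebuild the N-1 databases from predicted modification counts.

    upper_bounds must be ascending; upper_bounds[k] is the (assumed) largest
    predicted count that still places a file name in database k+1. A name whose
    prediction exceeds every bound is placed in no database.
    """
    databases: List[Set[str]] = [set() for _ in upper_bounds]
    for name in file_names:
        predicted = predict_modifications(name)
        for k, bound in enumerate(upper_bounds):
            if predicted <= bound:
                databases[k].add(name)
                break  # a file name belongs to at most one database
    return databases
```

In such a sketch, predict_modifications would wrap the trained deep learning model, and the update could be rerun whenever the model is retrained.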
Step S102: when the data type of the data to be written is the i-th data type, determine whether a hard disk cluster in the i-th state currently exists; if a hard disk cluster in the i-th state exists, perform step S103.
Step S103: select one hard disk cluster in the i-th state.
Once the data to be written has been classified, if it is of the i-th data type and a hard disk cluster in the i-th state currently exists, the data can subsequently be written to a hard disk cluster in the i-th state.
As described above, in the solution of the present application, among the N data types, the estimated number of modifications within the first time period for data of the (i+1)-th data type is higher than that for data of the i-th data type. That is, data of the first data type almost never needs to be modified, while data of the N-th data type is modified most frequently.
Among the N hard disk cluster states, the wear degree of a hard disk cluster in the (i+1)-th state is lower than that of a hard disk cluster in the i-th state, where i is a positive integer. In other words, a cluster being in the i-th state means that its wear degree lies in the i-th wear-degree interval. There are therefore N wear-degree intervals in total, ordered from highest wear to lowest as the index runs from 1 to N.
It can be seen that the more frequently the data to be written needs to be modified, that is, the higher its estimated number of modifications within the first time period, the lower the wear degree of the hard disk cluster selected to store it.
The wear degree of a hard disk cluster indicates how worn the cluster is. The higher the wear degree, the fewer write operations the cluster can still perform, that is, the shorter its remaining life. In one specific case, the wear degree may be expressed by how much data has already been written to the cluster. The wear degree Vwl of a hard disk cluster may then be defined as: Vwl = (amount of data already written to the cluster) / (sum of the total writable data of all hard disks in the cluster). In other words, the more data has been written to the cluster, the higher its wear degree Vwl, up to a maximum of 100%.
In addition, as described above, N is both the total number of data types and the total number of hard disk cluster states. Classifying the data to be written amounts to predicting how frequently it will be modified in the future, and the actual modification frequency after the data is written to the distributed storage system may differ between scenarios and time periods. In practice, therefore, the data types need not be divided too finely, that is, the number of cluster states need not be large; too many states would also be inconvenient to manage. Accordingly, in a specific embodiment N may be set to 3. In that case the first data type is the read-only data type, the second data type is the cold data type, and the third data type is the hot data type; the first state of a hard disk cluster is the Greadonly state, the second state is the Gcold state, and the third state is the Ghot state.
It can be seen that, in this embodiment, data that will almost never be modified after being written is of the first data type, that is, the read-only data type, and is written to a hard disk cluster in the Greadonly state. Backup files are an example of such data.
A hard disk cluster in the Greadonly state has a high wear degree but can still be read, and read operations do not affect hard disk life. Writing data that will almost never be modified to clusters in the Greadonly state helps make full use of the remaining value of such clusters.
Correspondingly, in this example, data that will be modified after being written, but not very frequently, is of the second data type, that is, the cold data type, and is written to a hard disk cluster in the Gcold state.
Data that will be modified relatively frequently after being written is of the third data type, that is, the hot data type, and is written to a hard disk cluster in the Ghot state.
The remainder of this application also uses the N=3 embodiment as the example. With N=3, a hard disk cluster is in the Ghot state when its wear degree Vwl<V1, in the Gcold state when V1≤Vwl<V2, and in the Greadonly state when Vwl≥V2. Here V1 and V2 are two preset thresholds that staff can set and adjust as needed, with V1<V2.
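The wear degree definition and the threshold mapping for N=3 can be sketched as follows. The function names wear_degree and cluster_state are hypothetical, the wear degree is expressed as a fraction between 0 and 1 rather than a percentage, and the thresholds used in the usage note are arbitrary illustrative values.

```python
def wear_degree(written_bytes: int, total_writable_bytes: int) -> float:
    """Vwl = data already written to the cluster / total writable data of all its disks."""
    return written_bytes / total_writable_bytes


def cluster_state(vwl: float, v1: float, v2: float) -> str:
    """Map a wear degree to a cluster state for N = 3, assuming v1 < v2."""
    if vwl < v1:
        return "Ghot"       # third state: lowest wear, receives hot data
    if vwl < v2:
        return "Gcold"      # second state: receives cold data
    return "Greadonly"      # first state: highest wear, receives read-only data
```

With illustrative thresholds V1=0.3 and V2=0.7, a cluster that has written a quarter of its total writable data is still in the Ghot state, and it moves to the Gcold state once Vwl reaches 0.3.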
It should also be noted that, when the data to be written is of the i-th data type, there may be several hard disk clusters currently in the i-th state. One of them must then be selected according to a set rule, and the data to be written is written to the selected cluster. The specific selection rule can be set and adjusted according to actual needs.
For example, a specific embodiment of the present application takes into account that, when there are several hard disk clusters in the i-th state, their wear degrees fall in the same wear-degree interval but still differ from one another. To achieve global wear leveling of the distributed storage system more effectively, one cluster in the i-th state may be selected under the rule that the lower a cluster's wear degree, the higher its priority. That is, the clusters in the i-th state may be sorted by wear degree from low to high, and clusters with a lower wear degree are selected first.
In the embodiment of FIG. 2 there are three hard disk clusters in the Ghot state, labeled G1, G2 and G3. G1 contains six SSDs, labeled d1 to d6; G2 contains four SSDs, labeled d1 to d4; and G3 contains seven SSDs, labeled d1 to d7. In FIG. 2 the three clusters in the Ghot state are arranged in ascending order of wear degree Vwl, so that clusters with lower wear are selected first.
In addition, because the wear degree of a hard disk cluster changes over time, the Vwl ordering of the clusters in a given state may be updated dynamically, either in real time or periodically.
Further, in a specific embodiment of the present application, selecting one hard disk cluster in the i-th state under the rule that the lower a cluster's wear degree, the higher its priority, may specifically include:
examining the hard disk clusters in the i-th state one by one in ascending order of wear degree;
when a hard disk cluster in the i-th state is found whose current cluster write queue depth VGcur_queue_depth is less than the preset maximum cluster write queue depth VGmax_queue_depth, stopping the search and taking that cluster as the selected hard disk cluster in the i-th state;
when all hard disk clusters in the i-th state have been examined and none has VGcur_queue_depth<VGmax_queue_depth, taking the cluster with the smallest cluster busyness VGbusy as the selected hard disk cluster in the i-th state;
where the cluster busyness VGbusy of a hard disk cluster is the value obtained by dividing its current cluster write queue depth VGcur_queue_depth by the maximum cluster write queue depth VGmax_queue_depth.
This embodiment considers not only the wear degree of a hard disk cluster but also its current cluster write queue depth, which helps ensure high concurrency, that is, high IOPS (Input/Output Operations Per Second) and high bandwidth for the distributed storage system.
Specifically, the hard disk clusters in the i-th state are examined one by one in ascending order of wear degree. The purpose of the search is to determine whether a cluster's current cluster write queue depth VGcur_queue_depth is less than the preset maximum cluster write queue depth VGmax_queue_depth.
Taking FIG. 2 as an example, among the three hard disk clusters in the Ghot state, the cluster examined first is G1, which has the lowest wear degree. If the current cluster write queue depth VGcur_queue_depth of G1 is less than the preset maximum cluster write queue depth VGmax_queue_depth, G1 is not currently busy. The search therefore stops and G1 is taken as the hard disk cluster in the i-th state selected under the preset rule, so that the data to be written can subsequently be written to G1.
If, however, G1 is currently very busy and its VGcur_queue_depth equals or even exceeds VGmax_queue_depth, then although the wear degree of G1 is lower than that of G2 and G3, selecting G1 for the write would make it busier still and congest its queue, which works against high IOPS and high bandwidth for the distributed storage system.
Therefore, in this embodiment the search continues with G2. If the VGcur_queue_depth of G2 is less than VGmax_queue_depth, G2 is not currently busy. The search therefore stops and G2 is taken as the hard disk cluster in the i-th state selected under the preset rule, so that the data to be written can subsequently be written to G2.
If G1, G2 and G3 have all been examined and all are busy, that is, no cluster has VGcur_queue_depth<VGmax_queue_depth, the cluster with the smallest cluster busyness VGbusy is taken as the hard disk cluster in the i-th state selected under the preset rule.
The cluster busyness VGbusy of a hard disk cluster is the value obtained by dividing its current cluster write queue depth VGcur_queue_depth by the maximum cluster write queue depth VGmax_queue_depth, that is, VGbusy=VGcur_queue_depth/VGmax_queue_depth.
The maximum cluster write queue depth VGmax_queue_depth is hardware-dependent. It is the maximum number of write requests that one hard disk cluster can process simultaneously, and it depends on the computing and storage resources of that cluster.
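The cluster selection rule described above can be sketched as follows. The class Cluster and the function select_cluster are hypothetical names, and storing the maximum queue depth per cluster is an assumption based on the statement that it is hardware-dependent. The sketch assumes the caller has already confirmed, as in step S102, that at least one cluster in the i-th state exists.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    name: str
    wear: float            # wear degree Vwl
    cur_queue_depth: int   # VGcur_queue_depth
    max_queue_depth: int   # VGmax_queue_depth (assumed to be stored per cluster)

    @property
    def busyness(self) -> float:
        # VGbusy = VGcur_queue_depth / VGmax_queue_depth
        return self.cur_queue_depth / self.max_queue_depth


def select_cluster(clusters: List[Cluster]) -> Cluster:
    """Pick one cluster from a non-empty list of clusters in the same state."""
    # Examine clusters in ascending order of wear degree.
    for cluster in sorted(clusters, key=lambda c: c.wear):
        if cluster.cur_queue_depth < cluster.max_queue_depth:
            return cluster  # first non-busy cluster: stop searching
    # Every cluster is busy: fall back to the least busy one.
    return min(clusters, key=lambda c: c.busyness)
```

In the FIG. 2 scenario, if G1 has a full queue while G2 does not, this sketch skips G1 despite its lower wear degree and returns G2.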
Step S104: write the data to be written to the selected hard disk cluster in the i-th state.
Once a hard disk cluster has been selected for storing the data to be written, the data can be written to the selected cluster.
In a specific embodiment of the present application, step S104 may specifically include:
selecting a target hard disk from the selected hard disk cluster in the i-th state under the rule that the lower a hard disk's wear degree, the higher its priority;
writing the data to be written to the selected target hard disk.
This embodiment takes into account that, once the hard disk cluster for storing the data to be written has been determined, a specific hard disk must be selected from that cluster. Although the hard disks in the same cluster have broadly similar wear degrees, some differences remain. To better ensure wear leveling across the hard disks in the cluster, this embodiment preferentially selects a hard disk with a low wear degree as the target hard disk. That is, the hard disks in the selected cluster may be sorted by wear degree from low to high, and hard disks with a lower wear degree are selected first, so as to achieve global wear leveling of the distributed storage system more effectively.
In addition, because the wear degree of a hard disk changes over time, the wear-degree ordering of the hard disks in a cluster may be updated dynamically, either in real time or periodically.
Further, in a specific embodiment of the present application, selecting a target hard disk from the selected hard disk cluster in the i-th state under the rule that the lower a hard disk's wear degree, the higher its priority, includes:
examining the hard disks in the selected hard disk cluster in the i-th state one by one in ascending order of wear degree;
when a hard disk in that cluster is found whose current hard disk write queue depth VDcur_queue_depth is less than the preset maximum hard disk write queue depth VDmax_queue_depth, stopping the search and taking that hard disk as the selected target hard disk;
when all hard disks in that cluster have been examined and none has VDcur_queue_depth<VDmax_queue_depth, taking the hard disk with the smallest hard disk busyness VDbusy as the selected target hard disk;
where the hard disk busyness VDbusy of a hard disk is the value obtained by dividing its current hard disk write queue depth VDcur_queue_depth by the maximum hard disk write queue depth VDmax_queue_depth.
This embodiment follows the same principle as the earlier embodiment in which one hard disk cluster is selected from the clusters in the i-th state. When the target hard disk is selected from the hard disks in the cluster, both the wear degree of each hard disk and its current hard disk write queue depth are considered, which helps ensure high concurrency, that is, high IOPS and high bandwidth for the distributed storage system.
Specifically, after one hard disk cluster in the i-th state has been selected, the hard disks in that cluster are examined one by one in ascending order of wear degree. The purpose of the search is to determine whether a hard disk's current hard disk write queue depth VDcur_queue_depth is less than the preset maximum hard disk write queue depth VDmax_queue_depth.
The hard disk with the lowest wear degree is examined first. If its current hard disk write queue depth VDcur_queue_depth is less than the preset maximum hard disk write queue depth VDmax_queue_depth, the hard disk is not currently busy. The search therefore stops and that hard disk is taken as the target hard disk, so that the data to be written can subsequently be written to it.
If, however, the hard disk with the lowest wear degree is currently very busy and its VDcur_queue_depth equals or even exceeds VDmax_queue_depth, selecting it for the write would make it busier still and congest its queue, which works against high IOPS and high bandwidth for the distributed storage system.
Therefore, in this embodiment the search continues with the other hard disks. The principle is the same as above and is not repeated here.
Likewise, if all hard disks in the cluster have been examined and all are busy, that is, no hard disk has VDcur_queue_depth<VDmax_queue_depth, the hard disk with the smallest hard disk busyness VDbusy is taken as the target hard disk.
The hard disk busyness VDbusy is the value obtained by dividing the current hard disk write queue depth VDcur_queue_depth of a hard disk by the maximum hard disk write queue depth VDmax_queue_depth, that is, VDbusy=VDcur_queue_depth/VDmax_queue_depth.
The maximum hard disk write queue depth VDmax_queue_depth is hardware-dependent. It is the maximum number of write requests that one hard disk can process simultaneously, and it depends on the computing and storage resources of that hard disk.
It can be understood that, in the solution of the present application, the state of a hard disk cluster changes over time. Still taking N=3 as an example, a hard disk cluster newly added to the distributed storage system starts in the Ghot state. As data is written and the cluster wears, its wear degree Vwl rises; when Vwl becomes greater than or equal to V1, the state of the cluster is changed to Gcold, and when Vwl becomes greater than or equal to V2, the state is changed to Greadonly. In practical applications, when too many hard disk clusters are in the Greadonly state, the system may raise an alarm and prompt for the SSDs to be replaced.
In a specific embodiment of the present application, the method may further include:
when the data to be written is of the second data type and it is determined that no hard disk cluster in the second state currently exists, writing the data to be written to a hard disk cluster in the Ghot state.
As described above, the state of a hard disk cluster changes over time. When the distributed storage system has just been set up, no data has been written to any cluster, so every cluster has a very low wear degree and is in the Ghot state. At that point, if the data to be written is classified as the second data type, no hard disk cluster in the second state, that is, the Gcold state, exists, so the data may be written to a hard disk cluster in the Ghot state.
In a specific embodiment of the present application, the method may further include:
when the data to be written is of the first data type and it is determined that no hard disk cluster in the first state currently exists, writing the data to be written to a hard disk cluster in the Ghot state or the Gcold state.
This embodiment takes into account that, when the data to be written is classified as the first data type and no hard disk cluster in the first state, that is, the Greadonly state, currently exists, the data may be written to a hard disk cluster in the Ghot state or the Gcold state.
It should also be noted that there may be two reasons why no hard disk cluster in the first state currently exists. The wear degree of every cluster may still be relatively low, so that in the N=3 example only clusters in the Ghot state or the Gcold state exist. Alternatively, a hard disk with a very high wear degree may exist but be completely full, so that no new data can be written to it. The latter case may also be treated as no hard disk cluster in the first state currently existing, and the data to be written is written to a hard disk cluster in the Ghot state or the Gcold state.
In a specific embodiment of the present application, the method may further include:
when the data to be written is of the third data type and it is determined that no hard disk cluster in the third state currently exists, returning a write-failure message.
This embodiment takes into account that, when the data to be written is of the third data type and no hard disk cluster in the third state, that is, the Ghot state, currently exists, the storage space of the distributed storage system has been heavily used and the remaining space is insufficient. To avoid data loss, this embodiment therefore returns a write-failure message directly.
In addition, in practical applications an alarm may also be raised to remind staff to add resources to the distributed storage system in good time.
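The fallback behavior of the three preceding embodiments can be summarized in a short sketch for N=3. The names PREFERRED, FALLBACKS and target_state are hypothetical. The text allows read-only data to fall back to either the Ghot or the Gcold state without giving an order, so trying Gcold before Ghot here is an assumption.

```python
from typing import Dict, List, Optional, Set

# Preferred cluster state for each data type (N = 3).
PREFERRED: Dict[int, str] = {1: "Greadonly", 2: "Gcold", 3: "Ghot"}

# Fallback states when no cluster in the preferred state exists.
# Assumption: for type 1 the order Gcold-then-Ghot is not given in the text.
FALLBACKS: Dict[int, List[str]] = {1: ["Gcold", "Ghot"], 2: ["Ghot"], 3: []}


def target_state(data_type: int, available_states: Set[str]) -> Optional[str]:
    """Return the cluster state to write to, or None to signal a write failure."""
    for state in [PREFERRED[data_type]] + FALLBACKS[data_type]:
        if state in available_states:
            return state
    return None  # e.g. hot data with no Ghot cluster: report write failure
```

A caller would treat a None result as the write-failure case and could raise the alarm mentioned above at the same point.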
In a specific embodiment of the present application, all hard disk clusters in the distributed storage system are arranged in a first medium layer, and the distributed storage system is further provided with an SCM medium layer, which stores data of a target type and handles write data that is not block-aligned.
The flash medium used in the solution of the present application may typically be QLC or PLC (Penta-Level Cell). Because global wear leveling is achieved, service life is safeguarded while high data density, high energy efficiency and low price are retained, which yields a high cost-performance ratio.
This embodiment further takes into account that data in a hard disk cluster is usually stored in blocks of a set size. An SCM (Storage-Class Memory) medium layer may also be provided in the distributed storage system, and handling write data that is not block-aligned in the SCM medium layer can effectively improve IOPS. After a large IO (Input/Output) request is divided into blocks, a remainder may be left over; this remainder is write data that is not block-aligned, that is, IO whose boundaries are not aligned. In addition, some small IO requests are also write data that is not block-aligned.
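How a write request separates into block-aligned data and an unaligned remainder can be sketched as follows. The function split_write and its byte-offset interface are hypothetical, and the 4096-byte block size in the usage note is only an illustrative value. The sketch only performs the split; routing the unaligned pieces to the SCM medium layer is left to the caller.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (offset, length) in bytes


def split_write(offset: int, length: int, block_size: int) -> Tuple[List[Extent], List[Extent]]:
    """Split [offset, offset + length) into block-aligned and unaligned extents."""
    end = offset + length
    first_aligned = -(-offset // block_size) * block_size  # round offset up
    last_aligned = (end // block_size) * block_size         # round end down
    if first_aligned >= last_aligned:
        # The request covers no complete block: a small, unaligned IO.
        unaligned_only = [(offset, length)] if length > 0 else []
        return [], unaligned_only
    aligned = [(first_aligned, last_aligned - first_aligned)]
    unaligned: List[Extent] = []
    if offset < first_aligned:
        unaligned.append((offset, first_aligned - offset))   # leading remainder
    if last_aligned < end:
        unaligned.append((last_aligned, end - last_aligned))  # trailing remainder
    return aligned, unaligned
```

For a 10000-byte write at offset 0 with 4096-byte blocks, the first 8192 bytes form two complete blocks and the trailing 1808 bytes are the unaligned remainder.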
In addition, in some embodiments, data that needs to be modified extremely frequently may be kept out of the first medium layer, that is, not stored on the SSDs, and instead placed directly in the SCM. This enables high-speed reads and writes of such data and further helps extend the life of the distributed storage system.
Refer to FIG. 3, which is a schematic diagram of the multi-layer flash architecture of the distributed storage system in a specific embodiment. In the embodiment of FIG. 3, a first medium layer and an SCM medium layer are provided. The first medium layer in FIG. 3 is a PLC medium layer, which is also the option usually chosen in practice. The PLC medium layer offers high data density, high energy efficiency and low price, and the global wear-leveling strategy of the present application effectively safeguards its service life and improves its cost-performance ratio. In some cases the first medium layer may instead be a QLC medium layer.
In a specific implementation of the present application, data in every hard disk cluster is stored in blocks of a set size, and the data storage method of the distributed storage system further includes:
determining the P/E cycle count of each block in a hard disk cluster in the i-th state, and calculating the average P/E cycle count over the blocks in that hard disk cluster;
when the hard disk cluster in the i-th state contains blocks for which the difference between the P/E cycle count and the average is lower than a set first value, taking every such block as a block to be migrated;
if a hard disk cluster in the (i+1)-th state currently exists, migrating each block to be migrated from the hard disk cluster in the i-th state to the hard disk cluster in the (i+1)-th state.
This implementation further takes into account that the foregoing implementations classify the data to be written by data type, which amounts to predicting how frequently the data will be modified in the future. Such predictions will inevitably deviate from reality, and even the same data may be modified at different actual frequencies in different scenarios and time periods. This implementation therefore also migrates data.
The P/E count may also be called the program/erase cycle count. A very low P/E count indicates that the data in a block is not modified frequently; conversely, a very high P/E count indicates that the data in the block is modified frequently.
Specifically, for a hard disk cluster in the i-th state, the P/E cycle count of each block in the cluster is determined and the average over the blocks in the cluster is calculated. Any block whose P/E cycle count is far below the average is taken as a block to be migrated. If a hard disk cluster in the (i+1)-th state currently exists, each block to be migrated is moved from the cluster in the i-th state to the cluster in the (i+1)-th state.
For example, in G1 of FIG. 2, two blocks have particularly low P/E cycle counts, which indicates that they are modified very infrequently, so the data in these two blocks is migrated to a hard disk cluster in the G_cold state.
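The selection step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `blocks_to_migrate`, the signed reading of the difference (P/E count minus average), and the threshold of -20 are not taken from the application.

```python
from statistics import mean


def blocks_to_migrate(pe_counts: dict[int, int], first_value: float) -> list[int]:
    """Return ids of blocks whose P/E count minus the cluster average is
    lower than first_value (assumed here to be a negative threshold)."""
    if not pe_counts:
        return []
    average = mean(pe_counts.values())
    return sorted(
        block for block, pe in pe_counts.items() if pe - average < first_value
    )


# Hypothetical cluster: two rarely rewritten blocks among 98 ordinary ones.
pe_counts = {1: 2, 2: 3}
pe_counts.update({block: 30 for block in range(3, 101)})
candidates = blocks_to_migrate(pe_counts, first_value=-20)
print(candidates)  # [1, 2]
```

Whether the returned blocks are then moved to another cluster depends on whether a cluster in the (i+1)-th state currently exists.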
Further, a specific implementation of the present application may also include:
after the blocks to be migrated have been determined, if no hard disk cluster in the (i+1)-th state currently exists, exchanging the data in the K blocks with the highest P/E cycle counts in the hard disk cluster in the i-th state with the data in the K blocks to be migrated in that cluster, thereby completing the internal migration of the K blocks to be migrated within the hard disk cluster in the i-th state;
where K is the number of blocks to be migrated determined in the hard disk cluster in the i-th state.
For example, blocks in a hard disk cluster in the G_readonly state cannot be migrated to another cluster: for a hard disk cluster in the N-th state, no hard disk cluster in the (N+1)-th state exists, so such migration is impossible.
As another example, suppose the i-th state is the G_cold state. If, after several blocks to be migrated have been determined for a hard disk cluster in the G_cold state, the storage space of every hard disk cluster in the G_readonly state is found to be exhausted, this is likewise treated as no hard disk cluster in the (i+1)-th state existing, and the operation of this implementation is performed, that is, migration within the hard disk cluster. In other words, if the migration destination does not exist or its space is full, the blocks to be migrated are migrated within the source hard disk cluster.
During internal migration, the data in the K blocks with the highest P/E cycle counts in the hard disk cluster in the i-th state is exchanged with the data in the K blocks to be migrated in that cluster, which further ensures wear leveling across the blocks in the cluster.
For example, suppose a hard disk cluster contains 100 blocks, where block 1 has a P/E cycle count of 2, block 2 has 3, block 3 has 90, block 4 has 80, and each of the remaining 96 blocks has, say, 30. In this implementation, blocks 1 and 2 are the blocks to be migrated, and the two blocks with the highest P/E cycle counts in the cluster are blocks 3 and 4. The data of blocks 1 and 2 therefore needs to be exchanged with the data of blocks 3 and 4. For instance, the data of block 1 may be exchanged with that of block 3 and the data of block 2 with that of block 4, completing the internal migration of the two blocks to be migrated within the cluster.
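The pairing in the example above can be reproduced with a short Python sketch. The function name `plan_internal_swaps`, the pairing of the least-worn candidate with the most-worn block, and the exclusion of candidates from the most-worn set are illustrative assumptions; the application only requires that the K blocks to be migrated exchange data with the K blocks that have the highest P/E cycle counts.

```python
def plan_internal_swaps(
    pe_counts: dict[int, int], candidates: list[int]
) -> list[tuple[int, int]]:
    """Pair each block to be migrated with one of the K most-worn blocks."""
    k = len(candidates)
    candidate_set = set(candidates)
    # K blocks with the highest P/E counts, most worn first.
    most_worn = sorted(
        (block for block in pe_counts if block not in candidate_set),
        key=lambda block: pe_counts[block],
        reverse=True,
    )[:k]
    # Blocks to be migrated, ordered from the lowest P/E count upward.
    least_worn = sorted(candidates, key=lambda block: pe_counts[block])
    return list(zip(least_worn, most_worn))


# Numbers from the example: blocks 1 and 2 are the blocks to be migrated.
pe_counts = {1: 2, 2: 3, 3: 90, 4: 80}
pe_counts.update({block: 30 for block in range(5, 101)})
swaps = plan_internal_swaps(pe_counts, candidates=[1, 2])
print(swaps)  # [(1, 3), (2, 4)]
```

Each returned pair names two blocks whose data would be exchanged, so rarely modified data ends up on the most-worn blocks.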
The technical solution provided by the embodiments of the present application starts from the observation that, to fully realize the cost-effectiveness of flash media, wear leveling cannot be confined to a single hard disk; it must be performed globally across the distributed storage system. Specifically, the solution divides the distributed storage system into multiple hard disk clusters. Managing hard disk clusters requires less management data, that is, metadata, than managing every hard disk directly, and is therefore easier to implement. Moreover, every hard disk cluster consists of multiple hard disks of the same model that were added to the distributed storage system in the same batch. This arrangement makes it straightforward to level wear across the hard disks within a cluster; leveling wear between clusters then yields global wear leveling for the entire distributed storage system, which safeguards hard disk endurance and avoids frequent replacement of failed disks.
Specifically, to level wear between hard disk clusters, the present application divides the hard disk clusters into N states. After data to be written is received, its data type is determined. When the data is classified as the i-th data type, it is determined whether a hard disk cluster in the i-th state currently exists, and if so, one such cluster is selected. The data types reflect how frequently the data is expected to be modified in the future: among the N data types, the estimated number of modifications within a first time period is higher for data of the (i+1)-th data type than for data of the i-th data type. Among the N cluster states, a hard disk cluster in the (i+1)-th state has a lower degree of wear than one in the i-th state. Consequently, data that almost never needs to be modified, that is, data whose estimated number of modifications within the first time period is very low, is classified as the first data type and written to a hard disk cluster in the first state. Clusters in the first state have the highest degree of wear, meaning a large amount of data has already been written to them, so they receive data that almost never needs to be modified. Writes wear out a hard disk, whereas reads do not. Because heavily worn clusters receive only data that almost never needs to be modified, they can continue to serve reads despite their wear, which makes full use of their residual value. Correspondingly, the more frequently data needs to be modified, that is, the higher its estimated number of modifications within the first time period, the less worn the cluster it is written to, so that lightly worn clusters are used more fully. This achieves global wear leveling of the distributed storage system.
In summary, the present application divides the distributed storage system into multiple hard disk clusters, which makes global wear leveling of the distributed storage system straightforward to achieve, safeguards the endurance of its hard disks, and avoids frequent replacement of failed disks.
Corresponding to the above method embodiments, the embodiments of the present application further provide a data storage system for a distributed storage system, which may be read in conjunction with the description above.
The distributed storage system includes multiple hard disk clusters, each hard disk cluster includes multiple hard disks of the same model, and all hard disks in the same hard disk cluster were added to the distributed storage system in the same batch. Referring to FIG. 4, the data storage system of the distributed storage system includes:
a type classification module 401, configured to receive data to be written and determine the data type of the data to be written; where, among the N set data types, the estimated number of modifications within a first time period is higher for data to be written of the (i+1)-th data type than for data to be written of the i-th data type, and N is a positive integer not less than 2;
a hard disk cluster state judgment module 402, configured to determine, when the data type of the data to be written is the i-th data type, whether a hard disk cluster in the i-th state currently exists, and to trigger the hard disk cluster selection module 403 if one exists;
a hard disk cluster selection module 403, configured to select one hard disk cluster in the i-th state;
a writing module 404, configured to write the data to be written into the selected hard disk cluster in the i-th state; where, among the hard disk clusters in the N set states, a hard disk cluster in the (i+1)-th state has a lower degree of wear than a hard disk cluster in the i-th state, and i is a positive integer.
In a specific implementation of the present application, the type classification module 401 determining the data type of the data to be written includes:
determining the data type of the data to be written based on the file name of the data to be written.
In a specific implementation of the present application, the type classification module 401 determining the data type of the data to be written based on its file name includes:
when the file name of the data to be written matches a preset j-th database, classifying the data to be written as the j-th data type, where j is a positive integer and 1≤j≤N-1;
when the file name of the data to be written matches none of the N-1 preset databases, classifying the data to be written as the N-th data type.
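The two matching rules above can be sketched in Python as follows. The application does not say how a file name is matched against a database; this sketch assumes each database is a list of glob patterns and uses `fnmatchcase` from Python's standard `fnmatch` module. The function name `classify` and the example patterns are hypothetical.

```python
from fnmatch import fnmatchcase


def classify(file_name: str, databases: list[list[str]]) -> int:
    """Return the 1-based data type for file_name.

    databases holds the N-1 preset databases in order; a name that matches
    database j is type j, and a name that matches none is type N.
    """
    for j, patterns in enumerate(databases, start=1):
        if any(fnmatchcase(file_name, pattern) for pattern in patterns):
            return j
    return len(databases) + 1


# Hypothetical databases for N = 3 (two databases, three data types).
databases = [["*.iso", "*.mp4"], ["*.log.gz"]]
print(classify("movie.mp4", databases))   # 1
print(classify("app.log.gz", databases))  # 2
print(classify("db.wal", databases))      # 3
```

Checking the databases in order means that a name matching several databases gets the lowest matching type, which is a design choice of this sketch rather than something the application specifies.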
In a specific implementation of the present application, the system further includes a first update module, configured to:
train a preset deep learning model, using file names as training samples and the counted number of modifications of each training sample within the first time period as its training label;
after the deep learning model has been trained, input each distinct file name in turn into the trained deep learning model, and update the data in the N-1 databases based on the outputs of the deep learning model.
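The database update step can be illustrated with the Python sketch below. It does not train a deep learning model: `predict_modifications` is a stand-in for the trained model, and the function name `rebuild_databases` and the ascending `upper_bounds` thresholds that map a predicted count to a database are assumptions made for this illustration only.

```python
from typing import Callable, Iterable


def rebuild_databases(
    file_names: Iterable[str],
    predict_modifications: Callable[[str], float],
    upper_bounds: list[float],
) -> list[set[str]]:
    """Rebuild the N-1 databases from predicted modification counts.

    upper_bounds must be ascending; upper_bounds[j - 1] is the largest
    predicted count still assigned to database j. Names above every bound
    are left out of all databases and so fall into the N-th data type.
    """
    databases: list[set[str]] = [set() for _ in upper_bounds]
    for name in file_names:
        predicted = predict_modifications(name)
        for index, bound in enumerate(upper_bounds):
            if predicted <= bound:
                databases[index].add(name)
                break
    return databases


# Stand-in for the trained model: a fixed lookup of predicted counts.
fake_predictions = {"backup.iso": 0.0, "report.pdf": 3.0, "db.wal": 500.0}
databases = rebuild_databases(
    fake_predictions, fake_predictions.__getitem__, upper_bounds=[0.5, 10.0]
)
print(databases)
```

Here `backup.iso` lands in the first database, `report.pdf` in the second, and `db.wal` in neither.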
In a specific implementation of the present application, the system further includes a second update module, configured to:
receive an adjustment instruction for the j-th database, and perform a data item addition operation and/or a data item deletion operation and/or a data item modification operation on the j-th database according to the adjustment instruction.
In a specific implementation of the present application, N=3, the first data type is a read-only data type, the second data type is a cold data type, the third data type is a hot data type, the first state of a hard disk cluster is the G_readonly state, the second state is the G_cold state, and the third state is the G_hot state.
In a specific implementation of the present application, the system further includes a first execution module, configured to:
when the data to be written is of the first data type and it is determined that no hard disk cluster in the first state currently exists, write the data to be written into a hard disk cluster in the G_hot state or the G_cold state.
In a specific implementation of the present application, the system further includes a second execution module, configured to:
when the data to be written is of the second data type and it is determined that no hard disk cluster in the second state currently exists, write the data to be written into a hard disk cluster in the G_hot state.
In a specific implementation of the present application, the system further includes a third execution module, configured to:
when the data to be written is of the third data type and it is determined that no hard disk cluster in the third state currently exists, return a prompt message indicating that the write failed.
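The three fallback rules for N=3 can be summarized in a small Python sketch. The names `FALLBACK_ORDER`, `choose_state`, and `WriteFailure` are hypothetical. Two points are assumptions rather than statements of the application: read-only data tries G_hot before G_cold (the text allows either), and read-only or cold data also fails when none of its permitted states is available.

```python
READONLY, COLD, HOT = 1, 2, 3  # data types and cluster states for N = 3

# Cluster states to try, in order, for each data type.
FALLBACK_ORDER = {
    READONLY: (READONLY, HOT, COLD),  # order of HOT and COLD is assumed
    COLD: (COLD, HOT),
    HOT: (HOT,),
}


class WriteFailure(Exception):
    """Raised when no acceptable cluster state is available."""


def choose_state(data_type: int, available_states: set[int]) -> int:
    """Pick the cluster state that should receive data of data_type."""
    for state in FALLBACK_ORDER[data_type]:
        if state in available_states:
            return state
    raise WriteFailure(f"no cluster available for data type {data_type}")


print(choose_state(READONLY, {COLD}))       # 2
print(choose_state(COLD, {READONLY, HOT}))  # 3
```

Hot data never falls back to a more worn state, which matches the rule that a missing G_hot cluster produces a write failure.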
In a specific implementation of the present application, the hard disk cluster selection module 403 is specifically configured to:
select one hard disk cluster in the i-th state according to the rule that the lower a hard disk cluster's degree of wear, the higher its priority.
In a specific implementation of the present application, the hard disk cluster selection module 403 is specifically configured to:
search the hard disk clusters in the i-th state one by one in ascending order of degree of wear;
when any hard disk cluster in the i-th state is found whose current cluster write queue depth VG_cur_queue_depth is less than the preset maximum cluster write queue depth VG_max_queue_depth, stop searching and take that hard disk cluster as the selected hard disk cluster in the i-th state;
when all hard disk clusters in the i-th state have been searched and none has a current cluster write queue depth VG_cur_queue_depth less than the preset maximum cluster write queue depth VG_max_queue_depth, take the hard disk cluster with the smallest cluster busyness VG_busy as the selected hard disk cluster in the i-th state;
where the cluster busyness VG_busy of a hard disk cluster is the value obtained by dividing its current cluster write queue depth VG_cur_queue_depth by its maximum cluster write queue depth VG_max_queue_depth.
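The search procedure above can be sketched in Python as follows. The `Cluster` data class, its field names, and the function name `select_cluster` are hypothetical; only the ordering by wear, the queue depth comparison, and the busyness ratio come from the text.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    wear: float            # degree of wear; lower means higher priority
    cur_queue_depth: int   # VG_cur_queue_depth
    max_queue_depth: int   # VG_max_queue_depth

    @property
    def busyness(self) -> float:
        """VG_busy: current write queue depth over maximum depth."""
        return self.cur_queue_depth / self.max_queue_depth


def select_cluster(clusters: list[Cluster]) -> Cluster:
    """Select one cluster among those already known to be in the i-th state."""
    if not clusters:
        raise ValueError("no cluster in the requested state")
    # Walk the clusters from least worn to most worn.
    for cluster in sorted(clusters, key=lambda c: c.wear):
        if cluster.cur_queue_depth < cluster.max_queue_depth:
            return cluster
    # Every write queue is full: fall back to the least busy cluster.
    return min(clusters, key=lambda c: c.busyness)


clusters = [
    Cluster("a", wear=0.1, cur_queue_depth=8, max_queue_depth=8),
    Cluster("b", wear=0.3, cur_queue_depth=3, max_queue_depth=8),
    Cluster("c", wear=0.5, cur_queue_depth=0, max_queue_depth=8),
]
print(select_cluster(clusters).name)  # b
```

Cluster `a` is the least worn but its queue is full, so the search moves on and stops at `b`, even though `c` is idle; wear takes precedence over load until every queue is full.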
In a specific implementation of the present application, the writing module 404 is specifically configured to:
select a target hard disk from the selected hard disk cluster in the i-th state according to the rule that the lower a hard disk's degree of wear, the higher its priority;
write the data to be written into the selected target hard disk.
In a specific implementation of the present application, the writing module 404 is specifically configured to:
search the hard disks in the selected hard disk cluster in the i-th state one by one in ascending order of degree of wear;
when any hard disk in the hard disk cluster in the i-th state is found whose current hard disk write queue depth VD_cur_queue_depth is less than the preset maximum hard disk write queue depth VD_max_queue_depth, stop searching and take that hard disk as the selected target hard disk;
when all hard disks in the hard disk cluster in the i-th state have been searched and none has a current hard disk write queue depth VD_cur_queue_depth less than the preset maximum hard disk write queue depth VD_max_queue_depth, take the hard disk with the smallest hard disk busyness VD_busy as the selected target hard disk;
where the hard disk busyness VD_busy of a hard disk is the value obtained by dividing its current hard disk write queue depth VD_cur_queue_depth by its maximum hard disk write queue depth VD_max_queue_depth.
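As a worked illustration of the busyness fallback, with hypothetical numbers: suppose every hard disk in the selected cluster has reached its maximum write queue depth of 32, and two of the disks currently hold 40 and 36 queued writes respectively.

```latex
\[
VD_{busy} = \frac{VD_{cur\_queue\_depth}}{VD_{max\_queue\_depth}},
\qquad \frac{40}{32} = 1.25,
\qquad \frac{36}{32} = 1.125
\]
```

Of these two disks, the second has the smaller busyness and would be preferred as the target hard disk.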
In a specific implementation of the present application, every hard disk cluster in the distributed storage system is located in a first medium layer, and an SCM medium layer is also provided in the distributed storage system to store data of a target type and to handle non-block-aligned write data.
In a specific implementation of the present application, the first medium layer is a PLC medium layer or a QLC medium layer.
In a specific implementation of the present application, data in every hard disk cluster is stored in blocks of a set size, and the system further includes a migration module, configured to:
determine the P/E cycle count of each block in a hard disk cluster in the i-th state, and calculate the average P/E cycle count over the blocks in that hard disk cluster;
when the hard disk cluster in the i-th state contains blocks for which the difference between the P/E cycle count and the average is lower than a set first value, take every such block as a block to be migrated;
if a hard disk cluster in the (i+1)-th state currently exists, migrate each block to be migrated from the hard disk cluster in the i-th state to the hard disk cluster in the (i+1)-th state.
In a specific implementation of the present application, the migration module is further configured to:
after the blocks to be migrated have been determined, if no hard disk cluster in the (i+1)-th state currently exists, exchange the data in the K blocks with the highest P/E cycle counts in the hard disk cluster in the i-th state with the data in the K blocks to be migrated in that cluster, thereby completing the internal migration of the K blocks to be migrated within the hard disk cluster in the i-th state;
where K is the number of blocks to be migrated determined in the hard disk cluster in the i-th state.
Corresponding to the above method and system embodiments, the embodiments of the present application further provide a data storage device for a distributed storage system and a non-volatile computer-readable storage medium, which may be read in conjunction with the description above.
The data storage device of the distributed storage system may include:
a memory 501, configured to store a computer program;
a processor 502, configured to execute the computer program to implement the steps of the data storage method of the distributed storage system described above.
Referring to FIG. 6, a computer program 61 is stored on the non-volatile computer-readable storage medium 60, and when executed by a processor, the computer program 61 implements the steps of the data storage method of the distributed storage system in any of the above embodiments. The non-volatile computer-readable storage medium 60 referred to here includes random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "includes a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may implement the described functions in different ways for each specific application, but such implementations should not be considered to go beyond the scope of the present application.
Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the technical solution of the present application and its core ideas. It should be noted that those of ordinary skill in the art may make improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the scope of protection of the present application.
Claims (20)
- A data storage method for a distributed storage system, characterized in that the distributed storage system includes multiple hard disk clusters, each hard disk cluster includes multiple hard disks of the same model, and all hard disks in the same hard disk cluster were added to the distributed storage system in the same batch, the data storage method comprising: receiving data to be written and determining the data type of the data to be written, wherein, among N set data types, the estimated number of modifications within a first time period is higher for data to be written of the (i+1)-th data type than for data to be written of the i-th data type, and N is a positive integer not less than 2; when the data type of the data to be written is the i-th data type, determining whether a hard disk cluster in the i-th state currently exists; if a hard disk cluster in the i-th state exists, selecting one hard disk cluster in the i-th state; and writing the data to be written into the selected hard disk cluster in the i-th state, wherein, among the hard disk clusters in N set states, a hard disk cluster in the (i+1)-th state has a lower degree of wear than a hard disk cluster in the i-th state, and i is a positive integer.
- 根据权利要求1所述的分布式存储系统的数据存储方法,其特征在于,所述划分出所述待写入数据的数据类型,包括:The data storage method of a distributed storage system according to claim 1, characterized in that said dividing the data type of the data to be written comprises:基于所述待写入数据的文件名,划分出所述待写入数据的数据类型。Based on the file name of the data to be written, the data type of the data to be written is divided.
- 根据权利要求2所述的分布式存储系统的数据存储方法,其特征在于,所述基于所述待写入数据的文件名,划分出所述待写入数据的数据类型,包括:The data storage method of a distributed storage system according to claim 2, characterized in that the data type of the data to be written is divided based on the file name of the data to be written, comprising:当所述待写入数据的文件名与预设的第j数据库匹配时,将所述待写入数据的数据类型划分为第j数据类型;其中,j为正整数且1≤j≤N-1;When the file name of the data to be written matches the preset j-th database, the data type of the data to be written is divided into the j-th data type; wherein j is a positive integer and 1≤j≤N-1;当所述待写入数据的文件名与预设的N-1个数据库均不匹配时,将所述待写入数据的数据类型划分为第N数据类型。When the file name of the data to be written does not match any of the preset N-1 databases, the data type of the data to be written is divided into an Nth data type.
- 根据权利要求3所述的分布式存储系统的数据存储方法,其特征在于,还包括:The data storage method of the distributed storage system according to claim 3, characterized in that it also includes:以文件名作为训练样本,以训练样本在第一时长内的修改次数统计值作为所述训练样本的训练标签,对预设的深度学习模型进行训练;Using the file name as a training sample and the statistical value of the number of modifications of the training sample within the first time period as a training label of the training sample, training the preset deep learning model;在所述深度学习模型训练完毕之后,依次输入各个不同的文件名至训练完毕的所述深度学习模型,并基于所述深度学习模型的输出结果,进行N-1个所述数据库的数据更新。After the deep learning model is trained, different file names are input into the trained deep learning model in sequence, and based on the output results of the deep learning model, data of N-1 databases are updated.
- 根据权利要求3所述的分布式存储系统的数据存储方法,其特征在于,还包括:The data storage method of the distributed storage system according to claim 3, characterized in that it also includes:接收针对第j数据库的调整指令,并根据所述调整指令进行第j数据库的数据项增加操作和/或数据项删除操作和/或数据项修改操作。An adjustment instruction for the jth database is received, and a data item adding operation, a data item deleting operation, and/or a data item modifying operation is performed on the jth database according to the adjustment instruction.
- The data storage method for a distributed storage system according to claim 1, wherein N = 3; the 1st data type is a read-only data type, the 2nd data type is a cold data type, and the 3rd data type is a hot data type; and the 1st state of a hard disk cluster is the G_readonly state, the 2nd state is the G_cold state, and the 3rd state is the G_hot state.
- The data storage method for a distributed storage system according to claim 6, further comprising: when the data to be written is of the 1st data type and it is determined that no hard disk cluster in the 1st state currently exists, writing the data to be written into a hard disk cluster in the G_hot state or the G_cold state.
- The data storage method for a distributed storage system according to claim 6, further comprising: when the data to be written is of the 2nd data type and it is determined that no hard disk cluster in the 2nd state currently exists, writing the data to be written into a hard disk cluster in the G_hot state.
- The data storage method for a distributed storage system according to claim 6, further comprising: when the data to be written is of the 3rd data type and it is determined that no hard disk cluster in the 3rd state currently exists, returning a prompt message indicating that the write failed.
- The data storage method for a distributed storage system according to claim 1, wherein selecting one hard disk cluster in the i-th state comprises: selecting one hard disk cluster in the i-th state according to the rule that the lower the wear degree of a hard disk cluster, the higher its priority.
- The data storage method for a distributed storage system according to claim 10, wherein selecting one hard disk cluster in the i-th state according to the rule that the lower the wear degree of a hard disk cluster, the higher its priority, comprises: searching the hard disk clusters in the i-th state one by one in order of increasing wear degree; when any hard disk cluster in the i-th state is found whose current cluster write queue depth VG_cur_queue_depth is less than the preset maximum cluster write queue depth VG_max_queue_depth, stopping the search and taking that hard disk cluster as the selected hard disk cluster in the i-th state; and when all hard disk clusters in the i-th state have been searched and none has a current cluster write queue depth VG_cur_queue_depth less than the preset maximum cluster write queue depth VG_max_queue_depth, taking the hard disk cluster with the smallest cluster busyness VG_busy as the selected hard disk cluster in the i-th state; wherein the cluster busyness VG_busy of a hard disk cluster is the value obtained by dividing the current cluster write queue depth VG_cur_queue_depth of that hard disk cluster by the maximum cluster write queue depth VG_max_queue_depth.
- The data storage method for a distributed storage system according to claim 1, wherein writing the data to be written into the selected hard disk cluster in the i-th state comprises: selecting a target hard disk from the selected hard disk cluster in the i-th state according to the rule that the lower the wear degree of a hard disk, the higher its priority; and writing the data to be written into the selected target hard disk.
- The data storage method for a distributed storage system according to claim 12, wherein selecting a target hard disk from the selected hard disk cluster in the i-th state according to the rule that the lower the wear degree of a hard disk, the higher its priority, comprises: searching the hard disks in the selected hard disk cluster in the i-th state one by one in order of increasing wear degree; when any hard disk in that hard disk cluster is found whose current hard disk write queue depth VD_cur_queue_depth is less than the preset maximum hard disk write queue depth VD_max_queue_depth, stopping the search and taking that hard disk as the selected target hard disk; and when all hard disks in that hard disk cluster have been searched and none has a current hard disk write queue depth VD_cur_queue_depth less than the preset maximum hard disk write queue depth VD_max_queue_depth, taking the hard disk with the smallest hard disk busyness VD_busy as the selected target hard disk; wherein the hard disk busyness VD_busy of a hard disk is the value obtained by dividing the current hard disk write queue depth VD_cur_queue_depth of that hard disk by the maximum hard disk write queue depth VD_max_queue_depth.
- The data storage method for a distributed storage system according to claim 1, wherein each of the hard disk clusters in the distributed storage system is arranged in a first medium layer, and an SCM medium layer is further provided in the distributed storage system, so that data of a target type is stored through the SCM medium layer and write data that is not block-aligned is processed through the SCM medium layer.
- The data storage method for a distributed storage system according to claim 14, wherein the first medium layer is a PLC medium layer or a QLC medium layer.
- The data storage method for a distributed storage system according to any one of claims 1 to 15, wherein, in each hard disk cluster, data is stored in blocks of a set size, and the data storage method further comprises: determining the P/E count of each block in a hard disk cluster in the i-th state, and calculating the average of the P/E counts of the blocks in that hard disk cluster; when the hard disk cluster in the i-th state contains blocks for which the difference between the P/E count and the average is lower than a set first value, taking each block for which the difference between the P/E count and the average is lower than the set first value as a block to be migrated; and, if a hard disk cluster in the (i+1)-th state currently exists, migrating each of the blocks to be migrated in the hard disk cluster to the hard disk cluster in the (i+1)-th state.
- The data storage method for a distributed storage system according to claim 16, further comprising: after the blocks to be migrated have been determined, if no hard disk cluster in the (i+1)-th state currently exists, exchanging the data in the K blocks with the largest P/E counts in the hard disk cluster in the i-th state with the data in the K blocks to be migrated in that hard disk cluster, so as to complete the internal migration of the K blocks to be migrated within the hard disk cluster in the i-th state; wherein K is the number of blocks to be migrated determined in the hard disk cluster in the i-th state.
- A data storage system for a distributed storage system, wherein the distributed storage system comprises a plurality of hard disk clusters, each hard disk cluster comprises a plurality of hard disks of the same model, and all hard disks in the same hard disk cluster were added to the distributed storage system in the same batch, the data storage system comprising: a type classification module, configured to receive data to be written and classify the data to be written into a data type, wherein, among N set data types, the estimated number of modifications within a first time period of data to be written of the (i+1)-th data type is higher than the estimated number of modifications within the first time period of data to be written of the i-th data type, and N is a positive integer not less than 2; a hard disk cluster state judgment module, configured to judge, when the data type of the data to be written is the i-th data type, whether a hard disk cluster in the i-th state currently exists, and to trigger a hard disk cluster selection module if a hard disk cluster in the i-th state currently exists; the hard disk cluster selection module, configured to select one hard disk cluster in the i-th state; and a writing module, configured to write the data to be written into the selected hard disk cluster in the i-th state; wherein, among hard disk clusters in N set states, the wear degree of a hard disk cluster in the (i+1)-th state is lower than the wear degree of a hard disk cluster in the i-th state, and i is a positive integer.
- A data storage device for a distributed storage system, comprising: a memory, configured to store a computer program; and a processor, configured to execute the computer program to implement the steps of the data storage method for a distributed storage system according to any one of claims 1 to 17.
- A computer non-volatile readable storage medium, wherein a computer program is stored on the computer non-volatile readable storage medium, and the computer program, when executed by a processor, implements the steps of the data storage method for a distributed storage system according to any one of claims 1 to 17.
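The cluster-selection rule recited in the claims above (search clusters of the matching state in order of increasing wear, take the first whose write queue is below the maximum depth, otherwise take the least busy one) can be illustrated with a minimal Python sketch for the N = 3 case. This is not the applicant's implementation. The `Cluster` class, the `wear` field, the per-cluster `max_queue_depth`, the `FALLBACK` table and the function names are all hypothetical. The claims leave open whether the G_hot or the G_cold state is tried first for read-only data, so trying G_hot first is an assumption, as is raising an error when no fallback cluster exists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    state: int            # 1 = G_readonly, 2 = G_cold, 3 = G_hot
    wear: float           # hypothetical wear metric; lower means less worn
    cur_queue_depth: int  # VG_cur_queue_depth
    max_queue_depth: int  # VG_max_queue_depth (assumed per cluster here)

    @property
    def busy(self) -> float:
        # VG_busy = VG_cur_queue_depth / VG_max_queue_depth
        return self.cur_queue_depth / self.max_queue_depth


# States tried when no cluster of the matching state exists.
# Assumption: G_hot is tried before G_cold for read-only data.
FALLBACK = {1: [3, 2], 2: [3], 3: []}


def pick_cluster(clusters: List[Cluster], state: int) -> Optional[Cluster]:
    """Pick one cluster in the given state, or None if there is none."""
    candidates = sorted(
        (c for c in clusters if c.state == state), key=lambda c: c.wear
    )
    if not candidates:
        return None
    # Search in order of increasing wear; stop at the first cluster
    # whose write queue still has room.
    for c in candidates:
        if c.cur_queue_depth < c.max_queue_depth:
            return c
    # Every queue is full: fall back to the least busy cluster.
    return min(candidates, key=lambda c: c.busy)


def route_write(clusters: List[Cluster], data_type: int) -> Cluster:
    """Choose the cluster that receives data of the given type (1..3)."""
    for state in [data_type] + FALLBACK[data_type]:
        chosen = pick_cluster(clusters, state)
        if chosen is not None:
            return chosen
    # Hot data with no G_hot cluster is reported as a write failure;
    # doing the same for other exhausted fallbacks is an assumption.
    raise OSError("write failed: no usable hard disk cluster")


if __name__ == "__main__":
    demo = [
        Cluster("a", 3, 0.2, 8, 8),  # least worn, but queue is full
        Cluster("b", 3, 0.5, 3, 8),
        Cluster("c", 2, 0.7, 1, 8),
    ]
    print(route_write(demo, 1).name)  # prints "b"
```

In the usage example there is no G_readonly cluster, so read-only data falls back to the G_hot clusters; cluster "a" is skipped because its queue is full, and "b" is chosen. The same two-stage search (first free queue in wear order, then minimum busyness) applies unchanged to picking a target hard disk inside the chosen cluster, with VD_* values in place of VG_*.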
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310247233.0A CN115934007B (en) | 2023-03-15 | 2023-03-15 | Data storage method, system, equipment and storage medium of distributed storage system |
CN202310247233.0 | 2023-03-15 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024187900A1 (en) | 2024-09-19 |
Family
ID=85825556
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/141779 WO2024187900A1 (en) | 2023-03-15 | 2023-12-26 | Data storage method, system and device for distributed storage system, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115934007B (en) |
WO (1) | WO2024187900A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115934007B (en) * | 2023-03-15 | 2023-05-23 | 浪潮电子信息产业股份有限公司 | Data storage method, system, equipment and storage medium of distributed storage system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102393809A (en) * | 2011-12-02 | 2012-03-28 | 浪潮集团有限公司 | Method for prolonging service life of cloud storage SSD (Solid State Disk) device |
US9946471B1 (en) * | 2015-03-31 | 2018-04-17 | EMC IP Holding Company LLC | RAID groups based on endurance sets |
CN109196459A (en) * | 2016-05-31 | 2019-01-11 | 重庆大学 | A kind of distributed heterogeneous memory system data location mode of decentralization |
CN112817880A (en) * | 2021-03-17 | 2021-05-18 | 深圳市安信达存储技术有限公司 | Solid state disk, wear balance method thereof and terminal equipment |
CN115934007A (en) * | 2023-03-15 | 2023-04-07 | 浪潮电子信息产业股份有限公司 | Data storage method, system, equipment and storage medium of distributed storage system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8639877B2 (en) * | 2009-06-30 | 2014-01-28 | International Business Machines Corporation | Wear leveling of solid state disks distributed in a plurality of redundant array of independent disk ranks |
CN108334278B (en) * | 2017-12-15 | 2021-05-07 | 中兴通讯股份有限公司 | Storage system balance management method and device |
CN108255419A (en) * | 2017-12-19 | 2018-07-06 | 深圳忆联信息系统有限公司 | A kind of abrasion equilibrium method and SSD for TLC types SSD |
CN111880748B (en) * | 2020-07-30 | 2023-10-31 | 北京计算机技术及应用研究所 | Solid state disk wear balancing method for distributed storage system |
CN114924691A (en) * | 2022-03-31 | 2022-08-19 | 杭州电子科技大学 | Wear leveling method for data stripe reconstruction in RAID5 |
- 2023-03-15: CN application CN202310247233.0A, published as CN115934007B (status: active)
- 2023-12-26: WO application PCT/CN2023/141779, published as WO2024187900A1 (status: unknown)
Also Published As
Publication number | Publication date |
---|---|
CN115934007A (en) | 2023-04-07 |
CN115934007B (en) | 2023-05-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Sethi et al. | RecShard: statistical feature-based memory optimization for industry-scale neural recommendation | |
CN105653591B (en) | A kind of industrial real-time data classification storage and moving method | |
US10140034B2 (en) | Solid-state drive assignment based on solid-state drive write endurance | |
US9311252B2 (en) | Hierarchical storage for LSM-based NoSQL stores | |
JP2017021805A (en) | Interface providing method capable of utilizing data attribute reference data arrangement in nonvolatile memory device and computer device | |
Herodotou et al. | Automating distributed tiered storage management in cluster computing | |
US11914894B2 (en) | Using scheduling tags in host compute commands to manage host compute task execution by a storage device in a storage system | |
CN116737064B (en) | Data management method and system for solid state disk | |
WO2024187900A1 (en) | Data storage method, system and device for distributed storage system, and storage medium | |
Herodotou | AutoCache: Employing machine learning to automate caching in distributed file systems | |
CN101861573A (en) | The statistical counting that is used for memory hierarchy optimization | |
Boukhelef et al. | Optimizing the cost of DBaaS object placement in hybrid storage systems | |
US20170351721A1 (en) | Predicting index fragmentation caused by database statements | |
Dieye et al. | On achieving high data availability in heterogeneous cloud storage systems | |
KR20220040348A (en) | Data stream management method and device | |
CN112799597A (en) | Hierarchical storage fault-tolerant method for stream data processing | |
CN115858510A (en) | Method for evaluating data temperature and performing dynamic storage management and storage medium | |
WO2008007348A1 (en) | A data storage system | |
US10872015B2 (en) | Data storage system with strategic contention avoidance | |
CA2415018C (en) | Adaptive parallel data clustering when loading a data structure containing data clustered along one or more dimensions | |
CN116364148A (en) | Wear balancing method and system for distributed full flash memory system | |
Irie et al. | A novel automated tiered storage architecture for achieving both cost saving and qoe | |
US7865460B2 (en) | Method and system for data dispatch | |
CN103970671B (en) | Allocating Additional Requested Storage Space For A Data Set In A First Managed Space In A Second Managed Space | |
Alatorre et al. | Intelligent information lifecycle management in virtualized storage environments |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23927240; Country of ref document: EP; Kind code of ref document: A1 |