CN107249035B - Shared repeated data storage and reading method with dynamically variable levels - Google Patents
- Publication number
- CN107249035B (CN201710506611.7A)
- Authority
- CN
- China
- Prior art keywords
- throughput
- tenant
- priority
- tenants
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0876—Network utilisation, e.g. volume of load or congestion level
- H04L43/0888—Throughput
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
- H04L67/61—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources taking into account QoS or priority requirements
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Environmental & Geological Engineering (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention provides a shared repeated data storage and reading mechanism with dynamically variable levels for a cloud backup system, suited to the tiered and fair quality-of-service requirements of such a system. The method works together with data deduplication, applies different quality-of-service policies to tenants of different levels, and improves both the deduplication ratio of the system and the service delivered to tenants. The invention ensures that tenants receive fair, tiered quality of service in a cloud backup environment.
Description
Technical Field
The invention belongs to the technical field of computer information storage. It provides a shared repeated data storage and reading method with dynamically variable levels that meets the quality-of-service requirements of a multi-tenant cloud backup system with data deduplication, so that tenants receive fair, level-guaranteed quality of service.
Background
In a cloud backup system, tenants purchase backup resources from a cloud backup provider according to their own service requirements, and they need the system to provide fair, level-guaranteed quality of service. With the adoption of data deduplication, duplicate data shared between tenants makes quality of service harder to maintain, and traditional methods alone cannot meet tenants' requirements for tiered service.
To address these problems, the invention provides a shared repeated data storage and reading method with dynamically variable levels that guarantees fairness among tenants and delivers tiered quality of service. Unlike existing methods, it is a tiered quality-of-service control method built on data deduplication: resource allocation and throughput monitoring use fine-grained, per-data-block control, which yields fairer tiered quality of service than other methods and can improve the deduplication ratio, the throughput, and the data recovery speed of the system.
Disclosure of Invention
The invention provides a method for storing and reading shared repeated data with dynamically variable levels. The invention takes into account the resource allocation and throughput of each data block in the backup and recovery processing stages and, provided system performance is stable, realizes tiered quality of service for the cloud backup system from three aspects: tenant levels, fair resource allocation, and shared repeated data processing.
The first core idea of the invention is fair allocation of resources, which ensures that a tenant receives fair service in every processing stage of data backup and data recovery. The resource allocation method allocates a fair memory space to each tenant according to the tenant's service level. It comprises the following steps: (1) request a memory space from the cloud backup system; (2) quantize the memory space by dividing the requested capacity by the size of the metadata corresponding to one data block, which gives the total number of memory space units of the cloud backup system; (3) allocate memory space to each tenant with formula (1), according to the weight of the tenant's level and the number of tenants in each level.
$$\mathrm{Memory}_L = \frac{P_L}{\sum_{n=1}^{N} P_n A_n} \times \mathrm{Memory}_{total} \qquad (1)$$
where $\mathrm{Memory}_L$ is the memory space size allocated to a tenant of level L; $\mathrm{Memory}_{total}$ is the total memory space requested by the system; $N$ is the number of levels in the current system; $P_n$ and $A_n$ are the memory space weight and the number of tenants of the n-th level; $\sum_{n=1}^{N} P_n A_n$ is the sum of the weights of all tenants; $P_L$ is the memory space weight of a level-L tenant; $P_L / \sum_{n=1}^{N} P_n A_n$ is the share of the total memory space taken by a level-L tenant; and $\frac{P_L}{\sum_{n=1}^{N} P_n A_n} \times \mathrm{Memory}_{total}$ is the memory space size that a level-L tenant obtains from the total memory space.
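The allocation steps can be illustrated with a small numeric sketch. The Python below assumes that a tenant of level L receives Memory_total × P_L / Σ(P_n × A_n), as the term definitions imply; the function name, level names, weights, and tenant counts are hypothetical values chosen only for illustration.

```python
def allocate_memory_slots(total_bytes, metadata_bytes, weights, tenant_counts):
    """Return the number of memory slots given to ONE tenant of each level.

    total_bytes    -- memory requested from the cloud backup system
    metadata_bytes -- metadata size of one data block (the quantization unit)
    weights        -- {level: P_n}, hypothetical memory space weights
    tenant_counts  -- {level: A_n}, number of tenants in each level
    """
    # Step (2): quantize the requested memory into metadata-sized slots.
    total_slots = total_bytes // metadata_bytes
    # Sum of the weights of all tenants: sum over levels of P_n * A_n.
    weight_sum = sum(weights[n] * tenant_counts[n] for n in weights)
    # Step (3): each tenant of level L gets the share P_L / weight_sum.
    return {level: total_slots * weights[level] / weight_sum for level in weights}


slots = allocate_memory_slots(
    total_bytes=1_000_000,
    metadata_bytes=100,
    weights={"gold": 3, "silver": 2, "bronze": 1},
    tenant_counts={"gold": 1, "silver": 2, "bronze": 3},
)
print(slots)
```

With these example values the per-tenant shares, multiplied by the tenant counts, add up to the full 10,000 slots, so no memory is left unallocated.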
The second core idea of the invention is throughput monitoring, whose purpose is to dynamically adjust each tenant's memory space and throughput threshold and so guarantee tiered quality of service. It comprises the following steps: (1) periodically monitor, in real time, the throughput of each tenant during data backup and data recovery; (2) sum the monitored tenant throughputs and calculate the average throughput of each level with formula (2), according to the weight of each tenant level and the number of tenants in each level.
$$\mathrm{Throughput}_{L,a} = \frac{P_L}{\sum_{n=1}^{N} P_n A_n} \times \mathrm{Throughput}_{total} \qquad (2)$$
where $\mathrm{Throughput}_{L,a}$ is the average throughput of a level-L tenant; $\mathrm{Throughput}_{total}$ is the total throughput of the system; $N$ is the number of levels in the current system; $P_n$ and $A_n$ are the throughput weight and the number of tenants of the n-th level; $\sum_{n=1}^{N} P_n A_n$ is the sum of the weights of all tenants; $P_L$ is the throughput weight of a level-L tenant; $P_L / \sum_{n=1}^{N} P_n A_n$ is the ratio of a level-L tenant's throughput to the total throughput; and $\frac{P_L}{\sum_{n=1}^{N} P_n A_n} \times \mathrm{Throughput}_{total}$ is the average throughput of a level-L tenant.
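As a rough illustration of step (2), the sketch below sums the monitored throughputs and splits the total by level weight. It assumes the per-level average is Throughput_total × P_L / Σ(P_n × A_n), as the term definitions imply; the function name, tenant names, and numbers are hypothetical.

```python
def level_average_throughput(tenant_throughputs, weights, tenant_counts):
    """Return the expected average throughput of one tenant in each level.

    tenant_throughputs -- {tenant: throughput measured in this period}
    weights            -- {level: P_n}, hypothetical throughput weights
    tenant_counts      -- {level: A_n}, number of tenants in each level
    """
    # Total system throughput is the sum over all monitored tenants.
    total = sum(tenant_throughputs.values())
    # Sum of the weights of all tenants: sum over levels of P_n * A_n.
    weight_sum = sum(weights[n] * tenant_counts[n] for n in weights)
    return {level: total * weights[level] / weight_sum for level in weights}


averages = level_average_throughput(
    tenant_throughputs={"t1": 50, "t2": 30, "t3": 20},
    weights={"high": 3, "low": 1},
    tenant_counts={"high": 1, "low": 2},
)
print(averages)
```

Here the single high-level tenant is expected to carry three times the throughput of each low-level tenant, which is the tiering the weights encode.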
(3) Initialize each tenant's throughput threshold with the throughput monitored for that tenant in the first period. After each monitoring period, if a tenant's real-time throughput is not equal to the average throughput of its level, dynamically adjust the tenant's memory space size and throughput threshold using formula (3).
In formula (3), $\mathrm{Throughput}_{L,a}$ is the average throughput of level L; $\mathrm{Throughput}_i$ is the current real-time throughput of tenant i; $\Delta\mathrm{Memory}$ and $\Delta\mathrm{Throughput}$ are the increments of the memory space and of the throughput threshold when the tenant's current throughput is not equal to the average throughput of its level; $\Delta\mathrm{Throughput} = \mathrm{Throughput}_{L,a} - \mathrm{Throughput}_i$ is the throughput lost by tenant i; $\mathrm{Memory}_i$ is the current memory space size of tenant i; and $\Delta\mathrm{Memory}$ is the memory space size compensated to tenant i.
(4) Increase the tenant's memory space size and throughput threshold by $\Delta\mathrm{Memory}$ and $\Delta\mathrm{Throughput}$ respectively, as calculated by formula (3).
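The adjustment step can be sketched as follows. The text gives ΔThroughput = Throughput_L,a − Throughput_i explicitly, but the exact expression for ΔMemory is not reproduced in it, so the proportional rule used below (ΔMemory = Memory_i × ΔThroughput / Throughput_L,a) is purely an assumption made for illustration, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class TenantState:
    memory: float                # current memory space of tenant i (Memory_i)
    throughput_threshold: float  # current throughput threshold of tenant i


def adjust(state, current_throughput, level_average):
    """Apply one end-of-period adjustment and return (delta_memory, delta_throughput)."""
    if level_average <= 0:
        raise ValueError("level_average must be positive")
    # From the text: throughput lost by the tenant (negative if it ran ahead).
    delta_throughput = level_average - current_throughput
    # ASSUMPTION: memory is compensated in proportion to the relative
    # throughput shortfall; the patent's own expression is not shown here.
    delta_memory = state.memory * delta_throughput / level_average
    state.memory += delta_memory
    state.throughput_threshold += delta_throughput
    return delta_memory, delta_throughput


slow = TenantState(memory=1000.0, throughput_threshold=80.0)
slow_delta = adjust(slow, current_throughput=80.0, level_average=100.0)

fast = TenantState(memory=1000.0, throughput_threshold=120.0)
fast_delta = adjust(fast, current_throughput=120.0, level_average=100.0)
```

A tenant below its level average gains memory and a higher threshold, while a tenant above the average gives some back, so both drift toward the level average over successive periods.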
The third core idea of the invention is shared repeated data processing, which comprises a shared repeated data storage method and a shared data reading method, as follows:
(a) Shared repeated data storage method: when, within a unit time period, a new data block backed up by a high-priority tenant duplicates one backed up by a low-priority tenant, check, using the tenant throughputs and the per-level average throughputs obtained by the throughput monitoring module, whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the average throughputs of the corresponding levels. If it is, the high-priority tenant can complete the backup of the data block without affecting its own performance, so the backup task for the data block is given to the high-priority tenant, and the low-priority tenant's data block is marked as a duplicate that points to it. Otherwise, the low-priority tenant completes the backup of the data block, and the high-priority tenant's data block is marked as a duplicate that points to it.
(b) Shared data reading method: when, within a unit time period, a high-priority tenant and a low-priority tenant recover the same data block and that block is not in the cache, check, using the tenant throughputs and the per-level average throughputs obtained by the throughput monitoring module, whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the average throughputs of the corresponding levels. If it is, the high-priority tenant can cache the data block without affecting its own performance, so the high-priority tenant caches and recovers the data block, the memory space of the high-priority tenant is increased by the metadata size of one data block, and the memory space of the low-priority tenant is decreased by the metadata size of one data block. Otherwise, the low-priority tenant caches the data block and completes its recovery.
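The decision rule shared by (a) and (b) can be sketched in a few lines. The sketch reads "throughput ratio" as the quotient of the high-priority tenant's throughput over the low-priority tenant's throughput, compared with the quotient of the two level averages; that reading, and all names below, are assumptions for illustration.

```python
def choose_owner(high_tp, low_tp, high_avg, low_avg):
    """Return 'high' or 'low': which tenant stores or caches the shared block.

    Cross-multiplied form of  high_tp / low_tp >= high_avg / low_avg,
    which avoids dividing by zero when the low-priority tenant is idle.
    All inputs are assumed to be non-negative.
    """
    return "high" if high_tp * low_avg >= high_avg * low_tp else "low"


def apply_shared_read(owner, memory_slots):
    """For the read path only: move one metadata slot from low to high
    when the high-priority tenant caches the shared block."""
    if owner == "high":
        memory_slots["high"] += 1
        memory_slots["low"] -= 1
    return memory_slots


# High-priority tenant is ahead of its tiered share, so it takes the work.
ahead = choose_owner(high_tp=90, low_tp=20, high_avg=60, low_avg=20)
# High-priority tenant is behind its tiered share, so the low-priority one does.
behind = choose_owner(high_tp=40, low_tp=20, high_avg=60, low_avg=20)

slots_after = apply_shared_read(ahead, {"high": 100, "low": 50})
```

The effect is that shared work is only charged to the high-priority tenant when it already enjoys at least its tiered throughput advantage.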
The invention provides a method for storing and reading shared repeated data with dynamically variable grades, which mainly comprises two parts of data backup and data recovery.
The data backup method comprises the following specific steps:
(10) the client performs data blocking on a data stream needing to be backed up by a tenant, then calculates the blocked data blocks by using a Hash algorithm to obtain corresponding fingerprints, and sends the data fingerprints and tenant grade information to a server.
(11) After receiving the data information sent by the client, the server performs the following steps:
(11.1) Establish a corresponding priority for each tenant's backup service according to the tenant's service level; the resource fair allocation module allocates the memory space and the throughput threshold to the tenant using formula (1), according to the weight of the service level.
(11.2) Periodically monitor, in real time, the throughput of each tenant during data backup. Sum the monitored tenant throughputs and calculate the average throughput of each level with formula (2), according to the weight of each tenant level and the number of tenants in each level. After each monitoring period, if a tenant's throughput is not equal to the average throughput of its level, adjust the memory space and throughput threshold from step (11.1) using formula (3).
(11.3) After the service priority is determined in step (11.1), traverse the fingerprint sequences sent in step (10) in order of tenant backup service priority, from high to low, and look up each fingerprint in the fingerprint index table. If the fingerprint is not in the table, mark the corresponding data block as a new data block; otherwise the data block has already been stored, so mark it as a repeated data block and record its storage address.
(11.4) storing the new data block, and specifically comprising the following steps:
(a) If the new data block is data backed up by both a high-priority tenant and a low-priority tenant within a unit time period, apply the shared repeated data storage strategy and update the fingerprint index table with the storage address of the new data block. The shared repeated data storage strategy is as follows:
Using the tenant throughputs and the per-level average throughputs obtained in step (11.2), check whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the average throughputs of the corresponding levels. If it is, the high-priority tenant stores the new data block; otherwise the low-priority tenant stores it.
(b) If the new data block is not data backed up by both a high-priority tenant and a low-priority tenant within the unit time period, the tenant that owns the data block stores it, and the fingerprint index table is updated with the storage address of the new data block.
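The backup steps (10) through (11.4) can be sketched end to end in simplified form. The sketch assumes fixed-size chunking and SHA-256 fingerprints, neither of which the text specifies, and it deliberately omits the throughput monitoring and the shared-storage decision; every name and value is hypothetical.

```python
import hashlib


def fingerprints(data, chunk_size=4):
    """Client side: split the stream into fixed-size chunks (an assumption)
    and fingerprint each chunk with SHA-256 (also an assumption)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]


def backup(streams, index, store):
    """Server side: process tenants from high to low priority.

    streams -- list of (priority, tenant, data); a lower number means
               a higher priority
    index   -- fingerprint index table: {fingerprint: storage address}
    store   -- list acting as block storage; the address is the list position
    """
    report = []
    for _priority, tenant, data in sorted(streams, key=lambda s: s[0]):
        for fp, chunk in fingerprints(data):
            if fp in index:
                # Already stored: mark as repeated and record its address.
                report.append((tenant, "duplicate", index[fp]))
            else:
                # New block: store it and update the fingerprint index table.
                address = len(store)
                store.append(chunk)
                index[fp] = address
                report.append((tenant, "new", address))
    return report


index, store = {}, []
report = backup(
    [(2, "low", b"AAAABBBB"), (1, "high", b"BBBBCCCC")],
    index,
    store,
)
```

Because the high-priority stream is traversed first, the block `BBBB` is stored on its behalf and the low-priority tenant's copy is recorded as a duplicate pointing at the same address.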
The data recovery method comprises the following specific steps:
(20) the client reads the address of the data needing to be recovered by the tenant, and sends the address of the data needing to be recovered and the tenant level information to the server.
(21) The server receives the recovery data address and the tenant grade information sent by the client, and the following steps are carried out:
(21.1) Establish a corresponding priority for each tenant's recovery service according to the tenant's service level.
(21.2) Using the address of the data to be recovered, look up the metadata information of the data stored on disk.
(21.3) Allocate the memory space and the throughput threshold to the tenant using formula (1), according to the weight of the service level.
(21.4) Periodically monitor, in real time, the throughput of each tenant during data recovery. Sum the monitored tenant throughputs and calculate the average throughput of each level with formula (2), according to the weight of each tenant level and the number of tenants in each level. After each monitoring period, if a tenant's throughput is not equal to the average throughput of its level, adjust the memory space and throughput threshold from step (21.3) using formula (3).
(21.5) After the service priority is determined in step (21.1), scan the metadata information found in step (21.2) in order of tenant recovery service priority, from high to low, and search the server cache. If the data block corresponding to the metadata information is in the cache, recover it directly; if it is not in the cache, execute the following steps:
(a) If the data block is data recovered by both a high-priority tenant and a low-priority tenant within a unit time period, apply the shared data reading strategy, which is as follows:
Using the tenant throughputs and the per-level average throughputs obtained in step (21.4), check whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the average throughputs of the corresponding levels, in order to judge how caching the data block would affect the high-priority tenant's performance. If it is, the high-priority tenant caches the data block, the memory space of the high-priority tenant is increased by the metadata size of one data block, and the memory space of the low-priority tenant is decreased by the metadata size of one data block; otherwise, the low-priority tenant caches and recovers the data block.
(b) If the data block is not data recovered by both a high-priority tenant and a low-priority tenant within the unit time period, the tenant that owns the data block caches and recovers it.
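The cache lookup in step (21.5) can be sketched as below. This is a simplified illustration that serves requests from high to low priority and fills the cache on a miss; it omits the throughput comparison and memory transfer of the shared data reading strategy, and the dictionary-based cache and disk, along with all names, are hypothetical.

```python
def restore(requests, cache, disk):
    """Serve recovery requests from high to low priority.

    requests -- list of (priority, tenant, [block addresses]); a lower
                number means a higher priority
    cache    -- {address: block} held in server memory
    disk     -- {address: block} standing in for on-disk storage
    """
    log = []
    for _priority, tenant, addresses in sorted(requests, key=lambda r: r[0]):
        for address in addresses:
            if address in cache:
                # Cache hit: recover directly from memory.
                log.append((tenant, address, "cache"))
            else:
                # Cache miss: read from disk and cache for later requests.
                cache[address] = disk[address]
                log.append((tenant, address, "disk"))
    return log


disk = {0: b"BBBB", 1: b"CCCC"}
cache = {}
log = restore([(2, "low", [0]), (1, "high", [0, 1])], cache, disk)
```

In this run the high-priority tenant pays the disk read for block 0, and the low-priority tenant's later request for the same block is served from the cache.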
The invention has the characteristics that: the invention relates to a hierarchical service quality control method based on repeated data deletion, which is characterized in that the resource allocation and the throughput monitoring adopt fine-grained control based on data blocks, the data blocks at each stage of data backup and data recovery are accurately obtained, the problem of unfairness of service quality caused by repeated data deletion is solved, and the repeated data deletion technology is better fit with a cloud backup system.
Drawings
FIG. 1 is a schematic block diagram;
FIG. 2 is a flowchart of a data de-duplication method and a shared data reading method
Detailed Description
Fig. 1 is a schematic diagram of a module structure according to the present invention. The present invention relates to a client 100 and a server 200. The client comprises a fingerprint processing module 110, which mainly performs data block chunking on the backup data set and calculates a fingerprint of each data block by using a hash function. The server comprises a tenant hierarchical management module 210, a resource fair distribution module 220, a throughput monitoring module 240 and a shared repeated data processing module 230. The tenant-level management module 210 establishes a corresponding priority according to the service level of each tenant, and schedules the tenant data from a high priority to a low priority in sequence according to the service priority of data backup or data recovery of the tenant. The resource fair allocation module 220 and the throughput monitoring module 240 are used for guaranteeing the hierarchical service quality of the tenant, wherein the resource fair allocation module 220 allocates a fair memory space for the tenant by using a formula (1); the throughput monitoring module 240 monitors the throughput of tenant data processing in real time, and dynamically adjusts the memory space and throughput threshold of the tenant by using the formulas (2) and (3). When a high-priority tenant and a low-priority tenant have data blocks stored or cached together in a unit time period, the shared duplicated data processing module 230 checks whether the throughput ratio values of the high-priority tenant and the low-priority tenant are greater than or equal to the average throughput ratio value of the corresponding level, if the throughput ratio values of the high-priority tenant and the low-priority tenant are greater than or equal to the average throughput ratio value of the corresponding level, the high-priority tenant completes data block storage or caching, otherwise, the low-priority tenant completes data block storage or caching.
FIG. 2 is a processing flow diagram of a data de-duplication method and a shared data reading method according to the present invention, which includes two parts, namely data backup and data recovery.
The data backup method comprises the following specific steps:
(10) the fingerprint processing module 110 of the client 100 performs data blocking on a data stream that a tenant needs to backup, then calculates a corresponding fingerprint by using a hash algorithm on the blocked data block, and sends the data fingerprint and tenant level information to the server.
(11) After receiving the data information sent by the client, the server 200 performs the following steps:
(11.1) the tenant level management module 210 establishes a corresponding priority for the backup service of the tenant according to the service level of the tenant; the resource fair allocation module 220 allocates the memory space and the throughput threshold to the tenant using the formula (1) according to the weight corresponding to the service level.
(11.2) The throughput monitoring module 240 periodically monitors, in real time, the throughput of each tenant during data backup. It sums the monitored tenant throughputs and calculates the average throughput of each level with formula (2), according to the weight of each tenant level and the number of tenants in each level. After each monitoring period, if a tenant's throughput is not equal to the average throughput of its level, the memory space and throughput threshold from step (11.1) are adjusted using formula (3).
(11.3) after the service priority is determined in the step (11.1), sequentially traversing the fingerprint sequence sent in the step (10) from the high priority to the low priority by the tenant level management module 210 according to the tenant backup service priority, inquiring in the fingerprint index table, and if the fingerprint index table does not exist, marking the corresponding data block as a new data block; otherwise, the corresponding data block is stored, the data block is marked as a repeated data block, and the storage address of the data block is recorded.
(11.4) storing the new data block, and specifically comprising the following steps:
(a) If the new data block is data backed up by both a high-priority tenant and a low-priority tenant within a unit time period, the shared repeated data processing module 230 applies the shared repeated data storage strategy and updates the fingerprint index table with the storage address of the new data block. The strategy is as follows: using the tenant throughputs and the per-level average throughputs obtained in step (11.2), check whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the average throughputs of the corresponding levels. If it is, the high-priority tenant stores the new data block; otherwise the low-priority tenant stores it.
(b) If the new data block is not data backed up by both a high-priority tenant and a low-priority tenant within the unit time period, the tenant that owns the data block stores it, and the fingerprint index table is updated with the storage address of the new data block.
The data recovery method comprises the following specific steps:
(20) The client 100 reads the address of the data that the tenant needs to recover, and sends that address and the tenant level information to the server.
(21) The server 200 receives the recovery data address and the tenant level information sent by the client, and performs the following steps:
(21.1) the tenant level management module 210 establishes a corresponding priority for the recovery service of the tenant according to the service level of the tenant.
(21.2) Using the address of the data to be recovered, look up the metadata information of the data stored on disk.
(21.3) The resource fair allocation module 220 allocates the memory space and the throughput threshold to the tenant using formula (1), according to the weight of the service level.
(21.4) The throughput monitoring module 240 periodically monitors, in real time, the throughput of each tenant during data recovery. It sums the monitored tenant throughputs and calculates the average throughput of each level with formula (2), according to the weight of each tenant level and the number of tenants in each level. After each monitoring period, if a tenant's throughput is not equal to the average throughput of its level, the memory space and throughput threshold from step (21.3) are adjusted using formula (3).
(21.5) After the service priority is determined in step (21.1), the tenant level management module 210 scans the metadata information found in step (21.2) in order of tenant recovery service priority, from high to low, and searches the server cache. If the data block corresponding to the metadata information is in the cache, it is recovered directly; if it is not in the cache, the following steps are executed:
(a) If the data block is data recovered by both a high-priority tenant and a low-priority tenant within a unit time period, the shared repeated data processing module 230 applies the shared data reading strategy, which is as follows: using the tenant throughputs and the per-level average throughputs obtained in step (21.4), check whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the average throughputs of the corresponding levels, in order to judge how caching the data block would affect the high-priority tenant's performance. If it is, the high-priority tenant caches the data block, the memory space of the high-priority tenant is increased by the metadata size of one data block, and the memory space of the low-priority tenant is decreased by the metadata size of one data block; otherwise, the low-priority tenant caches and recovers the data block.
(b) If the data block is not data recovered by both a high-priority tenant and a low-priority tenant within the unit time period, the tenant that owns the data block caches and recovers it.
Claims (1)
1. A shared repeated data storage and reading method with dynamically variable grades mainly comprises two parts of data backup and data recovery;
the data backup method comprises the following specific steps:
(10) the client performs data blocking on a data stream needing to be backed up by a tenant, then calculates the blocked data by using a Hash algorithm to obtain a corresponding fingerprint, and sends the data block fingerprint and tenant grade information to a server;
(11) after receiving the data information sent by the client, the server performs the following steps:
(11.1) establishing corresponding priority for backup services of the tenants according to the service levels of the tenants, and distributing memory spaces in corresponding proportion for the tenants according to the weights corresponding to the service levels:
$$\mathrm{Memory}_L = \frac{P_L}{\sum_{n=1}^{N} P_n A_n} \times \mathrm{Memory}_{total}$$
wherein $\mathrm{Memory}_L$ represents the memory space size allocated to a tenant of level L, $\mathrm{Memory}_{total}$ is the total amount of memory space applied for by the system, $N$ represents the number of levels of the current system, $P_n$ and $A_n$ respectively represent the memory space weight and the number of tenants corresponding to the n-th level of the current system, $\sum_{n=1}^{N} P_n A_n$ represents the sum of the weights of all tenants, $P_L$ represents the memory space weight corresponding to a tenant of level L, $P_L / \sum_{n=1}^{N} P_n A_n$ is the ratio of the total memory space occupied by a tenant of level L, and $\frac{P_L}{\sum_{n=1}^{N} P_n A_n} \times \mathrm{Memory}_{total}$ represents the memory space size obtained by a tenant of level L from the total memory space;
(11.2) carrying out periodic real-time monitoring on the throughput of each tenant in the data backup, summing the monitored throughputs of the tenants, and calculating the average throughputs of different levels according to the corresponding weight of the tenant level and the number of the tenants in each level in the system:
$$\mathrm{Throughput}_{L,a} = \frac{P_L}{\sum_{n=1}^{N} P_n A_n} \times \mathrm{Throughput}_{total}$$
wherein $\mathrm{Throughput}_{L,a}$ represents the average throughput of a tenant of level L, $\mathrm{Throughput}_{total}$ is the total throughput of the system, $N$ represents the number of levels of the current system, $P_n$ and $A_n$ respectively represent the throughput weight and the number of tenants corresponding to the n-th level of the current system, $\sum_{n=1}^{N} P_n A_n$ represents the sum of the weights of all tenants, $P_L$ is the throughput weight corresponding to a tenant of level L, $P_L / \sum_{n=1}^{N} P_n A_n$ is the ratio of the throughput of a tenant of level L to the total throughput, and $\frac{P_L}{\sum_{n=1}^{N} P_n A_n} \times \mathrm{Throughput}_{total}$ represents the average throughput of a tenant of level L;
initializing a tenant throughput threshold value by using the throughput of each tenant monitored in the first period, and increasing a memory space and the throughput threshold value if the throughput of the tenant is lower than the average throughput of the corresponding grade after each monitoring period is finished; if the throughput size of the tenant is higher than the average throughput size of the corresponding grade, reducing the memory space and the throughput threshold:
wherein $\mathrm{Throughput}_{L,a}$ represents the average throughput of level L, $\mathrm{Throughput}_i$ is the current real-time throughput of tenant i, $\Delta\mathrm{Memory}$ and $\Delta\mathrm{Throughput}$ are the increments of the memory space and of the throughput threshold when the current throughput of the tenant is not equal to the average throughput of the corresponding level, $\Delta\mathrm{Throughput} = \mathrm{Throughput}_{L,a} - \mathrm{Throughput}_i$ represents the amount of throughput lost by tenant i, $\mathrm{Memory}_i$ represents the current memory space size of tenant i, and $\Delta\mathrm{Memory}$ is the memory space size compensated to tenant i;
(11.3) after the service priority is determined in the step (11.1), according to the tenant backup service priority, traversing the fingerprint sequence sent in the step (10) from high priority to low priority in sequence, inquiring in a fingerprint index table, and if the fingerprint does not exist, marking the corresponding data block as a new data block; otherwise, marking the data block as a repeated data block, and recording the storage address of the data block;
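A minimal sketch of the priority-ordered fingerprint lookup in step (11.3) follows. The tuple layout, the convention that a smaller number means a higher priority, and the name `classify_blocks` are assumptions made for illustration; the fingerprint index table is modelled as a plain dictionary from fingerprint to storage address.

```python
from typing import Dict, List, Tuple

Request = Tuple[int, str, str]  # (priority, tenant, fingerprint)


def classify_blocks(requests: List[Request],
                    fingerprint_index: Dict[str, str]):
    """Split requests into new blocks and repeated blocks.

    Requests are visited from high priority to low priority
    (assumed here: smaller number = higher priority).
    """
    new_blocks: List[Tuple[str, str]] = []
    duplicates: List[Tuple[str, str, str]] = []
    # sorted() is stable, so equal-priority requests keep their order.
    for _, tenant, fingerprint in sorted(requests, key=lambda r: r[0]):
        address = fingerprint_index.get(fingerprint)
        if address is None:
            new_blocks.append((tenant, fingerprint))
        else:
            # Repeated block: record where it is already stored.
            duplicates.append((tenant, fingerprint, address))
    return new_blocks, duplicates


index = {"f1": "addr-7"}
requests = [(2, "low", "f1"), (1, "high", "f2"), (2, "low", "f2")]
new_blocks, duplicates = classify_blocks(requests, index)
```

In this example fingerprint `f2` is reported as new for both tenants. That is the "commonly backed up" case, which the storage step that follows resolves.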
(11.4) storing the new data block, and specifically comprising the following steps:
(a) if the new data block is data commonly backed up by a high-priority tenant and a low-priority tenant in a unit time period, adopting a shared repeated data storage strategy, and updating a fingerprint index table according to a storage address of the new data block, wherein the shared repeated data storage strategy specifically comprises the following steps:
checking, according to the tenant throughput and the average throughput of each grade obtained in step (11.2), whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the average throughputs of the corresponding grades; if so, the high-priority tenant completes the storage of the new data block; otherwise, the low-priority tenant completes the storage of the new data block;
(b) if the new data block is not the data commonly backed up by the high-priority tenant and the low-priority tenant in the unit time period, the tenant to which the data block belongs completes data block storage, and the fingerprint index table is updated according to the storage address of the new data block;
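The shared repeated data storage decision in step (11.4)(a) can be sketched as below. The sketch assumes the comparison is between the ratio of the two tenants' throughputs and the ratio of their level averages. The helper names, the per-owner `containers` lists, and the `owner:offset` address format are hypothetical.

```python
from typing import Dict, List


def high_priority_stores(high_tp: float, low_tp: float,
                         high_avg: float, low_avg: float) -> bool:
    """True when high_tp / low_tp >= high_avg / low_avg.

    Cross-multiplied so that a zero throughput cannot cause a
    division by zero; all inputs are assumed to be non-negative.
    """
    return high_tp * low_avg >= high_avg * low_tp


def store_shared_block(fingerprint: str, data: bytes,
                       high_tp: float, low_tp: float,
                       high_avg: float, low_avg: float,
                       containers: Dict[str, List[bytes]],
                       fingerprint_index: Dict[str, str]) -> str:
    """Store a block both tenants back up, then update the index."""
    owner = "high" if high_priority_stores(high_tp, low_tp,
                                           high_avg, low_avg) else "low"
    containers[owner].append(data)
    address = f"{owner}:{len(containers[owner]) - 1}"
    fingerprint_index[fingerprint] = address
    return address


containers: Dict[str, List[bytes]] = {"high": [], "low": []}
index: Dict[str, str] = {}
# 80/20 >= 60/30, so the high-priority tenant stores the block.
first_address = store_shared_block("f2", b"block", 80.0, 20.0, 60.0, 30.0,
                                   containers, index)
# 40/30 < 60/30, so the low-priority tenant stores the block.
second_address = store_shared_block("f3", b"other", 40.0, 30.0, 60.0, 30.0,
                                    containers, index)
```

The block is written once in either case; only the tenant that pays for the write changes.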
the data recovery method comprises the following specific steps:
(20) the client reads the address of the data needing to be recovered by the tenant, and sends the address of the data needing to be recovered and the tenant level information to the server;
(21) the server receives the recovery data address and the tenant grade information sent by the client, and the following steps are carried out:
(21.1) establishing corresponding priority for the recovery service of the tenant according to the service level of the tenant;
(21.2) using the address of the data to be recovered, searching for the metadata information that describes where the data is stored on the disk;
(21.3) distributing memory space for the tenant according to the weight corresponding to the service level;
wherein $Memory_L$ represents the memory space size allocated to an L-level tenant; $Memory_{total}$ is the total amount of memory space applied for by the system; $N$ represents the number of levels in the current system; $P_n$ and $A_n$ respectively represent the memory space weight and the tenant population corresponding to the $n$-th level of the current system; $\sum_{n=1}^{N} P_n A_n$ represents the sum of the weights of all tenants; $P_L$ represents the memory space weight corresponding to an L-level tenant; $P_L / \sum_{n=1}^{N} P_n A_n$ is the ratio of the total memory space occupied by an L-level tenant; and $\frac{P_L}{\sum_{n=1}^{N} P_n A_n} \times Memory_{total}$ represents the size of the memory space obtained by an L-level tenant from the total memory space;
(21.4) carrying out periodic real-time monitoring on the throughput of each tenant in data recovery, summing the monitored throughputs of the tenants, and calculating the average throughputs of different levels according to the corresponding weight of the tenant level and the number of the tenants in each level in the system:
wherein $Throughput_{L,a}$ represents the average throughput size of an L-level tenant; $Throughput_{total}$ is the total throughput of the system; $N$ represents the number of levels in the current system; $P_n$ and $A_n$ respectively represent the throughput weight and the tenant population corresponding to the $n$-th level of the current system; $\sum_{n=1}^{N} P_n A_n$ represents the sum of the weights of all tenants; $P_L$ is the throughput weight of an L-level tenant; $P_L / \sum_{n=1}^{N} P_n A_n$ is the ratio of the throughput size of an L-level tenant to the total throughput size; and $\frac{P_L}{\sum_{n=1}^{N} P_n A_n} \times Throughput_{total}$ represents the average throughput size of an L-level tenant;
initializing a tenant throughput threshold value by using the throughput of each tenant monitored in the first period, and increasing a memory space and the throughput threshold value if the throughput of the tenant is lower than the average throughput of the corresponding grade after each monitoring period is finished; if the throughput size of the tenant is higher than the average throughput size of the corresponding grade, reducing the memory space and the throughput threshold:
wherein $Throughput_{L,a}$ represents the average throughput size of the L level; $Throughput_i$ is the current real-time throughput of tenant $i$; $\Delta Memory$ and $\Delta Throughput$ are the increases of the memory space and of the throughput threshold when the current throughput of the tenant is not equal to the average throughput of the corresponding level; $Throughput_{L,a} - Throughput_i$ represents the amount of throughput lost by tenant $i$; $Memory_i$ represents the current memory size of tenant $i$; and $\Delta Memory$ is the compensated memory space size of tenant $i$;
(21.5) after the service priority is determined in the step (21.1), scanning metadata information of the data recovered in the step (21.2) from high priority to low priority according to the service priority recovered by the tenant, searching in a server cache, and directly recovering if a data block corresponding to the metadata information exists in the cache; if the data block corresponding to the metadata information is not in the cache, executing the following steps:
(a) if the data block is data that the high-priority tenant and the low-priority tenant both recover within a unit time period, adopting a shared data reading strategy, which specifically comprises: checking, according to the tenant throughput and the average throughput of each grade obtained in step (21.4), whether the ratio of the high-priority tenant's throughput to the low-priority tenant's throughput is greater than or equal to the ratio of the average throughputs of the corresponding grades; if so, the high-priority tenant completes the caching of the data block, the memory space of the high-priority tenant is increased by the metadata size corresponding to one data block, and the memory space of the low-priority tenant is reduced by the metadata size corresponding to one data block; otherwise, the low-priority tenant completes the caching and recovery of the data block;
(b) if the data block is not data that the high-priority tenant and the low-priority tenant both recover within the unit time period, the tenant to which the data block belongs caches and recovers the data block.
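The shared data reading strategy of step (21.5) can be illustrated with the sketch below. It is written under several assumptions that the claims do not spell out: each tenant is modelled with its own in-memory cache, `read_from_disk` is a caller-supplied hypothetical callback, and the ratio test is cross-multiplied to avoid dividing by zero.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class RestoreTenant:
    name: str
    throughput: float      # current monitored throughput
    level_average: float   # average throughput of the tenant's level
    memory: float          # memory space currently assigned
    cache: Dict[str, bytes] = field(default_factory=dict)


def read_shared_block(fingerprint: str, high: RestoreTenant,
                      low: RestoreTenant,
                      read_from_disk: Callable[[str], bytes],
                      metadata_size: float) -> bytes:
    """Recover a block that both tenants request in the same period."""
    # Cache hit: recover directly without touching the disk.
    for tenant in (high, low):
        if fingerprint in tenant.cache:
            return tenant.cache[fingerprint]
    data = read_from_disk(fingerprint)
    # high.throughput / low.throughput >= high average / low average?
    if high.throughput * low.level_average >= high.level_average * low.throughput:
        high.cache[fingerprint] = data
        # Shift one block's worth of metadata space to the caching tenant.
        high.memory += metadata_size
        low.memory -= metadata_size
    else:
        low.cache[fingerprint] = data
    return data


disk = {"f1": b"block-1"}
reads = []


def read_from_disk(fingerprint: str) -> bytes:
    reads.append(fingerprint)  # record every disk access
    return disk[fingerprint]


high = RestoreTenant("high", 80.0, 60.0, 100.0)
low = RestoreTenant("low", 20.0, 30.0, 50.0)
first = read_shared_block("f1", high, low, read_from_disk, 1.0)
second = read_shared_block("f1", high, low, read_from_disk, 1.0)
```

The second call is served from the high-priority tenant's cache, so the disk is read only once for the block that both tenants recover.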
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710506611.7A CN107249035B (en) | 2017-06-28 | 2017-06-28 | Shared repeated data storage and reading method with dynamically variable levels |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107249035A (en) | 2017-10-13 |
CN107249035B (en) | 2020-05-26 |
Family
ID=60013512
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710506611.7A Active CN107249035B (en) | 2017-06-28 | 2017-06-28 | Shared repeated data storage and reading method with dynamically variable levels |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107249035B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108733511B (en) * | 2018-03-23 | 2022-05-24 | 赵浩茗 | Electronic data processing method based on big data |
CN110609807B (en) * | 2018-06-15 | 2023-06-23 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer readable storage medium for deleting snapshot data |
CN110083309B (en) * | 2019-04-11 | 2020-05-26 | 重庆大学 | Shared data block processing method, system and readable storage medium |
CN110955522B (en) * | 2019-11-12 | 2022-10-14 | 华中科技大学 | Resource management method and system for coordination performance isolation and data recovery optimization |
CN113407338A (en) * | 2021-05-29 | 2021-09-17 | 国网辽宁省电力有限公司辽阳供电公司 | A/D conversion chip resource allocation method of segmented architecture |
CN114116323B (en) * | 2022-01-27 | 2022-04-19 | 天津市城市规划设计研究总院有限公司 | Data backup strategy management method and system based on permission level |
CN116126596B (en) * | 2023-02-13 | 2023-08-18 | 北京易华录信息技术股份有限公司 | Information processing system and method based on block chain |
CN117435144B (en) * | 2023-12-20 | 2024-03-22 | 山东云天安全技术有限公司 | Intelligent data hierarchical security management method and system for data center |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101741536A (en) * | 2008-11-26 | 2010-06-16 | 中兴通讯股份有限公司 | Data level disaster-tolerant method and system and production center node |
CN102541751A (en) * | 2010-11-18 | 2012-07-04 | 微软公司 | Scalable chunk store for data deduplication |
CN103377285A (en) * | 2012-04-25 | 2013-10-30 | 国际商业机器公司 | Enhanced reliability in deduplication technology over storage clouds |
US9128948B1 (en) * | 2010-09-15 | 2015-09-08 | Symantec Corporation | Integration of deduplicating backup server with cloud storage |
CN105302669A (en) * | 2015-10-23 | 2016-02-03 | 浙江工商大学 | Method and system for data deduplication in cloud backup process |
CN106066818A (en) * | 2016-05-25 | 2016-11-02 | 重庆大学 | A kind of data layout's method improving data de-duplication standby system restorability |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||