CN110502365B - Data storage and recovery method and device and computer equipment

Info

Publication number: CN110502365B
Application number: CN201910624964.6A
Authority: CN
Inventors: 兰东平
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-07-11
Filing date: 2019-07-11
Publication date: 2024-03-01
Anticipated expiration: 2039-07-11
Also published as: WO2021003822A1; CN110502365A

Abstract

The application discloses a method, a device, and computer equipment for data storage and recovery. It relates to the field of data processing and addresses the long recovery time and high cost of recovering abnormal data files. The method comprises: acquiring a plurality of data blocks obtained by uniformly dividing an original data file; encoding the data blocks into a plurality of data slices and check slices based on erasure codes; storing the original data file by using the data slices and the check slices; and, if the original data file is judged to be missing, decoding and recovering the original data file by using the data slices and check slices that meet a preset condition. The method and device are suitable for block storage of data files and automatic recovery of missing data.

Description

Data storage and recovery method and device and computer equipment

Technical Field

The present invention relates to the field of data processing, and in particular, to a method, an apparatus, and a computer device for storing and recovering data.

Background

With the spread of computers across industries, large numbers of data files must be processed and stored by computer servers. If high-value data files become abnormal or are lost during storage, the economic interests of enterprises or individuals are directly affected, and in serious cases public interests are affected as well. The processor, which acts as the brain of the server, will inevitably behave abnormally at times under the influence of internal and external environmental factors. How to ensure that the data files in the server do not become abnormal or lost when the processor is abnormal is therefore a critical problem.

At present, the industry mainly monitors the data files in a storage space dynamically and, when a data file is judged to be stored abnormally, re-uploads a copy of the data file to replace the missing one.

However, in this recovery method the copy must be duplicated and uploaded whenever a storage abnormality is detected. When the data file is large, copying and transmission take a long time, which easily delays the running service and makes data recovery costly.

Disclosure of Invention

In view of this, the application provides a method, an apparatus, and a computer device for data storage and recovery, which mainly aim to solve the problems of long recovery time and high cost when recovering an abnormal data file.

According to one aspect of the present application, there is provided a method of data storage and recovery, the method comprising:

acquiring a plurality of data blocks obtained by uniformly dividing an original data file;

encoding the data blocks into a plurality of data slices and check slices based on erasure codes;

storing the original data file by using the data slices and the check slices;

and if the original data file is judged to be missing, decoding and recovering the original data file by using the data slices and the check slices that meet a preset condition.

According to another aspect of the present application, there is provided an apparatus for data storage and recovery, the apparatus comprising:

an acquisition module, used for acquiring a plurality of data blocks obtained by uniformly dividing the original data file;

a processing module, used for encoding the data blocks into a plurality of data slices and check slices based on erasure codes;

a storage module, used for storing the original data file by using the data slices and the check slices;

and a recovery module, used for decoding and recovering the original data file by using the data slices and the check slices that meet a preset condition if the original data file is judged to be missing.

According to yet another aspect of the present application, there is provided a non-volatile readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method of data storage and recovery.

According to yet another aspect of the present application, there is provided a computer device comprising a non-volatile readable storage medium, a processor, and a computer program stored on the non-volatile readable storage medium and executable on the processor, the processor implementing the above-described method of data storage and recovery when executing the program.

By means of the above technical scheme, and in contrast to the current practice of replacing a missing data file with a copy, the method, apparatus, and computer device for data storage and recovery divide the original data file into a plurality of data blocks, encode the data blocks based on erasure coding to obtain a plurality of data slices and check slices, and store the data slices and check slices in different data centers. The original data file is obtained by reading the data slices in sequence. When the data slices or check slices in a certain data center are judged to be missing, they are automatically decoded and recovered from the remaining data slices and check slices whose data is normal, so that the data file is recovered. The whole recovery process is efficient, and missing data blocks can be detected and repaired in time, so the service is not delayed. Because only the missing slices need to be recovered rather than the whole data file replaced, the cost of data recovery is effectively reduced and the economic loss caused by long recovery times is avoided.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the present application. In the drawings:

FIG. 1 is a schematic flow chart of a method for storing and recovering data according to an embodiment of the present application;

FIG. 2 is a flow chart illustrating another method for storing and recovering data according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an apparatus for storing and recovering data according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of another apparatus for storing and recovering data according to an embodiment of the present application.

Detailed Description

The present application will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments and features of the embodiments in the present application may be combined with each other.

To address the long recovery time and high cost of recovering a missing data file, an embodiment of the application provides a data storage and recovery method, as shown in FIG. 1, which comprises the following steps:

101. Acquire a plurality of data blocks obtained by uniformly dividing an original data file.

For this embodiment, in a specific application scenario, in order to process the original data file in blocks and to encode and store the data blocks, the original data file must first be cut uniformly into a plurality of data blocks of equal size. The number of blocks may be determined according to the actual application scenario. For example, a 24MB original data file may be divided into two 12MB data blocks, and the original data file can be obtained by reading the two data blocks in sequence.
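The uniform division of step 101 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the helper name `split_into_blocks` is hypothetical, 24 bytes stand in for the 24MB file, and the file size is assumed to be an exact multiple of the block count, since the text does not describe padding.

```python
def split_into_blocks(data, num_blocks):
    """Cut data into num_blocks equal-sized blocks, preserving order."""
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    if len(data) % num_blocks != 0:
        # Assumption: this sketch rejects sizes that do not divide evenly.
        raise ValueError("file size must be a multiple of num_blocks")
    size = len(data) // num_blocks
    return [data[i * size:(i + 1) * size] for i in range(num_blocks)]


original = bytes(range(24))              # stand-in for the 24MB original file
blocks = split_into_blocks(original, 2)  # two blocks of 12 bytes each
restored = b"".join(blocks)              # sequential read of the blocks
```

Joining the blocks in their original order reproduces the file, which mirrors the sequential read described in the example.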

102. Encode the data blocks into a plurality of data slices and check slices based on erasure codes.

For this embodiment, in a specific application scenario, after the original data file is divided in data order into N original data blocks of the same size, erasure coding of the original data blocks may generate N data slices of the same size and M check slices of the same size as the data slices. The erasure code used here is based on a Cauchy matrix: it mainly relies on the invertibility of the Cauchy matrix, and the upload and recovery process of erasure-coded storage uses encoding and decoding to recover lost data.
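The role of Cauchy-matrix invertibility can be illustrated with a toy coder. The sketch below is an assumption-laden illustration, not the patent's coder: it works over the small prime field GF(257) for readability (practical coders usually use a binary field such as GF(2^8)), and the function names `cauchy`, `encode`, `solve`, and `decode` are hypothetical. It requires Python 3.8 or later for modular inverses via `pow`.

```python
P = 257  # toy prime modulus chosen for readability (assumption)


def cauchy(m, n):
    """m x n Cauchy matrix with C[i][j] = 1 / (x_i + y_j) mod P, x_i = n + i, y_j = j."""
    return [[pow(n + i + j, -1, P) for j in range(n)] for i in range(m)]


def encode(data, m):
    """data holds n equal-length rows of symbols; returns m check rows."""
    n, width = len(data), len(data[0])
    c = cauchy(m, n)
    return [[sum(c[i][j] * data[j][t] for j in range(n)) % P for t in range(width)]
            for i in range(m)]


def solve(a, b):
    """Solve a * x = b over GF(P) by Gauss-Jordan elimination (b holds rows)."""
    n = len(a)
    a, b = [row[:] for row in a], [row[:] for row in b]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] % P != 0)
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = pow(a[col][col], -1, P)
        a[col] = [v * inv % P for v in a[col]]
        b[col] = [v * inv % P for v in b[col]]
        for r in range(n):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(x - f * y) % P for x, y in zip(a[r], a[col])]
                b[r] = [(x - f * y) % P for x, y in zip(b[r], b[col])]
    return b


def decode(survivors, n, m):
    """survivors maps slice index to symbols; 0..n-1 are data, n..n+m-1 are check."""
    idx = sorted(survivors)[:n]
    if len(idx) < n:
        raise ValueError("too many slices lost")
    c = cauchy(m, n)
    # Data slices contribute identity rows, check slices contribute Cauchy rows.
    rows = [[int(i == j) for j in range(n)] if i < n else c[i - n] for i in idx]
    return solve(rows, [survivors[i] for i in idx])


data = [[1, 2, 3], [4, 5, 6]]            # N = 2 data slices
parity = encode(data, 2)                  # M = 2 check slices
recovered = decode({1: data[1], 3: parity[1]}, n=2, m=2)  # slices 0 and 2 lost
```

Because every square submatrix of a Cauchy matrix is invertible, any N surviving slices give a solvable system, which is why decoding succeeds here even though one data slice and one check slice are missing.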

103. Store the original data file by using the data slices and the check slices.

For this embodiment, in a specific application scenario, in order to store the data file securely, the data slices and the check slices may be stored in different locations, such as different disks or different storage nodes. Storing the slices separately avoids losing all data when a single storage space is damaged, as could happen if everything were stored in one location. When a single storage node fails or reports errors, the data slice or check slice with missing data can be clearly identified, and the missing content can be recovered by using the intact data slices and check slices in the other storage spaces.

104. If the original data file is judged to be missing, decode and recover the original data file by using the data slices and check slices that meet the preset condition.

The preset condition is that the number of data slices and check slices with missing data must not exceed the fault-tolerant redundancy of the erasure code, which may correspond to the number of check slices. When the number of missing data slices and check slices is less than or equal to the fault-tolerant redundancy, the missing data can be recovered by using the data slices and check slices whose data is normal. When the number is greater than the fault-tolerant redundancy, the remaining normal slices cannot decode and recover the missing data; that is, the data recovery condition of the erasure code is exceeded.

For example, the original data is segmented into k blocks of source data (data blocks), and n blocks of encoded data (encoded data blocks) are then generated through the erasure code, of which (n - k) are check blocks. The encoded blocks are then stored and transmitted together. As long as k' = k blocks of encoded data can be received, all of the source data can be calculated.
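The preset condition reduces to a simple count comparison. The following sketch assumes the fault-tolerant redundancy equals the number of check blocks, n - k, as the text suggests; the function names are hypothetical.

```python
def fault_tolerance(n, k):
    """Number of check blocks, i.e. how many lost blocks the code tolerates."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return n - k


def can_recover(n, k, missing):
    """True when at least k of the n encoded blocks survive."""
    return 0 <= missing <= fault_tolerance(n, k)


print(can_recover(n=6, k=4, missing=2))  # True: 4 blocks survive
print(can_recover(n=6, k=4, missing=3))  # False: only 3 blocks survive
```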

With the method for storing and recovering data in this embodiment, an original data file can be divided into a plurality of data blocks, the data blocks are encoded based on erasure coding to obtain a plurality of data slices and check slices, and the data slices and check slices are stored in different data centers. The original data file is obtained by reading the data slices in sequence. When the data slices or check slices in a certain data center are judged to be missing, the data slices and check slices whose data is normal can be used to decode and recover them automatically, so that the data file is recovered. The whole recovery process is efficient, and missing data blocks can be detected and repaired in time, so the service is not delayed. Because only the missing slices need to be recovered rather than the whole data file replaced, the cost of data recovery is effectively reduced and the economic loss caused by long recovery times is avoided.

Further, as a refinement and extension of the foregoing embodiment, and to describe the implementation process completely, another method for storing and recovering data is provided, as shown in FIG. 2. The method includes:

201. Acquire a plurality of data blocks obtained by uniformly dividing an original data file.

For example, a 36MB original data file may be uniformly divided into three 12MB data blocks, and the three data blocks are numbered according to their order in the original data file. To read the original data file, the data blocks are read in ascending order of their numbers.

202. Encode the data blocks by using erasure codes, and divide the original data file into first data slices and first check slices of the same size according to a first division rule.

The first division rule encodes and divides the original data file into a preset number of equal parts: it generates first data slices equal in number and size to the data blocks, and a set number of first check slices of the same size as the first data slices. In a specific application scenario, the first division rule may set the numbers of first data slices and first check slices.

For example, suppose the first division rule is: encode and divide the original data file into 2 first data slices and 1 first check slice. If the original data file is 24MB, 2 first data slices of 12MB and 1 first check slice of 12MB are generated according to the first division rule. The original data file can be obtained by reading the 2 first data slices of 12MB. The first check slice is used to recover lost first data slice data, and the number of check slices represents the fault tolerance of the erasure code, that is, the maximum redundancy. In this embodiment, the lost data can be completely recovered only if at most one first data slice is lost.
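The 2 + 1 layout of this example can be demonstrated with plain XOR parity, which is the simplest erasure code that tolerates one lost slice. Note that this swaps in XOR for the Cauchy-matrix code described earlier, purely to keep the illustration short; 24 bytes stand in for the 24MB file, and all names are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("slices must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))


original = bytes(range(24))              # stand-in for the 24MB original file
d0, d1 = original[:12], original[12:]    # 2 first data slices
check = xor_bytes(d0, d1)                # 1 first check slice of the same size

# Suppose d0 is lost: rebuild it from the surviving data slice and the check slice.
rebuilt_d0 = xor_bytes(d1, check)
restored = rebuilt_d0 + d1
```

With a single check slice, losing both data slices at once would leave too little information, which matches the statement that at most one first data slice may be lost.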

203. Perform secondary encoding on the first data slices by using erasure codes, and divide each first data slice into second data slices and second check slices of the same size according to a second division rule.

The second division rule performs a secondary encoding division on each first data slice, that is, it uniformly encodes and divides the first data slice into a preset number of second data slices and second check slices. In a specific application scenario, the second division rule may set the numbers of second data slices and second check slices into which a single first data slice is divided.

For example, suppose the second division rule is: encode and divide each first data slice into 4 second data slices and 2 second check slices. If the original data file is 24MB, 2 first data slices of 12MB and 1 first check slice of 12MB are generated according to the first division rule. Each of the two 12MB first data slices can then be further divided, according to the second division rule, into 4 second data slices of 3MB and 2 second check slices of the same size. The data of a first data slice can be obtained by reading its 4 second data slices of 3MB, and the original data file can be obtained by reading all 8 second data slices in sequence. The second check slices are used to recover lost second data slice data, and the number of check slices represents the fault tolerance of the erasure code, that is, the maximum redundancy. In this embodiment, the lost data can be completely recovered only if at most two second data slices of the same first data slice are lost.
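The slice sizes and counts in this two-level example follow from simple arithmetic, sketched below. The function name and dictionary keys are hypothetical, and sizes are assumed to divide evenly.

```python
def two_level_layout(file_mb, n1, m1, n2, m2):
    """Sizes and counts produced by the first and second division rules."""
    if file_mb % n1 or (file_mb // n1) % n2:
        raise ValueError("sizes must divide evenly in this sketch")
    first_mb = file_mb // n1          # size of each first data/check slice
    second_mb = first_mb // n2        # size of each second data/check slice
    return {
        "first_data_slices": n1,
        "first_check_slices": m1,
        "first_slice_mb": first_mb,
        "second_data_slices_per_first": n2,
        "second_check_slices_per_first": m2,
        "second_slice_mb": second_mb,
        "second_data_slices_total": n1 * n2,
    }


layout = two_level_layout(24, n1=2, m1=1, n2=4, m2=2)
```

For the 24MB file this yields 12MB first slices, 3MB second slices, and 8 second data slices in total, matching the figures in the example.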

204. Store each first data slice in a different first data center, and store each first check slice in a different second data center.

For this embodiment, in a specific application scenario, in order to avoid losing data simultaneously and to store the data file securely, the first data slices and the first check slices may be stored in different locations. For example, a 24MB file may first be divided according to the first division rule into 2 first data slices of 12MB and 1 first check slice of 12MB. The 2 first data slices are then stored in first data center A and first data center B respectively, and the first check slice is stored in second data center C.

205. Store the second data slices and the second check slices in different storage units in the first data center.

Continuing the example of step 204, after the 2 first data slices of 12MB and the 1 first check slice of 12MB are stored in first data center A, first data center B, and second data center C respectively, the 12MB first data slice stored in each first data center may undergo the second-level division based on the second division rule, yielding 4 second data slices of 3MB and 2 second check slices of 3MB. To prevent the second data slices from being lost at the same time, the second data slices and second check slices may be stored in different storage locations within the corresponding first data center A or first data center B.
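The placement described in steps 204 and 205 can be pictured as a mapping from slices to locations. The sketch below is illustrative only: the data center names A, B, and C come from the example, while the `unit-N` storage unit labels and the function name are hypothetical.

```python
def build_placement(data_centers, check_center, n2, m2):
    """Map every slice to a distinct storage unit in its data center."""
    # The first check slice lives in the second data center.
    placement = {check_center: {"first_check_slice": f"{check_center}/unit-0"}}
    for dc in data_centers:
        units = {}
        for i in range(n2):                       # second data slices
            units[f"second_data_{i}"] = f"{dc}/unit-{i}"
        for j in range(m2):                       # second check slices
            units[f"second_check_{j}"] = f"{dc}/unit-{n2 + j}"
        placement[dc] = units
    return placement


placement = build_placement(["A", "B"], "C", n2=4, m2=2)
```

Each first data center ends up with six slices on six different storage units, so a single failed unit costs at most one second-level slice.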

206a. Obtain the original data file by reading all the first data slices.

For this embodiment, in a specific application scenario, step 206a may specifically include: acquiring a first sequence number of each first data slice, where the first sequence number corresponds to the position of the first data slice when the slices are spliced into the original data file; and reading the first data slices in ascending order of the first sequence numbers to obtain the original data file.

For example, after the original data file is uniformly divided into four first data slices a, b, c, and d based on the first division rule, the original data file can be obtained by first acquiring the first sequence numbers of the four first data slices. If the first sequence numbers of a, b, c, and d are 4, 2, 3, and 1 respectively, the first data slices are read in the order d, b, c, a, that is, in ascending order of the first sequence numbers, to obtain the original data file.
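The ordering in this example can be reproduced with a one-line sort. The dictionary and function names below are hypothetical; the slice names and sequence numbers are taken from the example.

```python
def read_order(seq_by_slice):
    """Return slice names in ascending order of their sequence numbers."""
    return sorted(seq_by_slice, key=seq_by_slice.get)


first_slices = {"a": 4, "b": 2, "c": 3, "d": 1}
order = read_order(first_slices)
print(order)  # ['d', 'b', 'c', 'a']
```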

207a. If it is determined that a first data slice is missing, determine that the original data file is missing.

For this embodiment, in a specific application scenario, all the first data slices together represent the original data file. When the original data file is obtained by reading the first data slices, if any first data slice is found to be missing, the original data file is missing and the complete original data file cannot be obtained by reading the first data slices.

208a. Determine the target first data center in which first data slice data is missing.

Correspondingly, for this embodiment, when it is determined that a first data slice is missing, the target first data center where the missing first data slice is located must first be identified as the data center to be recovered.

209a. Extract all target second data slices and target second check slices in the target first data center.

In a specific application scenario, after the target first data center with missing first data slice data is determined, the missing first data slice in the target first data center can be recovered in one of two ways. This step describes the first way, namely recovery using the second data slices and second check slices in the target first data center, so all target second data slices and target second check slices in the target first data center must be extracted first. The missing first data slice is then recovered by executing steps 210a to 211a.

210a. If it is determined that the target second data slices are complete, read the target second data slices according to their second sequence numbers to obtain the target first data slice whose data is missing.

Correspondingly, recovering the missing first data slice from the second data slices and second check slices in the target first data center covers two scenarios. This step gives the first scenario: when all target second data slices are judged to be complete, all second data slices in the target first data center can be read in sequence to obtain the target first data slice whose data is missing.

211a. If it is determined that target second data slices have missing data and the second data recovery condition is met, decode and recover the missing target second data slices by using the complete target second data slices and target second check slices, and read the recovered target second data slices according to their second sequence numbers to obtain the target first data slice.

The second data recovery condition is that the number of target second data slices and target second check slices with missing data is less than or equal to a second preset threshold. The second preset threshold corresponds to the maximum redundancy of data recovery, that is, the maximum number of target second data slices that may fail. It may correspond to the number of target second check slices: if there are 2 target second check slices, the second preset threshold may be set to 2. When the number of target second data slices and target second check slices with missing data is less than or equal to 2, the complete target second data slices and target second check slices can be used to decode and recover the missing target second data slices. After the second data slices are recovered, all second data slices in the target first data center can be read in sequence to obtain the target first data slice whose data is missing.

This step is the second scenario, parallel to step 210a, for recovering the missing first data slice from the second data slices and second check slices: the missing target second data slices are decoded and recovered first, and only after all target second data slices are complete are they read in sequence to obtain the target first data slice.

212a. If it is determined that target second data slices have missing data and the second data recovery condition is not met, acquire the first data slices and first check slices whose data is complete.

This step describes the second way of recovering the missing first data slice mentioned in step 209a: the first data slice of the target first data center is recovered by using the complete first data slices in the other first data centers and the first check slices in the second data center. Because acquiring data from other data centers takes longer than acquiring it within the same data center, the recovery of step 209a is preferably attempted first to improve recovery efficiency. Only when the second data slices in the target data center are found to be missing and cannot be recovered by decoding are the data recovery operations of steps 213a to 214a performed.

213a. If it is determined that the target first data slice meets the first data recovery condition, decode and recover the target first data slice by using the complete first data slices and first check slices.

The first data recovery condition is that the number of target first data slices and target first check slices with missing data is less than or equal to a first preset threshold. The first preset threshold corresponds to the maximum redundancy of data recovery, that is, the maximum number of first data slices that may fail. It may correspond to the number of first check slices: if there is 1 first check slice, the first preset threshold may be set to 1. When the number of first data slices and first check slices with missing data is less than or equal to 1, the complete first data slices and first check slices can be used to decode and recover the target first data slice.

214a. If it is determined that the target first data slice does not meet the first data recovery condition, output alarm information indicating abnormal data recovery.

For this embodiment, if the target first data slices and target first check slices with missing data do not meet the first data recovery condition, neither of the two recovery ways above can recover the missing first data slice. In that case alarm information may be output to prompt staff to take measures and carry out emergency repair in time.
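Steps 210a to 214a amount to a tiered decision: read locally, decode locally, decode across data centers, or raise an alarm. The sketch below captures only that ordering; the function name, parameter names, and returned labels are hypothetical, and the thresholds are assumed to equal the check slice counts.

```python
def plan_first_slice_recovery(missing_second, second_checks,
                              missing_first, first_checks):
    """Choose how to rebuild a missing first data slice.

    missing_second: second data/check slices lost in the target data center
    second_checks:  second preset threshold (number of second check slices)
    missing_first:  first data/check slices lost across data centers
    first_checks:   first preset threshold (number of first check slices)
    """
    if missing_second == 0:
        return "read_second_slices"           # step 210a
    if missing_second <= second_checks:
        return "decode_within_data_center"    # step 211a
    if missing_first <= first_checks:
        return "decode_across_data_centers"   # steps 212a and 213a
    return "alarm"                            # step 214a


print(plan_first_slice_recovery(3, 2, 1, 1))  # decode_across_data_centers
```

Trying the local options first reflects the stated preference for recovery inside the same data center, since fetching slices from other data centers takes longer.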

206b. In parallel with step 206a, obtain the original data file by reading all the second data slices.

For this embodiment, in a specific application scenario, step 206b may specifically include: acquiring a second sequence number of each second data slice, where the second sequence number corresponds to the position of the second data slice when the slices are spliced into the original data file; and reading the second data slices in ascending order of the second sequence numbers to obtain the original data file.

For example, after the original data file is uniformly divided into two first data slices A and B based on the first division rule, first data slice A is encoded and divided into four second data slices a1, a2, a3, and a4 according to the second division rule, and first data slice B is encoded and divided into four second data slices b1, b2, b3, and b4. The original data file can then be obtained by reading the second sequence numbers of the 8 second data slices. If the second sequence numbers of a1, a2, a3, a4, b1, b2, b3, and b4 are 4, 2, 3, 1, 5, 6, 8, and 7 respectively, the original data file is obtained by reading the second data slices in the order a4, a2, a3, a1, b1, b2, b4, b3, that is, in ascending order of the second sequence numbers.

207b. If it is determined that a second data slice is missing, determine that the original data file is missing.

For this embodiment, in a specific application scenario, all the second data slices together represent the original data file. When the original data file is obtained by reading the second data slices, if any second data slice is found to be missing, the original data file is missing and the complete original data file cannot be obtained by reading the second data slices.

208b. Determine the target first data center in which second data slice data is missing.

Correspondingly, for this embodiment, when it is determined that a second data slice is missing, the target first data center where the missing second data slice is located must first be identified as the data center to be recovered.

209b. Obtain the target second data slices and target second check slices whose data is complete in the target first data center.

In a specific application scenario, after the target first data center with missing second data slice data is determined, the missing second data slice in the target first data center can be recovered in one of two ways. This step describes the first way, namely recovery using the complete target second data slices and target second check slices in the target first data center, so all complete target second data slices and target second check slices in the target first data center must be extracted first. The missing second data slice is then recovered by executing step 210b.

210b. If it is determined that the target second data slices with missing data meet the second data recovery condition, decode and recover the missing target second data slices by using the complete target second data slices and target second check slices.

The second data recovery condition is that the number of target second data slices and target second check slices with missing data is less than or equal to a second preset threshold. The second preset threshold corresponds to the maximum redundancy of data recovery, that is, the maximum number of target second data slices that may fail. It may correspond to the number of target second check slices: if there are 2 target second check slices, the second preset threshold may be set to 2. When the number of target second data slices and target second check slices with missing data is less than or equal to 2, the complete target second data slices and target second check slices can be used to decode and recover the missing target second data slices.

211b. If it is determined that the target second data slices do not meet the second data recovery condition, acquire the target first data slice in the target first data center.

This step describes the second way of recovering the missing second data slice mentioned in step 209b: the missing second data slice is recovered by using the target first data slice in the target first data center, so the target first data slice in the target first data center must be extracted first. The missing second data slice is then recovered by executing steps 212b to 213b.

212b. If it is determined that the data of the target first data slice is complete, re-divide the target first data slice by using the erasure code into second data slices and second check slices that conform to the second division rule, so as to replace the target second data slices with missing data.

Correspondingly, recovering the missing second data slice from the first data slice in the target first data center covers two scenarios. This step gives the first scenario: when the target first data slice is judged to be complete, it can be divided by the erasure code into second data slices and second check slices that conform to the second division rule, which then replace the target second data slices or target second check slices with missing data.

213b, if it is determined that the target first data slice is missing and meets the first data recovery condition, decoding and recovering the target first data slice with the missing data by using the first data slice with complete data and the first check slice.

The first data recovery condition is that the total number of target first data slices and target first check slices with missing data is less than or equal to a first preset threshold. The first preset threshold corresponds to the maximum redundancy of data recovery, that is, the maximum number of lost first data slices that can be tolerated, and it can be set equal to the number of first check slices. For example, if there is 1 first check slice, the first preset threshold can be set to 1; when the number of target first data slices and target first check slices with missing data is less than or equal to 1, the first data slices and first check slices whose data is complete can be used to decode and recover the target first data slice. The recovered target first data slice can then be divided by the erasure code into second data slices and second check slices conforming to the second division rule, which replace the target second data slice or target second check slice with missing data.

214b, if it is determined that the target first data slice is missing and does not meet the first data recovery condition, outputting alarm information of abnormal data recovery.

In this embodiment, if the target first data slice with missing data does not meet the first data recovery condition, neither of the two recovery manners above can recover the missing second data slice. Alarm information can then be output to prompt staff to take measures and carry out a timely repair.
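The order in which the recovery manners for a missing second data slice are tried can be summarized as a small decision function. This is a sketch under stated assumptions: the `Action` names, the parameter names, and the idea of returning a plan rather than performing the recovery are all hypothetical and chosen only to illustrate the branching.

```python
from enum import Enum


class Action(Enum):
    DECODE_FROM_SECOND_LEVEL = "decode with intact second data/check slices"
    REENCODE_FROM_FIRST_SLICE = "re-divide the intact target first data slice"
    DECODE_FIRST_THEN_REENCODE = "decode the first data slice, then re-divide it"
    RAISE_ALARM = "output alarm information of abnormal data recovery"


def plan_second_slice_recovery(
    missing_second: int,
    second_threshold: int,
    first_slice_intact: bool,
    missing_first: int,
    first_threshold: int,
) -> Action:
    """Pick a recovery manner for a missing second data slice."""
    # Second data recovery condition: decode inside the first data center.
    if missing_second <= second_threshold:
        return Action.DECODE_FROM_SECOND_LEVEL
    # Otherwise fall back to the target first data slice (step 212b).
    if first_slice_intact:
        return Action.REENCODE_FROM_FIRST_SLICE
    # First data slice is also missing: check the first condition (step 213b).
    if missing_first <= first_threshold:
        return Action.DECODE_FIRST_THEN_REENCODE
    # Neither manner works (step 214b).
    return Action.RAISE_ALARM
```

Returning a plan keeps the branching logic separate from the encoding and decoding routines, which makes each branch easy to test in isolation.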

In a specific application scenario, to provide a preferred mode of data storage and recovery, the two modes of reading and recovering the original data file can be combined, so that the original data file is obtained by reading the first data slices and the second data slices. When a first data slice in a first data center is determined to be missing, the data is acquired by reading all the complete second data slices in that first data center; likewise, when a second data slice in the first data center is determined to be missing, the data can be acquired by reading the first data slice in that first data center. After the data is successfully read, the missing data slice is recovered based on the complete first data slices and first check slices and the complete second data slices and second check slices.

In this embodiment, data is encoded for storage and decoded for recovery mainly on the basis of the invertibility of the Cauchy matrix. The method for decoding and recovering the missing second data slice by using the normally stored second data slices and second check slices may specifically include: forming a decoding column vector from the normally stored second data slices and second check slices; determining, as the decoding generation matrix, the inverse of the matrix formed by the rows of the generator matrix that correspond to the decoding column vector; and obtaining the second data slice with missing data by calculating the product of the decoding generation matrix and the decoding column vector. Correspondingly, the method for decoding and recovering the missing first data slice and first check slice by using the normally stored first data slices and first check slices specifically includes: forming a decoding column vector from the normally stored first data slices and first check slices; determining, as the decoding generation matrix, the inverse of the matrix formed by the rows of the generator matrix that correspond to the decoding column vector; and obtaining the first data slice with missing data by calculating the product of the decoding generation matrix and the decoding column vector.
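The decoding procedure above can be illustrated with a small Cauchy-matrix erasure code. This is a sketch under assumptions that are not in the original text: symbols are integers in the prime field GF(257) rather than the GF(2^8) arithmetic typically used in practice, each slice is a single symbol, and all function names are hypothetical.

```python
P = 257  # small prime field, an assumption made for readability


def generator_matrix(k: int, m: int) -> list[list[int]]:
    """Identity rows (data slices) stacked on Cauchy rows (check slices)."""
    identity = [[int(r == c) for c in range(k)] for r in range(k)]
    # Cauchy entry 1 / (x_i - y_j) with x_i = k + i and y_j = j, never zero.
    cauchy = [[pow(k + i - j, -1, P) for j in range(k)] for i in range(m)]
    return identity + cauchy


def mat_vec(matrix: list[list[int]], vec: list[int]) -> list[int]:
    return [sum(a * b for a, b in zip(row, vec)) % P for row in matrix]


def invert(matrix: list[list[int]]) -> list[list[int]]:
    """Gauss-Jordan inversion modulo P."""
    n = len(matrix)
    aug = [row[:] + [int(r == c) for c in range(n)] for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def encode(data: list[int], m: int) -> list[int]:
    """Return k data symbols followed by m check symbols."""
    return mat_vec(generator_matrix(len(data), m), data)


def decode(surviving: dict[int, int], k: int, m: int) -> list[int]:
    """Recover all k data symbols from any k surviving symbols."""
    if len(surviving) < k:
        raise ValueError("too many slices are missing to decode")
    rows = sorted(surviving)[:k]
    g = generator_matrix(k, m)
    # Inverse of the generator rows that match the decoding column vector.
    decoding_matrix = invert([g[r] for r in rows])
    return mat_vec(decoding_matrix, [surviving[r] for r in rows])
```

For example, with 4 data symbols and 2 check symbols, dropping any 2 of the 6 encoded symbols and passing the remaining ones to `decode` as a mapping from row index to symbol returns the original 4 data symbols. A check symbol in GF(257) can take the value 256, which does not fit in one byte; this is one reason practical systems work in GF(2^8) instead.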

With the above data storage and recovery method, the original data file is first uniformly divided into a plurality of data blocks, and the data blocks are encoded based on the erasure coding technique to obtain a plurality of first data slices and first check slices. The first data slices are then encoded again based on the erasure coding technique, so that each first data slice is further divided into a plurality of second data slices and second check slices. Each first data slice is stored in a different first data center, each first check slice is stored in a different second data center, and the second data slices and second check slices are stored in different storage units in the first data center. The original data file can be read in two ways. In the first reading mode, the original data file is obtained by sequentially reading the first data slices in each first data center. In the second reading mode, the original data file is obtained by sequentially reading the second data slices corresponding to the first data slices. The two reading modes correspond to two data recovery modes. In the first recovery mode, when a first data slice is determined to be missing, the original first data slice can be obtained by reading all the complete second data slices in the first data center, thereby replacing and recovering the missing first data slice; in addition, when a second data slice in the first data center is determined to be missing and cannot be recovered, decoding recovery can be performed by using the first data slices and first check slices whose data is complete. In the second recovery mode, when a second data slice is determined to be missing, decoding recovery can be performed by using the other second data slices whose data is complete and the second check slices in the first data center; when that recovery fails, the corresponding first data slice is acquired and re-encoded into second data slices and second check slices, so as to replace and recover the missing second data slice. The two reading modes and the two recovery modes for missing data provide double protection for service data: when data is found missing during reading, the reading mode can be switched in time, which effectively ensures the integrity and security of the service data. In addition, the scheme stores and recovers the service data as a whole, down to the level of individual data slices, and provides several recovery modes, which can improve the efficiency of data recovery, save recovery time, and reduce the cost of data recovery.

Further, as an implementation of the methods shown in fig. 1 and fig. 2, an embodiment of the present application provides an apparatus for storing and recovering data. As shown in fig. 3, the apparatus includes: an acquisition module 31, a processing module 32, a storage module 33 and a recovery module 34.

the acquisition module 31 may be configured to acquire a plurality of data blocks uniformly divided from the original data file;

the processing module 32 may be configured to encode the data blocks into a plurality of data slices and check slices based on erasure codes;

the storage module 33 may be configured to store the original data file by using the data slices and the check slices;

the recovery module 34 may be configured to decode and recover the original data file by using the data slices and check slices that meet the preset condition if the original data file is determined to be missing.

In a specific application scenario, in order to encode and process a data block into a plurality of data slices and check slices, the processing module 32 is specifically configured to encode the data block by using erasure codes, and divide the original data file into a first data slice and a first check slice with the same size according to a first division rule; and performing secondary coding on the first data slices by using erasure codes, and dividing each first data slice into second data slices and second check slices with the same size according to a second division rule.
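The two-level division into equal-size slices can be sketched as follows. This example makes several assumptions that are not in the original text: a single XOR parity slice stands in for the erasure code at each level (so only one lost slice per group can be rebuilt), zero bytes are used as padding, and the function names are hypothetical.

```python
def split_equal(data: bytes, k: int) -> list[bytes]:
    """Split data into k slices of the same size, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_parity(slices: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-size slices (stand-in for a check slice)."""
    parity = bytearray(len(slices[0]))
    for s in slices:
        for i, b in enumerate(s):
            parity[i] ^= b
    return bytes(parity)


def two_level_encode(data: bytes, k1: int, k2: int):
    """First level: k1 data slices + 1 check slice. Second level: per slice."""
    first_data = split_equal(data, k1)
    first_check = xor_parity(first_data)
    second = []
    for first_slice in first_data:
        second_data = split_equal(first_slice, k2)
        second.append((second_data, xor_parity(second_data)))
    return first_data, first_check, second
```

With XOR parity, a single lost slice in a group equals the XOR of the parity slice with the remaining slices of that group, which is why `xor_parity` can also be reused for recovery in this simplified setting.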

Correspondingly, in order to ensure the security and the recoverability of the data storage, the storage module 33 may be specifically configured to store each first data slice in a different first data center, and store each first check slice in a different second data center; and respectively storing the second data slices and the second check slices in different storage units in the first data center.
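The placement rule above can be illustrated with a small mapping from slice names to storage locations. The naming scheme (`first-dc-0`, `second-dc-0`, `unit-0`, `D1[...]`, and so on) is purely hypothetical and only serves to show that each slice lands in a distinct location.

```python
def place_slices(
    first_data_count: int,
    first_check_count: int,
    second_data_count: int,
    second_check_count: int,
) -> dict[str, str]:
    """Map each slice name to a (hypothetical) storage location."""
    placement: dict[str, str] = {}
    for i in range(first_data_count):
        dc = f"first-dc-{i}"  # one first data center per first data slice
        placement[f"D1[{i}]"] = dc
        # Second data and check slices go to different storage units.
        for j in range(second_data_count):
            placement[f"D2[{i}][{j}]"] = f"{dc}/unit-{j}"
        for j in range(second_check_count):
            placement[f"C2[{i}][{j}]"] = f"{dc}/unit-{second_data_count + j}"
    for i in range(first_check_count):
        # One second data center per first check slice.
        placement[f"C1[{i}]"] = f"second-dc-{i}"
    return placement
```

Because no two second-level slices of the same first data slice share a storage unit, the failure of one storage unit costs at most one slice per group in this layout.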

In a specific application scenario, in order to provide two ways of determining that an original data file is missing, as shown in fig. 4, the apparatus further includes: a determination module 35.

The acquisition module 31 is further configured to obtain the original data file by reading all the first data slices; and/or obtain the original data file by reading all the second data slices;

the determination module 35 may be configured to determine that the original data file is missing if it is determined that a first data slice or a second data slice is missing.

Correspondingly, in order to obtain the original data file by reading all the first data slices, the acquisition module 31 is specifically configured to obtain a first sequence number of each first data slice, where the first sequence number corresponds to the order in which the first data slices are arranged to assemble the original data file; and to read the first data slices in ascending order of the first sequence number to obtain the original data file.

Correspondingly, in order to obtain the original data file by reading all the second data slices, the acquisition module 31 is specifically configured to obtain a second sequence number of each second data slice, where the second sequence number corresponds to the order in which the second data slices are arranged to assemble the original data file; and to read the second data slices in ascending order of the second sequence number to obtain the original data file.
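Reading slices in ascending sequence-number order, and treating any absent slice as a sign that the original data file is missing, can be sketched as below. The function name, the use of contiguous sequence numbers starting at 0, and the choice of `LookupError` to signal a missing slice are assumptions made for this example.

```python
from typing import Mapping, Optional


def read_in_order(slices: Mapping[int, Optional[bytes]], expected_count: int) -> bytes:
    """Assemble data by reading slices in ascending sequence-number order."""
    # A slice is considered missing if its number is absent or its data is None.
    missing = [seq for seq in range(expected_count) if slices.get(seq) is None]
    if missing:
        raise LookupError(f"missing slices: {missing}")
    return b"".join(slices[seq] for seq in range(expected_count))
```

The same function applies to both reading modes: it can be called with the first data slices keyed by first sequence number, or with the second data slices keyed by second sequence number. A caller that catches the `LookupError` can then switch to the other reading mode or start recovery.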

In a specific application scenario, in order to decode and recover the original data file by using the data slices and check slices that meet the preset condition when a first data slice is determined to have missing data, the recovery module 34 is specifically configured to: determine the target first data center in which the first data slice with missing data is located; extract all target second data slices and target second check slices in the target first data center; if the data of the target second data slices is determined to be complete, read the target second data slices according to the second sequence number so as to obtain the target first data slice with missing data; if a target second data slice is determined to have missing data and the second data recovery condition is met, decode and recover the target second data slice with missing data by using the target second data slices and target second check slices whose data is complete, and read the recovered target second data slices according to the second sequence number so as to obtain the target first data slice, where the second data recovery condition is that the number of target second data slices and target second check slices with missing data is less than or equal to a second preset threshold; if a target second data slice is determined to have missing data and the second data recovery condition is not met, acquire the first data slices and first check slices whose data is complete; if the target first data slice meets the first data recovery condition, decode and recover the target first data slice by using the first data slices and first check slices whose data is complete, where the first data recovery condition is that the number of target first data slices and target first check slices with missing data is less than or equal to a first preset threshold; and if the target first data slice is determined not to meet the first data recovery condition, output alarm information of abnormal data recovery.

Correspondingly, in order to decode and recover the original data file by using the data slices and check slices that meet the preset condition when a second data slice is determined to have missing data, the recovery module 34 is specifically configured to: determine the target first data center in which the second data slice with missing data is located; acquire the target second data slices and target second check slices whose data is complete in the target first data center; if the target second data slice with missing data meets the second data recovery condition, decode and recover the target second data slice by using the target second data slices and target second check slices whose data is complete, where the second data recovery condition is that the number of target second data slices and target second check slices with missing data is less than or equal to a second preset threshold; if the target second data slice is determined not to meet the second data recovery condition, acquire the target first data slice in the target first data center; if the data of the target first data slice is determined to be complete, re-divide the target first data slice into second data slices and second check slices conforming to the second division rule by using the erasure code, so as to replace the target second data slice; if the target first data slice is determined to be missing and the first data recovery condition is met, decode and recover the target first data slice with missing data by using the first data slices and first check slices whose data is complete, where the first data recovery condition is that the number of target first data slices and target first check slices with missing data is less than or equal to a first preset threshold; and if the target first data slice is determined to be missing and the first data recovery condition is not met, output alarm information of abnormal data recovery.

It should be noted that, for other corresponding descriptions of each functional unit related to the apparatus for storing and recovering data provided in this embodiment, reference may be made to corresponding descriptions in fig. 1 to fig. 2, and details are not repeated here.

Based on the above methods shown in fig. 1 and fig. 2, correspondingly, the embodiments of the present application further provide a storage medium, on which a computer program is stored, where the program is executed by a processor to implement the above methods for storing and recovering data shown in fig. 1 and fig. 2.

Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to perform the method of each implementation scenario of the present application.

Based on the methods shown in fig. 1 and fig. 2 and the virtual device embodiments shown in fig. 3 and fig. 4, in order to achieve the above objects, the embodiments of the present application further provide a computer device, which may specifically be a personal computer, a server, a network device, etc., where the entity device includes a storage medium and a processor; a storage medium storing a computer program; a processor for executing a computer program to implement the method of data storage and retrieval as described above and shown in fig. 1 and 2.

Optionally, the computer device may also include a user interface, a network interface, a camera, Radio Frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module, and the like. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Bluetooth interface or a Wi-Fi interface), etc.

It will be appreciated by those skilled in the art that the computer device structure provided in this embodiment does not constitute a limitation on the entity device, which may include more or fewer components, combine certain components, or use a different arrangement of components.

The non-volatile readable storage medium may also include an operating system, a network communication module, etc. The operating system is a program that manages the hardware and software resources of the entity device for data storage and recovery, and supports the running of information processing programs and other software and/or programs. The network communication module is used to implement communication among the components in the non-volatile readable storage medium and communication with other hardware and software in the entity device.

From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus a necessary general hardware platform, or may be implemented by hardware. By applying the technical scheme of the present application, compared with the prior art, the original data file is first uniformly divided into a plurality of data blocks, and the data blocks are encoded based on the erasure coding technique to obtain a plurality of first data slices and first check slices. The first data slices are then encoded again based on the erasure coding technique, so that each first data slice is further divided into a plurality of second data slices and second check slices. Each first data slice is stored in a different first data center, each first check slice is stored in a different second data center, and the second data slices and second check slices are stored in different storage units in the first data center. The original data file can be read in two ways. In the first reading mode, the original data file is obtained by sequentially reading the first data slices in each first data center. In the second reading mode, the original data file is obtained by sequentially reading the second data slices corresponding to the first data slices. The two reading modes correspond to two data recovery modes. In the first recovery mode, when a first data slice is determined to be missing, the original first data slice can be obtained by reading all the complete second data slices in the first data center, thereby replacing and recovering the missing first data slice; in addition, when a second data slice in the first data center is determined to be missing and cannot be recovered, decoding recovery can be performed by using the first data slices and first check slices whose data is complete. In the second recovery mode, when a second data slice is determined to be missing, decoding recovery can be performed by using the other second data slices whose data is complete and the second check slices in the first data center; when that recovery fails, the corresponding first data slice is acquired and re-encoded into second data slices and second check slices, so as to replace and recover the missing second data slice. The two reading modes and the two recovery modes for missing data provide double protection for service data: when data is found missing during reading, the reading mode can be switched in time, which effectively ensures the integrity and security of the service data. In addition, the scheme stores and recovers the service data as a whole, down to the level of individual data slices, and provides several recovery modes, which can improve the efficiency of data recovery, save recovery time, and reduce the cost of data recovery.

Those skilled in the art will appreciate that the drawings are merely schematic illustrations of one preferred implementation scenario, and that the modules or flows in the drawings are not necessarily required to practice the present application. Those skilled in the art will also appreciate that the modules of an apparatus in an implementation scenario may be distributed in that apparatus as described for the implementation scenario, or may be changed accordingly and located in one or more apparatuses different from that of the implementation scenario. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.

The above serial numbers of the present application are merely for description and do not represent the advantages or disadvantages of the implementation scenarios. The above disclosure is merely a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variation that can be conceived by a person skilled in the art shall fall within the protection scope of the present application.

Claims

1. A method of data storage and retrieval, comprising:

encoding the data block into a plurality of data slices and check slices based on erasure codes, and specifically comprising:

encoding the data block by using erasure codes, and dividing the original data file into first data slices and first check slices with the same size according to a first division rule, wherein the data slices comprise the first data slices, and the check slices comprise the first check slices;

performing secondary coding on the first data slices by using erasure codes, and dividing each first data slice into second data slices and second check slices with the same size according to a second division rule, wherein the data slices comprise the second data slices, and the check slices comprise the second check slices;

storing the original data file by using the data slice and the check slice;

2. The method according to claim 1, wherein storing the original data file using the data slice and the check slice comprises:

storing each first data slice in different first data centers, and storing each first check slice in different second data centers;

and storing the second data slices and the second check slices respectively in different storage units in the first data center.

3. The method according to claim 2, wherein before decoding and recovering the original data file using the data slice and the check slice meeting a preset condition, further comprising:

acquiring the original data file by reading all the first data slices; and/or

acquiring the original data file by reading all the second data slices;

and if it is determined that a first data slice or a second data slice is missing, determining that the original data file is missing.

4. A method according to claim 3, wherein said obtaining said original data file by reading all of said first data pieces comprises:

acquiring a first sequence number of each first data slice, wherein the first sequence number corresponds to the order in which the first data slices are arranged to assemble the original data file;

reading the first data slices in ascending order of the first sequence number so as to acquire the original data file;

the obtaining the original data file by reading all the second data slices specifically includes:

acquiring a second sequence number of each second data slice, wherein the second sequence number corresponds to the order in which the second data slices are arranged to assemble the original data file;

and reading the second data slices in ascending order of the second sequence number so as to acquire the original data file.

5. The method of claim 4, wherein if it is determined that the first data slice has a data loss, the decoding using the data slice and the check slice that meet a predetermined condition restores the original data file, comprising:

determining a target first data center in which the first data slice data is missing;

extracting all target second data slices and target second check-up slices in the target first data center;

if the data of the target second data slices is determined to be complete, reading the target second data slices according to the second sequence number so as to acquire the target first data slice with missing data;

if a target second data slice is determined to have missing data and a second data recovery condition is met, decoding and recovering the target second data slice with missing data by using the target second data slices and target second check slices whose data is complete, and reading the recovered target second data slices according to the second sequence number so as to obtain the target first data slice, wherein the second data recovery condition is that the number of target second data slices and target second check slices with missing data is less than or equal to a second preset threshold;

if a target second data slice is determined to have missing data and the second data recovery condition is not met, acquiring the first data slices and first check slices whose data is complete;

if the target first data slice meets a first data recovery condition, decoding and recovering the target first data slice by using the first data slices and first check slices whose data is complete, wherein the first data recovery condition is that the number of target first data slices and target first check slices with missing data is less than or equal to a first preset threshold;

and if the target first data slice is determined not to meet the first data recovery condition, outputting alarm information of abnormal data recovery.

6. The method of claim 4, wherein if it is determined that the second data slice has a data loss, the decoding using the data slice and the check slice that meet a predetermined condition restores the original data file, specifically comprising:

determining a target first data center in which the second data slice data is missing;

acquiring the target second data slices and target second check slices whose data is complete in the target first data center;

if the target second data slice with missing data meets a second data recovery condition, decoding and recovering the target second data slice by using the target second data slices and target second check slices whose data is complete, wherein the second data recovery condition is that the number of target second data slices and target second check slices with missing data is less than or equal to a second preset threshold;

if the target second data slice is determined not to meet the second data recovery condition, acquiring a target first data slice in the target first data center;

if the data of the target first data slice is determined to be complete, re-dividing the target first data slice into second data slices and second check slices conforming to the second division rule by using the erasure code, so as to replace the target second data slice;

if the target first data slice is determined to be missing and a first data recovery condition is met, decoding and recovering the target first data slice with missing data by using the first data slices whose data is complete and the first check slices, wherein the first data recovery condition is that the number of target first data slices and target first check slices with missing data is less than or equal to a first preset threshold;

and if the target first data slice is determined to be missing and the first data recovery condition is not met, outputting alarm information of abnormal data recovery.

7. An apparatus for data storage and retrieval, comprising:

the processing module is used for encoding and processing the data block into a plurality of data slices and check slices based on erasure codes, and specifically comprises the following steps:

8. A non-transitory readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of data storage and retrieval of any of claims 1 to 6.

9. A computer device comprising a non-volatile readable storage medium, a processor and a computer program stored on the non-volatile readable storage medium and executable on the processor, characterized in that the processor implements the method of data storage and retrieval according to any one of claims 1 to 6 when executing the program.